Data Fiduciary in Order to Alleviate Principal-Agent Problems in the Artificial Big Data Age


Paper by Julia M. Puaschunder: “The classic principal-agent problem in political science and economics describes agency dilemmas or problems when one person, the agent, is put in a situation to make decisions on behalf of another entity, the principal. A dilemma occurs in situations when the individual profit maximization of principal and agent are pitted against each other. This so-called moral hazard is nowadays emerging in the artificial big data age, when big data reaping entities have to act on behalf of agents, who provide their data with trust in the principal’s integrity and responsible big data conduct. Yet to this day, no data fiduciary has been clearly described and established to protect the agent from misuse of data. This article introduces the agent’s predicament between the utility derived from information sharing and dignity in privacy, as well as hyper-hyperbolic discounting fallibilities: agents cannot clearly foresee what consequences information sharing can have over time and in groups. The principal’s predicament between secrecy and selling big data insights or using big data for manipulative purposes will be outlined. Finally, the article draws a clear distinction between manipulation and nudging in relation to the potential social class division of those who nudge and those who are nudged…(More)”.
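
The “hyperbolic discounting fallibilities” invoked here have a standard formalization. As a minimal sketch (using the textbook quasi-hyperbolic beta-delta model as a stand-in, not Puaschunder’s own notation), an agent deciding whether to share data today evaluates the stream of payoffs as:

```latex
% Quasi-hyperbolic (beta-delta) discounting: a textbook stand-in for
% hyperbolic discounting, not the paper's own formalism.
U_0 = u(c_0) + \beta \sum_{t=1}^{T} \delta^{t}\, u(c_t),
\qquad 0 < \beta < 1,\ 0 < \delta < 1.
```

With beta < 1, the immediate utility of sharing (convenience, free services) looms disproportionately large relative to deferred privacy costs, so the agent shares more today than a time-consistent version of herself would endorse, which is precisely the foreseeability failure the abstract describes.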

The Urban Institute Data Catalog


Data@Urban: “We believe that data make the biggest impact when they are accessible to everyone.

Today, we are excited to announce the public launch of the Urban Institute Data Catalog, a place to discover, learn about, and download open data provided by Urban Institute researchers and data scientists. You can find data that reflect the breadth of Urban’s expertise — health, education, the workforce, nonprofits, local government finances, and so much more.

Built using open source technology, the catalog holds valuable data and metadata that Urban Institute staff have created, enhanced, cleaned, or otherwise added value to as part of our work. And it will provide, for the first time, a central, searchable resource to find many of Urban’s published open data assets.

We hope that researchers, data analysts, civic tech actors, application developers, and many others will use this tool to enhance their work, save time, and generate insights that elevate the policy debate. As Urban produces data for research, analysis, and data visualization, and as new data are released, we will continue to update the catalog.

We’re thrilled to put the power of data in your hands to better understand and respond to many critical issues facing us locally and nationally. If you have comments about the tool or the data it contains, or if you would like to share examples of how you are using these data, please feel free to contact us at [email protected].

Here are some current highlights of the Urban Data Catalog — both the data and research products we’ve built using the data — as of this writing:

– LODES data: The Longitudinal Employer-Household Dynamics Origin-Destination Employment Statistics (LODES) from the US Census Bureau provide detailed information on workers and jobs by census block. We have summarized these large, dispersed data into a set of census tract and census place datasets to make them easier to use (a minimal aggregation sketch follows this list). For more information, read our earlier Data@Urban blog post.

– Medicaid opioid data: Our Medicaid Spending and Prescriptions for the Treatment of Opioid Use Disorder and Opioid Overdose dataset is sourced from state drug utilization data and provides breakdowns by state, year, quarter, drug type, and brand name or generic drug status. For more information and to view our data visualization using the data, see the complete project page.

– Nonprofit and foundation data: Members of Urban’s National Center for Charitable Statistics (NCCS) compile, clean, and standardize data from the Internal Revenue Service (IRS) on organizations filing IRS forms 990 or 990-EZ, including private charities, foundations, and other tax-exempt organizations. To read more about these data, see our previous blog posts on redesigning our Nonprofit Sector in Brief Report in R and repurposing our open code and data to create your own custom summary tables….(More)”.
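
The LODES block-to-tract summarization mentioned in the first item above is a good example of a simple but valuable data enhancement: a census block GEOID is 15 digits (2 state + 3 county + 6 tract + 4 block), so rolling block-level job counts up to tracts is a truncate-and-group operation. Here is a minimal pandas sketch, assuming a standard LODES workplace area characteristics (WAC) file keyed by a 15-digit w_geocode with C-prefixed job-count columns; the file name is illustrative and this is our sketch, not Urban’s actual pipeline:

```python
import pandas as pd

# Block-level LODES workplace area characteristics (WAC) file; the path is
# hypothetical. Read w_geocode as a string to preserve leading zeros.
wac = pd.read_csv("ca_wac_S000_JT00_2017.csv", dtype={"w_geocode": str})

# The first 11 digits of a 15-digit block GEOID identify its census tract
# (2 state + 3 county + 6 tract).
wac["tract_geoid"] = wac["w_geocode"].str[:11]

# Sum the uppercase-C job-count columns (C000, CA01, ...) within each tract.
job_cols = [c for c in wac.columns if c.startswith("C")]
tract_jobs = wac.groupby("tract_geoid", as_index=False)[job_cols].sum()

tract_jobs.to_csv("ca_wac_tract_2017.csv", index=False)
```

Aggregating to census places would additionally require the LODES geography crosswalk file, since place codes are not embedded in the block GEOID.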

Big Data Analytics in Healthcare


Book edited by Anand J. Kulkarni, Patrick Siarry, Pramod Kumar Singh, Ajith Abraham, Mengjie Zhang, Albert Zomaya and Fazle Baki: “This book includes state-of-the-art discussions on various issues and aspects of the implementation, testing, validation, and application of big data in the context of healthcare. The concept of big data is revolutionary from both a technological and a societal well-being standpoint. This book provides a comprehensive reference guide for engineers, scientists, and students studying or involved in the development of big data tools in the areas of healthcare and medicine. It also features a multifaceted and state-of-the-art literature review on healthcare data, its modalities, complexities, and methodologies, along with mathematical formulations.

The book is divided into two main sections, the first of which discusses the challenges and opportunities associated with the implementation of big data in the healthcare sector. In turn, the second addresses the mathematical modeling of healthcare problems, as well as current and potential future big data applications and platforms…(More)”.

Risk identification and management for the research use of government administrative data


Paper by Elizabeth Shepherd, Anna Sexton, Oliver Duke-Williams, and Alexandra Eveleigh: “Government administrative data have enormous potential for public and individual benefit through improved educational and health services to citizens, medical research, environmental and climate interventions and exploitation of scarce energy resources. Administrative data are usually “collected primarily for administrative (not research) purposes by government departments and other organizations for the purposes of registration, transaction and record keeping, during the delivery of a service” such as health care, vehicle licensing, tax and social security systems (https://esrc.ukri.org/funding/guidance-for-applicants/research-ethics/useful-resources/key-terms-glossary/). Administrative data are usually distinguished from data collected for statistical use such as the census. Unlike administrative records, they do not provide evidence of activities and generally lack metadata and context relating to provenance. Administrative data, unlike open data, are not routinely made open or accessible, but access can be provided only on request to named researchers for specified research projects through research access protocols that often take months to negotiate and are subject to significant constraints around re-use such as the use of safe havens. Researchers seldom make use of freedom of information or access to information protocols to access such data because they need specific datasets and particular levels of granularity and an ability to re-process data, which are not made generally available. This study draws on research undertaken by the authors as part of the Administrative Data Research Centre in England (ADRC-E). The research examined perspectives on the sharing, linking and re-use (secondary use) of administrative data in England, viewed through three analytical themes: trust, consent and risk. This study presents the analysis of the identification and management of risk in the research use of government administrative data and presents a risk framework. Risk management (i.e., coordinated activities that allow organizations to control risks; Lemieux, 2010) enables us to think about the balance between risk and benefit for the public good and for other stakeholders. Mitigating activities or management mechanisms used to control the identified risks depend on the resources available to implement the options, on the risk appetite or tolerance of the community and on the cost and likely effectiveness of the mitigation. Mitigation and risk do not work in isolation and should be viewed holistically, keeping the whole information infrastructure in balance across the administrative data system and between multiple stakeholders.

This study seeks to establish a clearer picture of risk with regard to government administrative data in England. It identifies and categorizes the risks arising from the research use of government administrative data. It identifies mitigating risk management activities, linked to five key stakeholder communities, and discusses the locus of responsibility for risk management actions. The identification of the risks and of mitigation strategies is derived from the viewpoints of the interviewees and associated documentation; therefore, they reflect their lived experience. The five stakeholder groups identified from the data are as follows: individual researchers; employers of researchers; the wider research community; data creators and providers; and data subjects and the broader public. The primary sections of the study, following the methodology and research context, set out the seven identified types of risk events in the research use of administrative data, present a stakeholder mapping of the communities in this research affected by the risks, and discuss the findings related to managing and mitigating the risks identified. The conclusion presents the elements of a new risk framework to inform future actions by the government data community and enable researchers to exploit the power of administrative data for public good….(More)”.

Why data from companies should be a common good


Paula Forteza at apolitical: “Better planning of public transport, protecting fish from intensive fishing, and reducing the number of people killed in car accidents: for these and many other public policies, data is essential.

Data applications are diverse, and their origins are equally numerous. But data is not exclusively owned by the public sector. Data can be produced by private actors: by mobile phone operators, as part of marine traffic, or by inter-connected cars, to give just a few examples.

The awareness around the potential of private data is increasing, as the proliferation of data partnerships between companies, governments, and local authorities shows. However, these partnerships represent only a very small fraction of what could be done.

The opening of public data, meaning that public data is made freely available to everyone, has been conducted on a wide scale in the last 10 years, pioneered by the US and UK and soon followed by France and many other countries. In 2015, France took a first step when the government introduced the Digital Republic Bill, which made data open by default and introduced the concept of public interest data. Due to a broad definition and weak enforcement, the opening of private-sector data is nevertheless still lagging behind.

The main arguments for opening private data are that it will allow better public decision-making and it could trigger a new way to regulate Big Tech. There is, indeed, a strong economic case for data sharing, because data is a non-rival good: the value of data does not diminish when shared. On the contrary, new uses can be designed and data can be enriched by aggregation, which could improve innovation for start-ups….

Why Europe needs a private data act

Data hardly knows any boundaries.

Some states are opening up, as France did in 2015 by creating a framework for “public interest data,” but the absence of a common international legal framework for private data sharing is a major obstacle to its development. To scale up, a European Private Data Act is needed.

This framework must acknowledge the legitimate interest of the private companies that collect and control data. Data can be their main source of income or one they wish to develop, and this must be respected. Trade secrecy has to be protected too: data sharing is not open data.

Data can be shared with a limited and identified set of partners, and it does not always have to be free. Yet private interest must be aligned with the public good. The European Convention on Human Rights and the European Charter of Fundamental Rights acknowledge that some legitimate and proportional limitations can be set on the freedom of enterprise, which gives everyone the right to pursue their own profitable business.

The “Private Data Act” should contain several fundamental data sharing principles in line with those proposed by the European Commission in 2018: proportionality, “do no harm”, full respect of the GDPR, etc. It should also include guidelines on which data to share, how to assess the public interest, and in which cases data should be opened for free or how pricing should be set.

Two methods can be considered:

  • Defining high-value datasets, as has been done for public data in the recent Open Data Directive, in areas like mobile communications, banking, transport, etc. This method is strong but not flexible enough.
  • Alternatively, governments might define certain “public interest projects”. In doing so, governments could get access to specific data that are seen as a prerequisite to achieving the project. For example, understanding why there is increasing mortality among bees requires various data sources: concrete data on bee mortality from beekeepers, data on crops and pesticide use from farmers, weather data, etc. This method is more flexible and ensures that only the data needed for the project are shared.

Going ahead on open data and data sharing should be a priority for the upcoming European Commission and Parliament. Margrethe Vestager has been reappointed as Competition Commissioner and Vice-President of the Commission, and she has already mentioned the opportunity to define access to data for newcomers in the digital market.

Public interest data is a new topic on the EU agenda and will probably become crucial in the near future….(More)”.

Andrew Yang proposes that your digital data be considered personal property


Michael Grothaus at Fast Company: “2020 Democratic presidential candidate Andrew Yang may not be at the top of the race when it comes to polling (Politico currently has him ranked as the 7th most-popular Democratic contender), but his policies, including support for universal basic income, have made him popular among a subset of young, liberal-leaning, tech-savvy voters. Yang’s latest proposal, too, is sure to strike a chord with them.

The presidential candidate published his latest policy proposal today: to treat data as a property right. Announcing the proposal on his website, Yang lamented how our data is collected, used, and abused by companies, often with little awareness or consent from us. “This needs to stop,” Yang says. “Data generated by each individual needs to be owned by them, with certain rights conveyed that will allow them to know how it’s used and protect it.”

The rights Yang is proposing:

  • The right to be informed as to what data will be collected, and how it will be used
  • The right to opt out of data collection or sharing
  • The right to be told if a website has data on you, and what that data is
  • The right to be forgotten; to have all data related to you deleted upon request
  • The right to be informed if ownership of your data changes hands
  • The right to be informed of any data breaches including your information in a timely manner
  • The right to download all data in a standardized format to port to another platform…(More)”.

Bottom-up data Trusts: disturbing the ‘one size fits all’ approach to data governance


Sylvie Delacroix and Neil D Lawrence at International Data Privacy Law: “From the friends we make to the foods we like, via our shopping and sleeping habits, most aspects of our quotidian lives can now be turned into machine-readable data points. For those able to turn these data points into models predicting what we will do next, this data can be a source of wealth. For those keen to replace biased, fickle human decisions, this data—sometimes misleadingly—offers the promise of automated, increased accuracy. For those intent on modifying our behaviour, this data can help build a puppeteer’s strings. As we move from one way of framing data governance challenges to another, salient answers change accordingly. Just like the wealth redistribution way of framing those challenges tends to be met with a property-based, ‘it’s our data’ answer, when one frames the problem in terms of manipulation potential, dignity-based, human rights answers rightly prevail (via fairness and transparency-based answers to contestability concerns). Positive data-sharing aspirations tend to be raised within altogether different conversations from those aimed at addressing the above concerns. Our data Trusts proposal challenges these boundaries.

This article proceeds from an analysis of the very particular type of vulnerability concomitant with our ‘leaking’ data on a daily basis, to show that data ownership is both unlikely and inadequate as an answer to the problems at stake. We also argue that the current construction of top-down regulatory constraints on contractual freedom is both necessary and insufficient. To address the particular type of vulnerability at stake, bottom-up empowerment structures are needed. The latter aim to ‘give a voice’ to data subjects whose choices when it comes to data governance are often reduced to binary, ill-informed consent. While the rights granted by instruments like the GDPR can be used as tools in a bid to shape possible data-reliant futures—such as better use of natural resources, medical care, etc.—their exercise is both demanding and unlikely to be as impactful when leveraged individually. As a bottom-up governance structure that is uniquely capable of taking into account the vulnerabilities outlined in the first section, we highlight the constructive potential inherent in data Trusts. This potential crosses the traditional boundaries between individualist protection concerns on one hand and collective empowerment aspirations on the other.

The second section explains how the Trust structure allows data subjects to choose to pool the rights they have over their personal data within the legal framework of a data Trust. It is important that there be a variety of data Trusts, arising out of a mix of publicly and privately funded initiatives. Each Trust will encapsulate a particular set of aspirations, reflected in the terms of the Trust. Bound by a fiduciary obligation of undivided loyalty, data trustees will exercise the data rights held under the Trust according to its particular terms. In contrast to a recently commissioned report,[1] we explain why data can indeed be held in a Trust, and why the extent to which certain kinds of data may be said to give rise to property rights is neither here nor there as far as our proposal is concerned. What matters, instead, is the extent to which regulatory instruments such as the GDPR confer rights, and for what kind of data. The breadth of those rights will determine the possible scope of data Trusts in various jurisdictions.

Our ‘Case Studies’ aim to illustrate the complementarity of our data Trusts proposal with the legal provisions pertaining to different kinds of personal data, from medical, genetic, financial, and loyalty card data to social media feeds. The final section critically considers a variety of implementation challenges, which range from Trust Law’s cross-jurisdictional aspects to uptake and exit procedures, including issues related to data of shared provenance. We conclude by highlighting the way in which an ecosystem of data Trusts addresses ethical, legal, and political needs that are complementary to those within the reach of regulatory interventions such as the GDPR….(More)”.

Restricting data’s use: A spectrum of concerns in need of flexible approaches


Dharma Akmon and Susan Jekielek at IASSIST Quarterly: “As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e., data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs….(More)”.

Tracking the Labor Market with “Big Data”


Tomaz Cajner, Leland Crane, Ryan Decker, Adrian Hamins-Puertolas, and Christopher Kurz at FEDS Notes: “Payroll employment growth is one of the most reliable business cycle indicators. Each postwar recession in the United States has been characterized by a year-on-year drop in payroll employment as measured by the BLS Current Employment Statistics (CES) survey, and, outside of these recessionary declines, the year-on-year payroll employment growth has always been positive. Thus, it is not surprising that policymakers, financial markets, and the general public pay a great deal of attention to the CES payroll employment gains reported at the beginning of each month.

However, while the CES survey is one of the most carefully conducted measures of labor market activity and uses an extremely large sample, it is still subject to significant sampling and nonsampling errors. For example, when the BLS first reported that private nonfarm payroll gains were 148,000 in July 2019, the associated 90 percent confidence interval was +/- 100,000 due to sampling error alone….
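
To put that interval in perspective: under a normal approximation, a 90 percent confidence interval is the point estimate plus or minus about 1.645 standard errors, so the reported band implies a standard error of roughly 61,000 jobs for the monthly gain (our back-of-the-envelope reading of the published interval, not a BLS figure):

```latex
% 90% normal-approximation confidence interval and the implied standard error.
\mathrm{CI}_{90} = \hat{E} \pm z_{0.95}\,\widehat{\mathrm{SE}},
\qquad z_{0.95} \approx 1.645
\;\Longrightarrow\;
\widehat{\mathrm{SE}} \approx \frac{100{,}000}{1.645} \approx 61{,}000 \text{ jobs.}
```

An interval that wide means a reported gain of 148,000 is statistically hard to distinguish from a gain of 50,000, which is why supplementary data sources are attractive.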

One such source of alternative labor market data is the payroll-processing company ADP, which covers 20 percent of the private workforce. These are the data that underlie ADP’s monthly National Employment Report (NER), which forecasts BLS payroll employment changes by using a combination of ADP-derived data and other publicly available data. In our research, we explore the information content of the ADP microdata alone by producing an estimate of employment changes independent from the BLS payroll series as well as from other data sources.

A potential concern when using the ADP data is that only firms that hire ADP to manage their payrolls will appear in the data, and this may introduce sample selection issues….(More)”

The Economics of Social Data: An Introduction


Paper by Dirk Bergemann and Alessandro Bonatti: “Large internet platforms collect data from individual users in almost every interaction on the internet. Whenever an individual browses a news website, searches for a medical term or for a travel recommendation, or simply checks the weather forecast on an app, that individual generates data. A central feature of the data collected from individuals is its social aspect. Namely, the data captured from an individual user is not only informative about that specific individual, but also about other users who are similar to the individual in some metric. Thus, the individual data is really social data. The social nature of the data generates an informational externality that we investigate in this note….(More)”.
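
A stylized one-equation model makes the externality concrete (our illustration; the paper develops its own richer framework). Suppose each user’s type shares a common component, so any one user’s signal is informative about every other user:

```latex
% Correlated-types sketch: user i's type theta_i has a common component theta.
\theta_i = \theta + \eta_i, \qquad s_i = \theta_i + \varepsilon_i,
\qquad \theta \sim \mathcal{N}(0,\sigma_\theta^2),\quad
\eta_i \sim \mathcal{N}(0,\sigma_\eta^2),\quad
\varepsilon_i \sim \mathcal{N}(0,\sigma_\varepsilon^2).
```

The externality follows immediately: because Cov(s_i, θ_j) = σ_θ² > 0 for every j ≠ i, a platform that acquires user i’s signal also learns about user j, who never consented to the disclosure, so the terms user i accepts do not reflect the full social cost of the sale.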