Addressing the Challenges of Drafting Contracts for Data Collaboration


Blog post by Andrew Young, Andrew J. Zahuranec, Stephen Burley Tubman, William Hoffman, and Stefaan Verhulst at Data & Society: “To deal with complex public challenges, organizations increasingly seek to leverage data across sectors in new and innovative ways — from establishing prize-backed challenges around the use of diverse datasets to creating cross-sector federated data systems. These and other forms of data collaboratives are part of a new paradigm in data-driven innovation in which participants from different sectors provide access to data for the creation of public value. It provides an essential new problem-solving approach for our increasingly datafied society. However, the operational challenges associated with creating such partnerships often prevent the transformative potential of data collaboration from being achieved.

One such operational challenge relates to developing data sharing agreements — through contracts and other legal documentation. Current practice suffers from large inefficiencies and transaction costs resulting from (i) the lack of a common understanding of the core issues in data exchange; (ii) the lack of a common language or models; (iii) the large heterogeneity of the agreements in use; (iv) lawyers' lack of familiarity with the technologies involved; and (v) a sense that every initiative needs to (re)invent the wheel. Removing these barriers may enable collaborators to partner more systematically and responsibly around the re-use of data assets. Contracts for Data Collaboration (C4DC) is a new initiative seeking to address these barriers to data collaboration…

In the longer term, participants focused on three major themes that, if addressed, could steer contracting for data collaboration toward greater effectiveness and legitimacy.

Data Stewardship and Responsibility: First, much of the discussion centered on the need to promote responsible data practices through data stewardship. Though part of this work involves designating teams and individuals empowered to share data, it also means equipping them to operationalize ethical principles.

By developing international standards and moving beyond bare-minimum legal obligations, these actors can build trust between parties, a quality that has often been difficult to foster. Such relationships are key when engaging intermediaries or building complex contractual agreements between multiple organizations. It is also essential to reach agreement on which practices are legitimate and which are not.

Incorporation of the Citizen Perspective: Trust is also needed between the actors in a data collaborative and the general public. In light of many recent stories about the misuse of data, many people are suspicious of, if not outright hostile to, data partnerships. Many data subjects don’t understand why organizations want their data or how the information can be valuable in advancing the public good.

In data-sharing arrangements, all actors need to explain intended uses and outcomes to data subjects. Attendees spoke about the need to explain the data’s utility in clear and accessible terms. They also noted that data-collaborative contracts are more legitimate when they incorporate citizen perspectives, especially those of marginalized groups. To take this work a step further, the public could be brought into the contract-writing process through mechanisms that solicit their views and concerns.

Improving Internal and External Collaboration: Lastly, participants discussed the need for actors across the data ecosystem to strengthen relationships inside and outside their organizations. Part of this work entails securing internal buy-in for data collaboration, ensuring that the different components of an organization understand what assets are being shared and why.

It also entails engaging with intermediaries to fill gaps. Each actor’s capacities and expertise have limits; by engaging with start-ups, funders, NGOs, and others, organizations can improve the odds of a successful collaboration. Together, organizations can create norms and shared languages that allow for more effective data flows….(More)”.


Becoming a data steward


Shalini Kurapati at the LSE Impact Blog: “In the context of higher education, data stewards are the first point of reference for all data related questions. In my role as a data steward at TU Delft, I was able to advise, support and train researchers on various aspects of data management throughout the life cycle of a research project, from initial planning to post-publication. This included storing, managing and sharing research outputs such as data, images, models and code.

Data stewards also advise researchers on the ethical, policy, and legal considerations during data collection, processing, and dissemination. In a way, they are general practitioners for research data management and can usually solve most problems faced by academics. In cases that require specialist intervention, they also serve as a key point of referral (e.g., to IT, patent, or legal experts).

Data stewardship is often organised centrally through the university library. (Subject) data librarians, research data consultants, and research data officers usually perform roles similar to that of data stewards. However, TU Delft operates a decentralised model, where data stewards are placed within faculties as disciplinary experts with research experience. This allows data stewards to provide discipline-specific support to researchers, which is particularly beneficial, as the concept of what data is itself varies across disciplines….(More)”.

Breaking Down Information Silos with Big Data: A Legal Analysis of Data Sharing


Chapter by Giovanni De Gregorio and Sofia Ranchordas in J. Cannataci, V. Falce & O. Pollicino (Eds), New Legal Challenges of Big Data (Edward Elgar, 2020, Forthcoming): “In the digital society, individuals play different roles depending on the situation they are placed in: they are consumers when they purchase a good, citizens when they vote in elections, content providers when they post information on a platform, and data subjects when their data is collected. Public authorities have thus far regulated citizens and the data collected on their different roles in silos (e.g., bankruptcy registrations, social welfare databases), resulting in inconsistent decisions, redundant paperwork, and delays in processing citizen requests. Data silos are considered to be inefficient both for companies and governments. Big data and data analytics are disrupting these silos, allowing the different roles of individuals and the respective data to converge. In practice, this happens in several countries with data sharing arrangements or ad hoc data requests. However, breaking down the existing structure of information silos in the public sector remains problematic. While big data disrupts artificial silos that may not make sense in the digital society and promotes a truly efficient digitalization of data, removing information from its original context may alter its meaning and violate the privacy of citizens. In addition, silos ensure that citizens are not assessed in one field by information generated in a totally different context. This chapter discusses how big data and data analytics are changing information silos and how digital technology is challenging citizens’ autonomy and right to privacy and data protection. This chapter also explores the need for a more integrated approach to the study of information, particularly in the public sector….(More)”.

Data Fiduciary in Order to Alleviate Principal-Agent Problems in the Artificial Big Data Age


Paper by Julia M. Puaschunder: “The classic principal-agent problem in political science and economics describes agency dilemmas that arise when one person, the agent, is put in a situation to make decisions on behalf of another entity, the principal. A dilemma occurs when individual profit maximization pits principal and agent against each other. This so-called moral hazard is nowadays emerging in the artificial big data age, when big-data-reaping entities act on behalf of agents, who provide their data trusting in the principal’s integrity and responsible big data conduct. Yet to this day, no data fiduciary has been clearly described and established to protect the agent from misuse of data. This article introduces the agent’s predicament between the utility derived from information sharing and dignity in privacy, as well as hyper-hyperbolic discounting fallibilities that prevent agents from clearly foreseeing what consequences information sharing can have over time and in groups. The principal’s predicament between secrecy and selling big data insights, or using big data for manipulative purposes, is then outlined. Finally, the article draws a clear distinction between manipulation and nudging in relation to the potential social class division of those who nudge and those who are nudged…(More)”.

Risk identification and management for the research use of government administrative data


Paper by Elizabeth Shepherd, Anna Sexton, Oliver Duke-Williams, and Alexandra Eveleigh: “Government administrative data have enormous potential for public and individual benefit through improved educational and health services to citizens, medical research, environmental and climate interventions and exploitation of scarce energy resources. Administrative data are usually “collected primarily for administrative (not research) purposes by government departments and other organizations for the purposes of registration, transaction and record keeping, during the delivery of a service” such as health care, vehicle licensing, tax and social security systems (https://esrc.ukri.org/funding/guidance-for-applicants/research-ethics/useful-resources/key-terms-glossary/). Administrative data are usually distinguished from data collected for statistical use such as the census. Unlike administrative records, they do not provide evidence of activities and generally lack metadata and context relating to provenance. Administrative data, unlike open data, are not routinely made open or accessible; access can be provided only on request to named researchers for specified research projects, through research access protocols that often take months to negotiate and are subject to significant constraints around re-use, such as the use of safe havens. Researchers seldom make use of freedom of information or access to information protocols to access such data because they need specific datasets, particular levels of granularity, and an ability to re-process data, which are not made generally available. This study draws on research undertaken by the authors as part of the Administrative Data Research Centre in England (ADRC-E). The research examined perspectives on the sharing, linking and re-use (secondary use) of administrative data in England, viewed through three analytical themes: trust, consent and risk. This study presents the analysis of the identification and management of risk in the research use of government administrative data and presents a risk framework. Risk management (i.e., coordinated activities that allow organizations to control risks; Lemieux, 2010) enables us to think about the balance between risk and benefit for the public good and for other stakeholders. Mitigating activities or management mechanisms used to control the identified risks depend on the resources available to implement the options, on the risk appetite or tolerance of the community, and on the cost and likely effectiveness of the mitigation. Mitigation and risk do not work in isolation and should be viewed holistically, keeping the whole information infrastructure in balance across the administrative data system and between multiple stakeholders.

This study seeks to establish a clearer picture of risk with regard to government administrative data in England. It identifies and categorizes the risks arising from the research use of government administrative data. It identifies mitigating risk-management activities linked to five key stakeholder communities and discusses the locus of responsibility for risk-management actions. The identification of the risks and of mitigation strategies is derived from the viewpoints of the interviewees and associated documentation; they therefore reflect lived experience. The five stakeholder groups identified from the data are as follows: individual researchers; employers of researchers; the wider research community; data creators and providers; and data subjects and the broader public. The primary sections of the study, following the methodology and research context, set out the seven identified types of risk events in the research use of administrative data, present a mapping of the stakeholder communities affected by the risks, and discuss the findings related to managing and mitigating the risks identified. The conclusion presents the elements of a new risk framework to inform future actions by the government data community and enable researchers to exploit the power of administrative data for public good….(More)”.
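To make the notion of a risk framework concrete, here is a minimal, hypothetical sketch in Python of a risk register linking risk events to the stakeholder communities named above and to candidate mitigations. The specific events and mitigations are illustrative assumptions, not the paper's actual taxonomy:

```python
# Hypothetical sketch of a risk register for the research use of administrative
# data. The events, stakeholder labels, and mitigations below are illustrative
# placeholders, not the paper's actual risk taxonomy.
from dataclasses import dataclass

@dataclass
class Risk:
    event: str               # what could go wrong
    stakeholders: list[str]  # communities affected or responsible
    mitigations: list[str]   # coordinated activities that control the risk

register = [
    Risk("re-identification of data subjects in linked datasets",
         ["data subjects and the broader public", "data creators and providers"],
         ["de-identification before release", "analysis within a safe haven"]),
    Risk("researcher breach of an access protocol",
         ["individual researchers", "employers of researchers"],
         ["accreditation and training", "output checking before publication"]),
]

def risks_for(stakeholder: str) -> list[str]:
    """Which risk events touch a given stakeholder community?"""
    return [r.event for r in register if stakeholder in r.stakeholders]

print(risks_for("individual researchers"))
```

A structure like this makes the locus-of-responsibility question explicit: each mitigation can be assigned to the stakeholder group best placed to carry it out.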

Why data from companies should be a common good


Paula Forteza at apolitical: “Better planning of public transport, protecting fish from intensive fishing, and reducing the number of people killed in car accidents: for these and many other public policies, data is essential.

Data applications are diverse, and their origins are equally numerous. But data is not exclusively owned by the public sector. Data can be produced by private actors such as mobile phone operators, marine traffic systems, or connected cars, to give just a few examples.

The awareness around the potential of private data is increasing, as the proliferation of data partnerships between companies, governments, and local authorities shows. However, these partnerships represent only a very small fraction of what could be done.

The opening of public data, meaning that public data is made freely available to everyone, has been conducted on a wide scale in the last 10 years, pioneered by the US and UK and soon followed by France and many other countries. In 2015, France took a first step as the government introduced the Digital Republic Bill, which made public data open by default and introduced the concept of public interest data. Due to a broad definition and weak enforcement, the opening of private sector data is nevertheless still lagging behind.

The main arguments for opening private data are that it will allow better public decision-making and it could trigger a new way to regulate Big Tech. There is, indeed, a strong economic case for data sharing, because data is a non-rival good: the value of data does not diminish when shared. On the contrary, new uses can be designed and data can be enriched by aggregation, which could improve innovation for start-ups….

Why Europe needs a private data act

Data hardly knows any boundaries.

Some states are opening up private data, as France did in 2015 by creating a framework for “public interest data,” but the absence of a common international legal framework for private data sharing is a major obstacle to its development. To scale up, a European Private Data Act is needed.

This framework must acknowledge the legitimate interests of the private companies that collect and control data. Data can be their main source of income or one they wish to develop, and this must be respected. Trade secrecy has to be protected too: data sharing is not open data.

Data can be shared with a limited and identified set of partners, and it does not always have to be free. Yet private interest must be aligned with the public good. The European Convention on Human Rights and the European Charter of Fundamental Rights acknowledge that some legitimate and proportionate limitations can be set on the freedom of enterprise, which gives everyone the right to pursue their own profitable business.

The “Private Data Act” should contain several fundamental data sharing principles in line with those proposed by the European Commission in 2018: proportionality, “do no harm,” full respect of the GDPR, etc. It should also include guidelines on which data to share, how to assess the public interest, and in which cases data should be opened for free or how pricing should be set.

Two methods can be considered:

  • Defining high-value datasets, as has been done for public data in the recent Open Data Directive, in areas like mobile communications, banking, transport, etc. This method is strong but not flexible enough.
  • Alternatively, governments might define certain “public interest projects”. In doing so, governments could gain access to specific data seen as a prerequisite for achieving the project. For example, understanding why bee mortality is increasing requires various data sources: concrete data on bee mortality from beekeepers, data on crops and pesticide use from farmers, weather data, etc. This method is more flexible and ensures that only the data needed for the project is shared (see the sketch after this list).
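
As a purely illustrative sketch of that pooling step (all file names and fields are invented), each provider might aggregate its records before sharing, so that only project-relevant data leaves its source:

```python
# Minimal sketch of the "public interest project" approach: only the columns
# needed for the project are pooled. All file names and fields are hypothetical.
import pandas as pd

bees = pd.read_csv("beekeeper_mortality.csv")    # region, month, hives_lost
pesticides = pd.read_csv("farm_pesticides.csv")  # region, month, kg_applied
weather = pd.read_csv("weather.csv")             # region, month, mean_temp_c

# Aggregate to region-month before pooling, so no individual farm or
# beekeeper records leave their source.
project = (
    bees.groupby(["region", "month"], as_index=False)["hives_lost"].sum()
        .merge(pesticides.groupby(["region", "month"], as_index=False)["kg_applied"].sum(),
               on=["region", "month"])
        .merge(weather, on=["region", "month"])
)

print(project.corr(numeric_only=True))  # first-pass look at associations
```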

Going ahead on open data and data sharing should be a priority for the incoming European Commission and Parliament. Margrethe Vestager has been reappointed as Competition Commissioner and Vice-President of the Commission, and she has already mentioned the opportunity to define access to data for newcomers in the digital market.

Public interest data is a new topic on the EU agenda and will probably become crucial in the near future….(More)”.

Bottom-up data Trusts: disturbing the ‘one size fits all’ approach to data governance


Sylvie Delacroix and Neil D Lawrence at International Data Privacy Law: “From the friends we make to the foods we like, via our shopping and sleeping habits, most aspects of our quotidian lives can now be turned into machine-readable data points. For those able to turn these data points into models predicting what we will do next, this data can be a source of wealth. For those keen to replace biased, fickle human decisions, this data—sometimes misleadingly—offers the promise of automated, increased accuracy. For those intent on modifying our behaviour, this data can help build a puppeteer’s strings. As we move from one way of framing data governance challenges to another, salient answers change accordingly. Just like the wealth redistribution way of framing those challenges tends to be met with a property-based, ‘it’s our data’ answer, when one frames the problem in terms of manipulation potential, dignity-based, human rights answers rightly prevail (via fairness and transparency-based answers to contestability concerns). Positive data-sharing aspirations tend to be raised within altogether different conversations from those aimed at addressing the above concerns. Our data Trusts proposal challenges these boundaries.

This article proceeds from an analysis of the very particular type of vulnerability concomitant with our ‘leaking’ data on a daily basis, to show that data ownership is both unlikely and inadequate as an answer to the problems at stake. We also argue that the current construction of top-down regulatory constraints on contractual freedom is both necessary and insufficient. To address the particular type of vulnerability at stake, bottom-up empowerment structures are needed. The latter aim to ‘give a voice’ to data subjects whose choices when it comes to data governance are often reduced to binary, ill-informed consent. While the rights granted by instruments like the GDPR can be used as tools in a bid to shape possible data-reliant futures—such as better use of natural resources, medical care, etc.—their exercise is both demanding and unlikely to be as impactful when leveraged individually. As a bottom-up governance structure that is uniquely capable of taking into account the vulnerabilities outlined in the first section, we highlight the constructive potential inherent in data Trusts. This potential crosses the traditional boundaries between individualist protection concerns on one hand and collective empowerment aspirations on the other.

The second section explains how the Trust structure allows data subjects to choose to pool the rights they have over their personal data within the legal framework of a data Trust. It is important that there be a variety of data Trusts, arising out of a mix of publicly and privately funded initiatives. Each Trust will encapsulate a particular set of aspirations, reflected in the terms of the Trust. Bound by a fiduciary obligation of undivided loyalty, data trustees will exercise the data rights held under the Trust according to its particular terms. In contrast to a recently commissioned report [1], we explain why data can indeed be held in a Trust, and why the extent to which certain kinds of data may be said to give rise to property rights is neither here nor there as far as our proposal is concerned. What matters, instead, is the extent to which regulatory instruments such as the GDPR confer rights, and for what kind of data. The breadth of those rights will determine the possible scope of data Trusts in various jurisdictions.

Our ‘Case Studies’ aim to illustrate the complementarity of our data Trusts proposal with the legal provisions pertaining to different kinds of personal data, from medical, genetic, financial, and loyalty card data to social media feeds. The final section critically considers a variety of implementation challenges, which range from Trust Law’s cross-jurisdictional aspects to uptake and exit procedures, including issues related to data of shared provenance. We conclude by highlighting the way in which an ecosystem of data Trusts addresses ethical, legal, and political needs that are complementary to those within the reach of regulatory interventions such as the GDPR….(More)”.

Tracking the Labor Market with “Big Data”


Tomaz Cajner, Leland Crane, Ryan Decker, Adrian Hamins-Puertolas, and Christopher Kurz at FEDS Notes: “Payroll employment growth is one of the most reliable business cycle indicators. Each postwar recession in the United States has been characterized by a year-on-year drop in payroll employment as measured by the BLS Current Employment Statistics (CES) survey, and, outside of these recessionary declines, year-on-year payroll employment growth has always been positive. Thus, it is not surprising that policymakers, financial markets, and the general public pay a great deal of attention to the CES payroll employment gains reported at the beginning of each month.

However, while the CES survey is one of the most carefully conducted measures of labor market activity and uses an extremely large sample, it is still subject to significant sampling error and nonsampling errors. For example, when the BLS first reported that private nonfarm payroll gains were 148,000 in July 2019, the associated 90 percent confidence interval was +/- 100,000 due to sampling error alone….
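To put that interval in perspective, a quick back-of-the-envelope sketch recovers the implied standard error; the normal approximation is an assumption of this sketch, not something the note states:

```python
# Back-of-the-envelope reading of the figures quoted above, assuming the
# 90 percent confidence interval rests on a normal approximation.
from statistics import NormalDist

estimate = 148_000    # first-reported private payroll gain, July 2019
half_width = 100_000  # reported 90% CI half-width from sampling error alone

z = NormalDist().inv_cdf(0.95)  # two-sided 90% CI leaves 5% in each tail; z ≈ 1.645
implied_se = half_width / z     # ≈ 60,800 jobs

print(f"90% CI: {estimate - half_width:,} to {estimate + half_width:,} jobs")
print(f"Implied standard error: {implied_se:,.0f} jobs")
```

In other words, the initial print could not statistically distinguish a month of robust gains from one of near-zero growth.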

One such source of alternative labor market data is the payroll-processing company ADP, which covers 20 percent of the private workforce. These are the data that underlie ADP’s monthly National Employment Report (NER), which forecasts BLS payroll employment changes by using a combination of ADP-derived data and other publicly available data. In our research, we explore the information content of the ADP microdata alone by producing an estimate of employment changes independent from the BLS payroll series as well as from other data sources.

A potential concern when using the ADP data is that only firms that hire ADP to manage their payrolls appear in the data, which may introduce sample selection issues….(More)”.
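One standard way to probe such selection (not necessarily the approach taken in the authors' research) is to reweight the sample so its composition matches an external benchmark. A toy sketch with invented firm-size classes and shares:

```python
# Hypothetical illustration of post-stratification as a response to sample
# selection: weight the sample so its firm-size mix matches an external
# benchmark. All size classes, shares, and growth rates below are invented.
import pandas as pd

adp = pd.DataFrame({
    "size_class": ["small", "medium", "large"],
    "adp_share":  [0.30, 0.30, 0.40],  # employment share in the (hypothetical) sample
    "emp_growth": [0.8, 1.1, 1.4],     # percent change, by size class
})
benchmark_share = {"small": 0.45, "medium": 0.30, "large": 0.25}  # e.g., census-style benchmark

adp["weight"] = adp["size_class"].map(benchmark_share) / adp["adp_share"]
raw = (adp["emp_growth"] * adp["adp_share"]).sum()
reweighted = (adp["emp_growth"] * adp["adp_share"] * adp["weight"]).sum()

print(f"Raw sample growth:  {raw:.2f}%")        # 1.13% — tilted toward large firms
print(f"Reweighted growth:  {reweighted:.2f}%")  # 1.04% — matches benchmark mix
```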

Mobility Data Sharing: Challenges and Policy Recommendations


Paper by Mollie D’Agostino, Paige Pellaton, and Austin Brown: “Dynamic and responsive transportation systems are a core pillar of equitable and sustainable communities. Achieving such systems requires comprehensive mobility data, or data that reports the movement of individuals and vehicles. Such data enable planners and policymakers to make informed decisions and enable researchers to model the effects of various transportation solutions. However, collecting mobility data also raises concerns about privacy and proprietary interests.

This issue paper provides an overview of the top needs and challenges surrounding mobility data sharing and presents four relevant policy strategies:

  • (1) Foster voluntary agreement among mobility providers for a set of standardized data specifications;
  • (2) Develop clear data-sharing requirements designed for transportation network companies and other mobility providers;
  • (3) Establish publicly held big-data repositories, managed by third parties, to securely hold mobility data and provide structured access by states, cities, and researchers;
  • (4) Leverage innovative land-use and transportation-planning tools….(More)”.
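
As a purely illustrative gloss on strategy (1), the sketch below shows what a standardized trip record could look like. Every field name here is an assumption made for illustration; real specifications, such as the Mobility Data Specification, define their own schemas:

```python
# Hypothetical sketch of a standardized trip record under strategy (1).
# Field names are invented for illustration only.
from dataclasses import dataclass, asdict
import json

@dataclass
class TripRecord:
    provider_id: str          # mobility provider reporting the trip
    vehicle_type: str         # "scooter", "bike", "car", ...
    start_time_utc: str       # ISO 8601 timestamp
    end_time_utc: str
    origin_geohash: str       # coarsened location to limit re-identification risk
    destination_geohash: str
    distance_m: int

trip = TripRecord(
    provider_id="provider-001",
    vehicle_type="scooter",
    start_time_utc="2019-10-01T08:15:00Z",
    end_time_utc="2019-10-01T08:27:00Z",
    origin_geohash="9q8yy",
    destination_geohash="9q8yz",
    distance_m=2300,
)
print(json.dumps(asdict(trip), indent=2))  # what a shared record could look like
```

Coarsening locations (here, to geohash cells) is one way a shared specification can balance planners' needs against the privacy concerns the excerpt raises.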

Traffic Data Is Good for More than Just Streets, Sidewalks


Skip Descant at Government Technology: “The availability of highly detailed daily traffic data is clearly an invaluable resource for traffic planners, but it can also help officials overseeing natural lands or public works understand how to better manage those facilities.

The Natural Communities Coalition, a conservation nonprofit in southern California, began working with the traffic analysis firm StreetLight Data in early 2018 to study the impacts from the thousands of annual visitors to 22 parks and natural lands. StreetLight Data’s use of de-identified cellphone data held promise for the project, which will continue into early 2020.

“You start to see these increases,” Milan Mitrovich, science director for the Natural Communities Coalition, said of the uptick in visitor activity the data showed. “So being able to have this information, and share it with our executive committee… these folks, they’re seeing it for the first time.”…

Officials with the Natural Communities Coalition were able to use the StreetLight data to gain insights into patterns of use not only per day, but at different times of the day. The data also told researchers where visitors were traveling from, a detail park officials found “jaw-dropping.”

“What we were able to see is, these resources, these natural areas, cast an incredible net across southern California,” said Mitrovich, noting visitors come not only from Orange County but also from Los Angeles, San Bernardino, and San Diego counties, a region of more than 20 million residents.
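
The kind of origin analysis described here can be pictured with a small, entirely hypothetical sketch; the table, columns, and counts are invented, and StreetLight's actual methodology is proprietary:

```python
# Illustrative sketch of origin analysis: counting park visits by visitors'
# inferred home county. Data frame contents are hypothetical.
import pandas as pd

visits = pd.DataFrame({
    "park": ["Park A", "Park A", "Park B", "Park B", "Park B"],
    "home_county": ["Orange", "Los Angeles", "Orange", "San Diego", "San Bernardino"],
    "device_days": [120, 45, 80, 30, 25],  # de-identified device-day counts
})

origins = (visits.groupby(["park", "home_county"], as_index=False)["device_days"].sum()
                 .sort_values(["park", "device_days"], ascending=[True, False]))
print(origins)  # per park, which counties visitors come from, most common first
```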

The data also allows officials to predict traffic levels during certain parts of the week, times of day or even holidays….(More)”.