Why data from companies should be a common good


Paula Forteza at apolitical: “Better planning of public transport, protecting fish from intensive fishing, and reducing the number of people killed in car accidents: for these and many other public policies, data is essential.

Data applications are diverse, and their origins are equally numerous. But data is not exclusively owned by the public sector. Data can also be produced by private actors such as mobile phone operators, marine traffic systems, or connected cars, to give just a few examples.

Awareness of the potential of private data is increasing, as the proliferation of data partnerships between companies, governments, and local authorities shows. However, these partnerships represent only a very small fraction of what could be done.

The opening of public data, meaning that public data is made freely available to everyone, has been conducted on a wide scale over the last 10 years, pioneered by the US and the UK and soon followed by France and many other countries. In 2015, France took a first step when the government introduced the Digital Republic Bill, which made data open by default and introduced the concept of public interest data. Due to a broad definition and weak enforcement, however, the opening of private-sector data still lags behind.

The main arguments for opening private data are that it would allow better public decision-making and that it could offer a new way to regulate Big Tech. There is, indeed, a strong economic case for data sharing, because data is a non-rival good: its value does not diminish when it is shared. On the contrary, new uses can be designed and data can be enriched by aggregation, which could spur innovation among start-ups….

Why Europe needs a private data act

Data hardly knows any boundaries.

Some states have begun opening private data, as France did in 2015 by creating a framework for “public interest data,” but the absence of a common international legal framework for private data sharing is a major obstacle to its development. To scale up, a European Private Data Act is needed.

This framework must acknowledge the legitimate interests of the private companies that collect and control data. Data can be their main source of income, or one they wish to develop, and this must be respected. Trade secrets have to be protected too: data sharing is not open data.

Data can be shared with a limited and identified number of partners, and it does not always have to be free. Yet private interests must be aligned with the public good. The European Convention on Human Rights and the European Charter of Fundamental Rights acknowledge that legitimate and proportional limitations can be set on the freedom of enterprise, which gives everyone the right to pursue their own profitable business.

The “Private Data Act” should contain several fundamental data-sharing principles in line with those proposed by the European Commission in 2018: proportionality, “do no harm”, full respect of the GDPR, etc. It should also include guidelines on which data to share, how to assess the public interest, and in which cases data should be opened for free or how pricing should be set.

Two methods can be considered:

  • Defining high-value datasets, as was done for public data in the recent Open Data Directive, in areas like mobile communications, banking, and transport. This method is strong but not flexible enough.
  • Alternatively, governments might define certain “public interest projects”. In doing so, governments could get access to specific data seen as a prerequisite for achieving the project. For example, understanding why there is increasing mortality among bees requires various data sources: concrete data on bee mortality from beekeepers, data on crops and pesticide use from farmers, weather data, etc. This method is more flexible and ensures that only the data needed for the project is shared.

Going ahead on open data and data sharing should be a priority for the incoming European Commission and Parliament. Margrethe Vestager has been reappointed as Competition Commissioner and Vice-President of the Commission, and she has already mentioned the opportunity to define access to data for newcomers in the digital market.

Public interest data is a new topic on the EU agenda and will probably become crucial in the near future….(More)”.

Bottom-up data Trusts: disturbing the ‘one size fits all’ approach to data governance


Sylvie Delacroix and Neil D Lawrence at International Data Privacy Law: “From the friends we make to the foods we like, via our shopping and sleeping habits, most aspects of our quotidian lives can now be turned into machine-readable data points. For those able to turn these data points into models predicting what we will do next, this data can be a source of wealth. For those keen to replace biased, fickle human decisions, this data—sometimes misleadingly—offers the promise of automated, increased accuracy. For those intent on modifying our behaviour, this data can help build a puppeteer’s strings. As we move from one way of framing data governance challenges to another, salient answers change accordingly. Just like the wealth redistribution way of framing those challenges tends to be met with a property-based, ‘it’s our data’ answer, when one frames the problem in terms of manipulation potential, dignity-based, human rights answers rightly prevail (via fairness and transparency-based answers to contestability concerns). Positive data-sharing aspirations tend to be raised within altogether different conversations from those aimed at addressing the above concerns. Our data Trusts proposal challenges these boundaries.

This article proceeds from an analysis of the very particular type of vulnerability concomitant with our ‘leaking’ data on a daily basis, to show that data ownership is both unlikely and inadequate as an answer to the problems at stake. We also argue that the current construction of top-down regulatory constraints on contractual freedom is both necessary and insufficient. To address the particular type of vulnerability at stake, bottom-up empowerment structures are needed. The latter aim to ‘give a voice’ to data subjects whose choices when it comes to data governance are often reduced to binary, ill-informed consent. While the rights granted by instruments like the GDPR can be used as tools in a bid to shape possible data-reliant futures (such as better use of natural resources, medical care, etc.), their exercise is both demanding and unlikely to be as impactful when leveraged individually. As a bottom-up governance structure that is uniquely capable of taking into account the vulnerabilities outlined in the first section, we highlight the constructive potential inherent in data Trusts. This potential crosses the traditional boundaries between individualist protection concerns on one hand and collective empowerment aspirations on the other.

The second section explains how the Trust structure allows data subjects to choose to pool the rights they have over their personal data within the legal framework of a data Trust. It is important that there be a variety of data Trusts, arising out of a mix of publicly and privately funded initiatives. Each Trust will encapsulate a particular set of aspirations, reflected in the terms of the Trust. Bound by a fiduciary obligation of undivided loyalty, data trustees will exercise the data rights held under the Trust according to its particular terms. In contrast to a recently commissioned report,[1] we explain why data can indeed be held in a Trust, and why the extent to which certain kinds of data may be said to give rise to property rights is neither here nor there as far as our proposal is concerned. What matters, instead, is the extent to which regulatory instruments such as the GDPR confer rights, and for what kind of data. The breadth of those rights will determine the possible scope of data Trusts in various jurisdictions.

Our ‘Case Studies’ aim to illustrate the complementarity of our data Trusts proposal with the legal provisions pertaining to different kinds of personal data, from medical, genetic, financial, and loyalty card data to social media feeds. The final section critically considers a variety of implementation challenges, which range from Trust Law’s cross-jurisdictional aspects to uptake and exit procedures, including issues related to data of shared provenance. We conclude by highlighting the way in which an ecosystem of data Trusts addresses ethical, legal, and political needs that are complementary to those within the reach of regulatory interventions such as the GDPR….(More)”.

Tracking the Labor Market with “Big Data”


Tomaz Cajner, Leland Crane, Ryan Decker, Adrian Hamins-Puertolas, and Christopher Kurz at FEDS Notes: “Payroll employment growth is one of the most reliable business cycle indicators. Each postwar recession in the United States has been characterized by a year-on-year drop in payroll employment as measured by the BLS Current Employment Statistics (CES) survey, and, outside of these recessionary declines, year-on-year payroll employment growth has always been positive. Thus, it is not surprising that policymakers, financial markets, and the general public pay a great deal of attention to the CES payroll employment gains reported at the beginning of each month.

However, while the CES survey is one of the most carefully conducted measures of labor market activity and uses an extremely large sample, it is still subject to significant sampling error and nonsampling errors. For example, when the BLS first reported that private nonfarm payroll gains were 148,000 in July 2019, the associated 90 percent confidence interval was +/- 100,000 due to sampling error alone….
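The role of sampling error here can be made concrete with a short sketch. The helper below is an illustration only, not the BLS methodology: the function name is ours, and the standard error is simply backed out of the reported +/- 100,000 margin.

```python
from statistics import NormalDist

def confidence_interval(estimate: float, std_error: float, level: float = 0.90):
    """Two-sided normal confidence interval for a point estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # z is about 1.645 at 90%
    return estimate - z * std_error, estimate + z * std_error

# A +/- 100,000 margin at 90% confidence implies a standard error of
# roughly 100,000 / 1.645, i.e. about 60,800 jobs.
low, high = confidence_interval(148_000, 100_000 / 1.645)
print(round(low), round(high))  # roughly 48,000 and 248,000
```

In other words, a reported monthly gain of 148,000 jobs is statistically consistent with anything from near-flat growth to a very strong month, which is why independent data sources are valuable.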

One such source of alternative labor market data is the payroll-processing company ADP, which covers 20 percent of the private workforce. These are the data that underlie ADP’s monthly National Employment Report (NER), which forecasts BLS payroll employment changes by using a combination of ADP-derived data and other publicly available data. In our research, we explore the information content of the ADP microdata alone by producing an estimate of employment changes independent from the BLS payroll series as well as from other data sources.
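One simple way to extract an employment-change signal from payroll microdata is a matched-sample estimate that compares only firms observed in consecutive months, limiting the effect of firms entering or leaving the payroll processor's client base. The sketch below is purely illustrative (it is not the authors' actual methodology, and the firm names and headcounts are invented):

```python
def matched_sample_change(prev_month: dict, curr_month: dict) -> int:
    """Net employment change across firms observed in BOTH months.

    Restricting to the matched sample avoids conflating true job gains
    and losses with firms joining or leaving the payroll dataset.
    """
    matched = prev_month.keys() & curr_month.keys()
    return sum(curr_month[firm] - prev_month[firm] for firm in matched)

# Hypothetical firm-level headcounts:
june = {"firm_a": 120, "firm_b": 45, "firm_c": 300}
july = {"firm_a": 125, "firm_b": 43, "firm_d": 80}  # firm_c left the sample
print(matched_sample_change(june, july))  # 3  (firm_a +5, firm_b -2)
```

Note how firm_c's 300 jobs do not show up as a loss: leaving the processor's sample is not the same as shedding workers, which is exactly the distinction a matched sample preserves.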

A potential concern when using the ADP data is that only firms that hire ADP to manage their payrolls appear in the data, and this may introduce sample selection issues….(More)”

Mobility Data Sharing: Challenges and Policy Recommendations


Paper by Mollie D’Agostino, Paige Pellaton, and Austin Brown: “Dynamic and responsive transportation systems are a core pillar of equitable and sustainable communities. Achieving such systems requires comprehensive mobility data, or data that reports the movement of individuals and vehicles. Such data enable planners and policymakers to make informed decisions and enable researchers to model the effects of various transportation solutions. However, collecting mobility data also raises concerns about privacy and proprietary interests.

This issue paper provides an overview of the top needs and challenges surrounding mobility data sharing and presents four relevant policy strategies: (1) Foster voluntary agreement among mobility providers for a set of standardized data specifications; (2) Develop clear data-sharing requirements designed for transportation network companies and other mobility providers; (3) Establish publicly held big-data repositories, managed by third parties, to securely hold mobility data and provide structured access by states, cities, and researchers; (4) Leverage innovative land-use and transportation-planning tools….(More)”.

Traffic Data Is Good for More than Just Streets, Sidewalks


Skip Descant at Government Technology: “The availability of highly detailed daily traffic data is clearly an invaluable resource for traffic planners, but it can also help officials overseeing natural lands or public works understand how to better manage those facilities.

The Natural Communities Coalition, a conservation nonprofit in southern California, began working with the traffic analysis firm StreetLight Data in early 2018 to study the impacts from the thousands of annual visitors to 22 parks and natural lands. StreetLight Data’s use of de-identified cellphone data held promise for the project, which will continue into early 2020.

“You start to see these increases,” Milan Mitrovich, science director for the Natural Communities Coalition, said of the uptick in visitor activity the data showed. “So being able to have this information, and share it with our executive committee… these folks, they’re seeing it for the first time.”…

Officials with the Natural Communities Coalition were able to use the StreetLight data to gain insights into patterns of use not only per day, but at different times of the day. The data also told researchers where visitors were traveling from, a detail park officials found “jaw-dropping.”

“What we were able to see is, these resources, these natural areas, cast an incredible net across southern California,” said Mitrovich, noting visitors come from not only Orange County, but Los Angeles, San Bernardino and San Diego counties as well, a region of more than 20 million residents.

The data also allows officials to predict traffic levels during certain parts of the week, times of day or even holidays….(More)”.

Gender Gaps in Urban Mobility


Brief from the Data2X Big Data and Gender Brief Series by The GovLab, UNICEF, Universidad del Desarrollo, Telefónica R&D Center, ISI Foundation, and DigitalGlobe: “Mobility is gendered. For example, the household division of labor in many societies leads women and girls to take more multi-purpose, multi-stop trips than men. Women-headed households also tend to work more in the informal sector, with limited access to transportation subsidies, and their use of public transit is further reduced by the risk of violence in public spaces.

This brief summarizes a recent analysis of gendered urban mobility in 51 (out of 52) neighborhoods of Santiago, Chile, relying on the call detail records (CDRs) of a large sample of mobile phone users over a period of three months. We found that: 1) women move less overall than men; 2) women have a smaller radius of movement; and 3) women tend to concentrate their time in a smaller set of locations. These mobility gaps are linked to lower average incomes and fewer public and private transportation options. These insights, taken from large volumes of passively generated, inexpensive data streaming in real time, can help policymakers design more gender-inclusive urban transit systems….(More)”.
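The “radius of movement” finding refers to what the mobility literature calls the radius of gyration: the root-mean-square distance of a person's visited locations from their centre of mass. A minimal sketch of the metric, using toy planar coordinates in kilometres rather than the actual CDR latitude/longitude data:

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance of visited points from their centre of mass,
    a standard per-user mobility metric computed from CDR location pings."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points)
    )

# Toy coordinates in kilometres (hypothetical, not the Santiago data):
commuter = [(0, 0), (0, 0), (12, 5), (12, 5)]  # long home/work trips
local = [(0, 0), (1, 0), (0, 1), (1, 1)]       # short-range movement
print(radius_of_gyration(commuter) > radius_of_gyration(local))  # True
```

Computed per user from anonymized CDRs and then aggregated by gender and neighborhood, a metric like this is what makes a claim such as “women have a smaller radius of movement” quantifiable.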

Guide to Mobile Data Analytics in Refugee Scenarios


Book edited by Albert Ali Salah, Alex Pentland, Bruno Lepri, and Emmanuel Letouzé: “After the start of the Syrian Civil War in 2011–12, increasing numbers of civilians sought refuge in neighboring countries. By May 2017, Turkey had received over 3 million refugees — the largest refugee population in the world. Some lived in government-run camps near the Syrian border, but many have moved to cities looking for work and better living conditions. They faced problems of integration, income, welfare, employment, health, education, language, social tension, and discrimination. In order to develop sound policies to solve these interlinked problems, a good understanding of refugee dynamics is necessary.

This book summarizes the most important findings of the Data for Refugees (D4R) Challenge, a non-profit project initiated to improve the conditions of Syrian refugees in Turkey by providing a database for the scientific community to enable research on urgent problems concerning refugees. The database, based on anonymized mobile call detail records (CDRs) of phone calls and SMS messages of one million Turk Telekom customers, indicates the broad activity and mobility patterns of refugees and citizens in Turkey for the period 1 January to 31 December 2017. Over 100 teams from around the globe applied to take part in the challenge, and 61 teams were granted access to the data.

This book describes the challenge and presents selected and revised project reports on the five major themes: unemployment, health, education, social integration, and safety. These are complemented by additional invited chapters describing related projects from international governmental organizations, technological infrastructure, as well as ethical aspects. The last chapter includes policy recommendations, based on the lessons learned.

The book will serve as a guideline for creating innovative data-centered collaborations between industry, academia, government, and non-profit humanitarian agencies to deal with complex problems in refugee scenarios. It illustrates the possibilities of big data analytics in coping with refugee crises and humanitarian responses, by showcasing innovative approaches drawing on multiple data sources, information visualization, pattern analysis, and statistical analysis. It will also provide researchers and students working with mobility data with an excellent coverage across data science, economics, sociology, urban computing, education, migration studies, and more….(More)”.

Data-Sharing in IoT Ecosystems From a Competition Law Perspective: The Example of Connected Cars


Paper by Wolfgang Kerber: “…analyses whether competition law can help to solve problems of access to data and interoperability in IoT ecosystems, where often one firm has exclusive control of the data produced by a smart device (and of the technical access to this device). Such a gatekeeper position can lead to the elimination of competition for aftermarket and other complementary services in such IoT ecosystems. This problem is analysed both from an economic and a legal perspective, and also generally for IoT ecosystems as well as for the much-discussed problems of “access to in-vehicle data and resources” in connected cars, where the “extended vehicle” concept of the car manufacturers leads to such positions of exclusive control. The paper analyses, in particular, the competition rules about abusive behavior of dominant firms (Art. 102 TFEU) and of firms with “relative market power” (§ 20 (1) GWB) in German competition law. These provisions might offer (if appropriately applied and amended) at least some solutions for these data access problems. Competition law, however, might not be sufficient for dealing with all or most of these problems, i.e. additional solutions might also be needed (data portability, direct data (access) rights, or sector-specific regulation)….(More)”.

How Should Scientists’ Access To Health Databanks Be Managed?


Richard Harris at NPR: “More than a million Americans have donated genetic information and medical data for research projects. But how that information gets used varies a lot, depending on the philosophy of the organizations that have gathered the data.

Some hold the data close, while others are working to make the data as widely available to as many researchers as possible — figuring science will progress faster that way. But scientific openness can be constrained by both practical and commercial considerations.

Three major projects in the United States illustrate these differing philosophies.

VA scientists spearhead research on veterans database

The first project involves three-quarters of a million veterans, mostly men over age 60. Every day, 400 to 500 blood samples show up in a modern lab in the basement of the Veterans Affairs hospital in Boston. Luis Selva, the center’s associate director, explains that robots extract DNA from the samples and then the genetic material is sent out for analysis….

Intermountain Healthcare teams with deCODE genetics

Our second example involves what is largely an extended family: descendants of settlers in Utah, primarily from the Church of Jesus Christ of Latter-day Saints. This year, Intermountain Healthcare in Utah announced that it was going to sequence the complete DNA of half a million of its patients, resulting in what the health system says will be the world’s largest collection of complete genomes….

NIH’s All of Us aims to diversify and democratize research

Our third and final example is an effort by the National Institutes of Health to recruit a million Americans for a long-term study of health, behavior and genetics. Its philosophy sharply contrasts with that of Intermountain Health.

“We do have a very strong goal around diversity, in making sure that the participants in the All of Us research program reflect the vast diversity of the United States,” says Stephanie Devaney, the program’s deputy director….(More)”.

Raw data won’t solve our problems — asking the right questions will


Stefaan G. Verhulst in apolitical: “If I had only one hour to save the world, I would spend fifty-five minutes defining the questions, and only five minutes finding the answers,” is a famous aphorism attributed to Albert Einstein.

Behind this quote is an important insight about human nature: Too often, we leap to answers without first pausing to examine our questions. We tout solutions without considering whether we are addressing real or relevant challenges or priorities. We advocate fixes for problems, or for aspects of society, that may not be broken at all.

This misordering of priorities is especially acute — and represents a missed opportunity — in our era of big data. Today’s data has enormous potential to solve important public challenges.

However, policymakers often fail to invest in defining the questions that matter, focusing mainly on the supply side of the data equation (“What data do we have or must have access to?”) rather than the demand side (“What is the core question and what data do we really need to answer it?” or “What data can or should we actually use to solve those problems that matter?”).

As such, data initiatives often provide marginal insights while at the same time generating unnecessary privacy risks by accessing and exploring data that may not in fact be needed at all in order to address the root of our most important societal problems.

A new science of questions

So what are the truly vexing questions that deserve attention and investment today? Toward what end should we strategically seek to leverage data and AI?

The truth is that policymakers and other stakeholders currently don’t have a good way of defining questions or identifying priorities, nor a clear framework to help us leverage the potential of data and data science toward the public good.

This is a situation we seek to remedy at The GovLab, an action research center based at New York University.

Our most recent project, the 100 Questions Initiative, seeks to begin developing a new science and practice of questions — one that identifies the most urgent questions in a participatory manner. Launched last month, the project aims to develop a process that takes advantage of distributed and diverse expertise on a range of given topics or domains so as to identify and prioritize those questions that are high impact, novel and feasible.

Because we live in an age of data and much of our work focuses on the promises and perils of data, we seek to identify the 100 most pressing problems confronting the world that could be addressed by greater use of existing, often inaccessible, datasets through data collaboratives – new forms of cross-disciplinary collaboration beyond public-private partnerships focused on leveraging data for good….(More)”.