Introducing the Contractual Wheel of Data Collaboration


Blog by Andrew Young and Stefaan Verhulst: “Earlier this year we launched the Contracts for Data Collaboration (C4DC) initiative — an open collaborative with charter members from The GovLab, UN SDSN Thematic Research Network on Data and Statistics (TReNDS), University of Washington and the World Economic Forum. C4DC seeks to address the inefficiencies of developing contractual agreements for public-private data collaboration, informing and guiding those seeking to establish a data collaborative by developing and making available a shared repository of relevant contractual clauses taken from existing legal agreements. Today TReNDS published “Partnerships Founded on Trust,” a brief capturing some initial findings from the C4DC initiative.

The Contractual Wheel of Data Collaboration [beta] — Stefaan G. Verhulst and Andrew Young, The GovLab

As part of the C4DC effort, and to support Data Stewards in the private sector and decision-makers in the public and civil sectors seeking to establish Data Collaboratives, The GovLab developed the Contractual Wheel of Data Collaboration [beta]. The Wheel seeks to capture key elements involved in data collaboration while demystifying contracts and moving beyond the type of legalese that can create confusion and barriers to experimentation.

The Wheel was developed based on an assessment of existing legal agreements, engagement with The GovLab-facilitated Data Stewards Network, and analysis of the key elements of our Data Collaboratives Methodology. It features 22 legal considerations organized across 6 operational categories that can act as a checklist for the development of a legal agreement between parties participating in a Data Collaborative:…(More)”.
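
Readers who want to use the Wheel as a working checklist while drafting an agreement could represent it as a simple data structure; the sketch below is a hypothetical illustration, and the category and consideration names are placeholders rather than the Wheel's actual labels.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Consideration:
    """A single legal consideration to review when drafting the agreement."""
    name: str
    addressed: bool = False

@dataclass
class Category:
    """One operational category of the Wheel, grouping related considerations."""
    name: str
    considerations: List[Consideration] = field(default_factory=list)

# Placeholder names for illustration only; the actual Wheel defines
# 6 operational categories covering 22 legal considerations.
wheel = [
    Category("Data and use (example)", [Consideration("Scope of data shared"),
                                        Consideration("Permitted purposes")]),
    Category("Governance (example)", [Consideration("Decision-making roles")]),
]

def outstanding(categories):
    """Return (category, consideration) pairs not yet addressed in the draft."""
    return [(cat.name, c.name)
            for cat in categories
            for c in cat.considerations
            if not c.addressed]

print(outstanding(wheel))
```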

San Francisco teams up with Uber, location tracker on 911 call responses


Gwendolyn Wu at San Francisco Chronicle: “In an effort to shorten emergency response times in San Francisco, the city announced on Monday that it is now using location data from RapidSOS, a New York-based public safety tech company, and ride-hailing company Uber to improve location coordinates generated from 911 calls.

An increasing number of emergency calls are made from cell phones, said Michelle Cahn, RapidSOS’s director of community engagement. The new technology should allow emergency responders to narrow down the location of such callers and replace existing 911 technology that was built for landlines and tied to home addresses.

Cell phone location data currently given to dispatchers when they receive a 911 call can be vague, especially if the person can’t articulate their exact location, according to the Department of Emergency Management.

But if a dispatcher can narrow down where the emergency is happening, that increases the chance of a timely response and better result, Cahn said.

“It doesn’t matter what’s going on with the emergency if we don’t know where it is,” she said.

RapidSOS shares its location data — collected by Apple and Google for their in-house map apps — free of charge to public safety agencies. San Francisco’s 911 call center adopted the data service in September 2018.

The Federal Communications Commission estimates agencies could save as many as 10,000 lives a year if they shave a minute off response times. Federal officials issued new rules to improve wireless 911 calls in 2015, asking mobile carriers to provide more accurate locations to call centers. Carriers are required to find a way to triangulate the caller’s location within 50 meters — a much smaller radius than the eight blocks that city officials were initially presented with in October when a caller dialed 911…(More)”.
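
To make the 50-meter target concrete, the sketch below computes the great-circle distance between a caller's true position and the estimate a dispatcher receives, then checks it against the threshold; the coordinates are invented for illustration and nothing here reflects RapidSOS's actual implementation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Invented coordinates: a caller's true location vs. the dispatcher's estimate.
true_lat, true_lon = 37.7793, -122.4193
est_lat, est_lon = 37.7795, -122.4190

error_m = haversine_m(true_lat, true_lon, est_lat, est_lon)
status = "within" if error_m <= 50 else "outside"
print(f"Location error: {error_m:.0f} m ({status} the 50 m target)")
```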

Characterizing the Biomedical Data-Sharing Landscape


Paper by Angela G. Villanueva et al: “Advances in technologies and biomedical informatics have expanded capacity to generate and share biomedical data. With a lens on genomic data, we present a typology characterizing the data-sharing landscape in biomedical research to advance understanding of the key stakeholders and existing data-sharing practices. The typology highlights the diversity of data-sharing efforts and facilitators and reveals how novel data-sharing efforts are challenging existing norms regarding the role of individuals whom the data describe.

Technologies such as next-generation sequencing have dramatically expanded capacity to generate genomic data at a reasonable cost, while advances in biomedical informatics have created new tools for linking and analyzing diverse data types from multiple sources. Further, many research-funding agencies now mandate that grantees share data. The National Institutes of Health’s (NIH) Genomic Data Sharing (GDS) Policy, for example, requires NIH-funded research projects generating large-scale human genomic data to share those data via an NIH-designated data repository such as the Database of Genotypes and Phenotypes (dbGaP). Another example is Parent Project Muscular Dystrophy, a non-profit organization that requires applicants to propose a data-sharing plan and takes an applicant’s history of data sharing into account.

The flow of data to and from different projects, institutions, and sectors is creating a medical information commons (MIC), a data-sharing ecosystem consisting of networked resources sharing diverse health-related data from multiple sources for research and clinical uses. This concept aligns with the 2018 NIH Strategic Plan for Data Science, which uses the term “data ecosystem” to describe “a distributed, adaptive, open system with properties of self-organization, scalability and sustainability” and proposes to “modernize the biomedical research data ecosystem” by funding projects such as the NIH Data Commons. Consistent with Elinor Ostrom’s discussion of nested institutional arrangements, an MIC is both singular and plural and may describe the ecosystem as a whole or individual components contributing to the ecosystem. Thus, resources like the NIH Data Commons with its associated institutional arrangements are MICs, and also form part of the larger MIC that encompasses all such resources and arrangements.

Although many research funders incentivize data sharing, in practice, progress in making biomedical data broadly available to maximize its utility is often hampered by a broad range of technical, legal, cultural, normative, and policy challenges that include achieving interoperability, changing the standards for academic promotion, and addressing data privacy and security concerns. Addressing these challenges requires multi-stakeholder involvement. To identify relevant stakeholders and advance understanding of the contributors to an MIC, we conducted a landscape analysis of existing data-sharing efforts and facilitators. Our work builds on typologies describing various aspects of data sharing that focused on biobanks, research consortia, or where data reside (e.g., degree of data centralization). While these works are informative, we aimed to capture the biomedical data-sharing ecosystem with a wider scope. Understanding the components of an MIC ecosystem and how they interact, and identifying emerging trends that test existing norms (such as norms respecting the role of individuals whom the data describe), is essential to fostering effective practices, policies and governance structures, guiding resource allocation, and promoting the overall sustainability of the MIC….(More)”

The Importance of Data Access Regimes for Artificial Intelligence and Machine Learning


JRC Digital Economy Working Paper by Bertin Martens: “Digitization triggered a steep drop in the cost of information. The resulting data glut created a bottleneck because human cognitive capacity is unable to cope with large amounts of information. Artificial intelligence and machine learning (AI/ML) triggered a similar drop in the cost of machine-based decision-making and helps in overcoming this bottleneck. Substantial change in the relative price of resources puts pressure on ownership and access rights to these resources. This explains pressure on access rights to data. ML thrives on access to big and varied datasets. We discuss the implications of access regimes for the development of AI in its current form of ML. The economic characteristics of data (non-rivalry, economies of scale and scope) favour data aggregation in big datasets. Non-rivalry implies the need for exclusive rights in order to incentivise data production when it is costly. The balance between access and exclusion is at the centre of the debate on data regimes. We explore the economic implications of several modalities for access to data, ranging from exclusive monopolistic control to monopolistic competition and free access. Regulatory intervention may push the market beyond voluntary exchanges, either towards more openness or reduced access. This may generate private costs for firms and individuals. Society can choose to do so if the social benefits of this intervention outweigh the private costs.

We briefly discuss the main EU legal instruments that are relevant for data access and ownership, including the General Data Protection Regulation (GDPR) that defines the rights of data subjects with respect to their personal data and the Database Directive (DBD) that grants ownership rights to database producers. These two instruments leave a wide legal no-man’s land where data access is ruled by bilateral contracts and Technical Protection Measures that give exclusive control to de facto data holders, and by market forces that drive access, trade and pricing of data. The absence of exclusive rights might facilitate data sharing and access or it may result in a segmented data landscape where data aggregation for ML purposes is hard to achieve. It is unclear if incompletely specified ownership and access rights maximize the welfare of society and facilitate the development of AI/ML…(More)”

Data Trusts: More Data than Trust? The Perspective of the Data Subject in the Face of a Growing Problem


Paper by Christine Rinik: “In the recent report, Growing the Artificial Intelligence Industry in the UK, Hall and Pesenti suggest the use of a ‘data trust’ to facilitate data sharing. Whilst government and corporations are focusing on their need to facilitate data sharing, the perspective of many individuals is that too much data is being shared. The issue is not only about data, but about power. The individual does not often have a voice when issues relating to data sharing are tackled. Regulators can cite the ‘public interest’ when data governance is discussed, but the individual’s interests may diverge from those of the public.

This paper considers the data subject’s position with respect to data collection leading to considerations about surveillance and datafication. Proposals for data trusts will be considered applying principles of English trust law to possibly mitigate the imbalance of power between large data users and individual data subjects. Finally, the possibility of a workable remedy in the form of a class action lawsuit which could give the data subjects some collective power in the event of a data breach will be explored. Despite regulatory efforts to protect personal data, there is a lack of public trust in the current data sharing system….(More)”.

Data Collaboratives as an enabling infrastructure for AI for Good


Blog Post by Stefaan G. Verhulst: “…The value of data collaboratives stems from the fact that the supply of and demand for data are generally widely dispersed — spread across government, the private sector, and civil society — and often poorly matched. This failure (a form of “market failure”) results in tremendous inefficiencies and lost potential. Much data that is released is never used. And much data that is actually needed is never made accessible to those who could productively put it to use.

Data collaboratives, when designed responsibly, are the key to addressing this shortcoming. They draw together otherwise siloed data and a dispersed range of expertise, helping match supply and demand, and ensuring that the correct institutions and individuals are using and analyzing data in ways that maximize the possibility of new, innovative social solutions.

Roadmap for Data Collaboratives

Despite their clear potential, the evidence base for data collaboratives is thin. There’s an absence of a systemic, structured framework that can be replicated across projects and geographies, and there’s a lack of clear understanding about what works, what doesn’t, and how best to maximize the potential of data collaboratives.

At the GovLab, we’ve been working to address these shortcomings. For emerging economies considering the use of data collaboratives, whether in pursuit of Artificial Intelligence or other solutions, we present six steps that can be considered in order to create data collaboratives that are more systematic, sustainable, and responsible.

The need for making Data Collaboratives Systematic, Sustainable and Responsible
  • Increase Evidence and Awareness
  • Increase Readiness and Capacity
  • Address Data Supply and Demand Inefficiencies and Uncertainties
  • Establish a New “Data Stewards” Function
  • Develop and strengthen policies and governance practices for data collaboration

Digital Data for Development


LinkedIn: “The World Bank Group and LinkedIn share a commitment to helping workers around the world access opportunities that make good use of their talents and skills. The two organizations have come together to identify new ways that data from LinkedIn can help inform policymakers who seek to boost employment and grow their economies.

This site offers data and automated visuals of industries where LinkedIn data is comprehensive enough to provide an emerging picture. The data complements a wealth of official sources and can offer a more real-time view in some areas, particularly for new, rapidly changing digital and technology industries.

The data shared in the first phase of this collaboration focuses on 100+ countries with at least 100,000 LinkedIn members each, distributed across 148 industries and 50,000 skills categories. In the near term, it will help World Bank Group teams and government partners pinpoint ways that developing countries could stimulate growth and expand opportunity, especially as disruptive technologies reshape the economic landscape. As LinkedIn’s membership and digital platforms continue to grow in developing countries, this collaboration will assess the possibility to expand the sectors and countries covered in the next annual update.
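
As a rough illustration of the coverage threshold described above, the sketch below filters a country-level summary down to countries with at least 100,000 members; the table and column names are hypothetical stand-ins, not the schema of the files published on the site.

```python
import pandas as pd

# Hypothetical country-level summary; the real downloadable files on the
# site have their own schema and coverage.
countries = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "linkedin_members": [2_500_000, 80_000, 450_000],
    "industries_covered": [140, 35, 120],
})

# Keep only countries that meet the 100,000-member coverage threshold.
covered = countries[countries["linkedin_members"] >= 100_000]
print(covered[["country", "industries_covered"]])
```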

This site offers downloadable data, visualizations, and an expanding body of insights and joint research from the World Bank Group and LinkedIn. The data is being made accessible as a public good, though it will be most useful for policy analysts, economists, and researchers….(More)”.

Predictive Big Data Analytics using the UK Biobank Data


Paper by Ivo D Dinov et al: “The UK Biobank is a rich national health resource that provides enormous opportunities for international researchers to examine, model, and analyze census-like multisource healthcare data. The archive presents several challenges related to aggregation and harmonization of complex data elements, feature heterogeneity and salience, and health analytics. Using 7,614 imaging, clinical, and phenotypic features of 9,914 subjects we performed deep computed phenotyping using unsupervised clustering and derived two distinct sub-cohorts. Using parametric and nonparametric tests, we determined the top 20 most salient features contributing to the cluster separation. Our approach generated decision rules to predict the presence and progression of depression or other mental illnesses by jointly representing and modeling the significant clinical and demographic variables along with the derived salient neuroimaging features. We reported consistency and reliability measures of the derived computed phenotypes and the top salient imaging biomarkers that contributed to the unsupervised clustering. This clinical decision support system identified and utilized holistically the most critical biomarkers for predicting mental health, e.g., depression. External validation of this technique on different populations may lead to reducing healthcare expenses and improving the processes of diagnosis, forecasting, and tracking of normal and pathological aging….(More)”.
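
The pipeline summarized above, unsupervised clustering into two sub-cohorts followed by per-feature statistical tests to rank the most salient features, can be sketched roughly as below. This is a simplified reconstruction on synthetic data, not the authors' code and not UK Biobank data.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix (subjects x features); the paper
# uses 9,914 subjects and 7,614 imaging, clinical, and phenotypic features.
X = rng.normal(size=(500, 50))
X[:250, :5] += 1.5  # embed a weak two-group structure for illustration

# Step 1: unsupervised clustering into two derived sub-cohorts.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

# Step 2: nonparametric test per feature to rank how strongly each feature
# separates the two sub-cohorts (the paper reports the top 20).
pvals = np.array([mannwhitneyu(X[labels == 0, j], X[labels == 1, j]).pvalue
                  for j in range(X.shape[1])])
top_features = np.argsort(pvals)[:20]
print("Most salient feature indices:", top_features[:10])
```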

Statistics Estonia to coordinate data governance


Article by Miriam van der Sangen at CBS: “In 2018, Statistics Estonia launched a new strategy for the period 2018-2022. This strategy addresses the organisation’s aim to produce statistics more quickly while minimising the response burden on both businesses and citizens. Another element in the strategy is addressing the high expectations in Estonian society regarding the use of data. ‘We aim to transform Statistics Estonia into a national data agency,’ says Director General Mägi. ‘This means our role as a producer of official statistics will be enlarged by data governance responsibilities in the public sector. Taking on such responsibilities requires a clear vision of the whole public data ecosystem and also agreement to establish data stewards in most public sector institutions.’…

…the Estonian Parliament passed new legislation that effectively expanded the number of official tasks for Statistics Estonia. Mägi elaborates: ‘Most importantly, we shall be responsible for coordinating data governance. The detailed requirements and conditions of data governance will be specified further in the coming period.’ Under the new Act, Statistics Estonia will also have more possibilities to share data with other parties….

Statistics Estonia is fully committed to producing statistics which are based on big data. Mägi explains: ‘At the moment, we are actively working on two big data projects. One project involves the use of smart electricity meters. In this project, we are looking into ways to visualise business and household electricity consumption information. The second project involves web scraping of prices and enterprise characteristics. This project is still in an initial phase, but we can already see that the use of web scraping can improve the efficiency of our production process. We are aiming to extend the web scraping project by also identifying e-commerce and innovation activities of enterprises.’
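
As a rough illustration of the price web-scraping approach Mägi describes, the sketch below collects product names and prices from a single retailer category page; the URL and the CSS classes are hypothetical placeholders, and a production pipeline at a statistical office would add scheduling, robots.txt handling, and validation.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical retailer page and markup, for illustration only.
URL = "https://example.com/webshop/category/dairy"

def scrape_prices(url):
    """Return (product name, price) pairs found on one category page."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for node in soup.select(".product"):  # assumed CSS class
        name = node.select_one(".name").get_text(strip=True)
        price_text = node.select_one(".price").get_text(strip=True)
        items.append((name, float(price_text.replace("€", "").replace(",", "."))))
    return items

if __name__ == "__main__":
    for name, price in scrape_prices(URL):
        print(f"{name}: {price:.2f} EUR")
```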

Yet another ambitious goal for Statistics Estonia lies in the field of data science. ‘Similarly to Statistics Netherlands, we established experimental statistics and data mining activities years ago. Last year, we developed a so-called think-tank service, providing insights from data into all aspects of our lives. Think of birth, education, employment, et cetera. Our key clients are the various ministries, municipalities and the private sector. The main aim in the coming years is to speed up service time thanks to visualisations and data lake solutions.’ …(More)”.

Facebook’s AI team maps the whole population of Africa


Devin Coldewey at TechCrunch: “A new map of nearly all of Africa shows exactly where the continent’s 1.3 billion people live, down to the meter, which could help everyone from local governments to aid organizations. The map joins others like it from Facebook, created by running satellite imagery through a machine learning model.

It’s not exactly that there was some mystery about where people live, but the degree of precision matters. You may know that a million people live in a given region, and that about half are in the bigger city and another quarter in assorted towns. But that leaves hundreds of thousands only accounted for in the vaguest way.

Fortunately, you can always inspect satellite imagery and pick out the spots where small villages and isolated houses and communities are located. The only problem is that Africa is big. Really big. Manually labeling the satellite imagery even from a single mid-sized country like Gabon or Malawi would take a huge amount of time and effort. And for many applications of the data, such as coordinating the response to a natural disaster or distributing vaccinations, time lost is lives lost.

Better to get it all done at once then, right? That’s the idea behind Facebook’s Population Density Maps project, which had already mapped several countries over the last couple of years before the decision was made to take on the entire African continent….

“The maps from Facebook ensure we focus our volunteers’ time and resources on the places they’re most needed, improving the efficacy of our programs,” said Tyler Radford, executive director of the Humanitarian OpenStreetMap Team, one of the project’s partners.

The core idea is straightforward: Match census data (how many people live in a region) with structure data derived from satellite imagery to get a much better idea of where those people are located.

“With just the census data, the best you can do is assume that people live everywhere in the district – buildings, fields, and forests alike,” said Facebook engineer James Gill. “But once you know the building locations, you can skip the fields and forests and only allocate the population to the buildings. This gives you very detailed 30 meter by 30 meter population maps.”
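
The allocation step Gill describes, spreading a district's census count only over the grid cells that contain buildings, amounts to a simple dasymetric calculation; the sketch below is an illustrative reconstruction and not Facebook's actual pipeline.

```python
import numpy as np

def allocate_population(district_population, building_mask):
    """Spread a district's census count evenly over its building cells.

    building_mask is a boolean grid (e.g. 30 m x 30 m cells) marking cells
    with at least one detected building; every other cell receives zero.
    """
    density = np.zeros(building_mask.shape, dtype=float)
    n_building_cells = int(building_mask.sum())
    if n_building_cells:
        density[building_mask] = district_population / n_building_cells
    return density

# Toy example: a 4x4 district grid with 3 building cells and 900 residents.
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = mask[2, 3] = mask[3, 0] = True
print(allocate_population(900, mask))  # 300 people in each building cell
```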

That’s several times more accurate than any extant population map of this size. The analysis is done by a machine learning agent trained on OpenStreetMap data from all over the world, where people have labeled and outlined buildings and other features.

First the huge amount of Africa’s surface that obviously has no structure had to be removed from consideration, reducing the amount of space the team had to evaluate by a factor of a thousand or more. Then, using a region-specific algorithm (because things look a lot different in coastal Morocco than they do in central Chad), the model identifies patches that contain a building….(More)”.
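
A minimal sketch of the two-stage approach described here, first discarding obviously empty tiles cheaply and then classifying the remaining patches, might look as follows; the variance pre-filter and the `predict_building` rule are invented stand-ins, not Facebook's region-specific model.

```python
import numpy as np

def has_texture(patch, threshold=5.0):
    """Cheap pre-filter: uniform desert, forest, or water tiles have low variance."""
    return patch.std() > threshold

def predict_building(patch):
    """Stand-in for a trained, region-specific building classifier."""
    return patch.mean() > 128  # toy rule, not a real model

def find_building_patches(tiles):
    """Indices of tiles that pass both the pre-filter and the classifier."""
    return [i for i, patch in enumerate(tiles)
            if has_texture(patch) and predict_building(patch)]

# Toy 64x64 grayscale tiles: one flat (empty), one textured and bright.
rng = np.random.default_rng(1)
flat = np.full((64, 64), 90.0)
textured = rng.normal(loc=150.0, scale=30.0, size=(64, 64))
print(find_building_patches([flat, textured]))  # -> [1]
```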