How to craft fair, transparent data-sharing agreements


Article by Stephanie Kanowitz: “Data collaborations are critical to government decision-making, but actually sharing data can be difficult—not so much the mechanics of the collaboration, but hashing out the rules and policies governing it. A new report offers three resources that will make data sharing more straightforward, foster accountability and build trust among the parties.

“We’ve heard over and over again that one of the biggest barriers to collaboration around data turns out to be data sharing agreements,” said Stefaan Verhulst, co-founder of the Governance Lab at New York University and an author of the November report, “Moving from Idea to Practice.” It’s sometimes a lot to ask stakeholders “to provide access to some of their data,” he said.

To help, Verhulst and other researchers identified three components of successful data-sharing agreements: conducting principled negotiations, establishing the elements of a data-sharing agreement and assessing readiness.

To address the first, the report breaks the components of negotiation into a framework with four tenets: separating people from the problem, focusing on interests rather than positions, identifying options and using objective criteria. From discussions with stakeholders in data sharing agreement workshops that GovLab held through its Open Data Policy Lab, three principles emerged—fairness, transparency and reciprocity…(More)”.

Measuring Global Migration: Towards Better Data for All


Book by Frank Laczko, Elisa Mosler Vidal, Marzia Rango: “This book focuses on how to improve the collection, analysis and responsible use of data on global migration and international mobility. While migration remains a topic of great policy interest for governments around the world, there is a serious lack of reliable, timely, disaggregated and comparable data on it, and often insufficient safeguards to protect migrants’ information. Meanwhile, vast amounts of data about the movement of people are being generated in real time due to new technologies, but these have not yet been fully captured and utilized by migration policymakers, who often do not have enough data to inform their policies and programmes. The lack of migration data has been internationally recognized; the Global Compact for Safe, Orderly and Regular Migration urges all countries to improve data on migration to ensure that policies and programmes are “evidence-based”, but does not spell out how this could be done.

This book examines both the technical issues associated with improving data on migration and the wider political challenges of how countries manage the collection and use of migration data. The first part of the book discusses how much we really know about international migration based on existing data, and key concepts and approaches which are often used to measure migration. The second part of the book examines what measures could be taken to improve migration data, highlighting examples of good practice from around the world in recent years, across a range of different policy areas, such as health, climate change and sustainable development more broadly.

Written by leading experts on international migration data, this book is the perfect guide for students, policymakers and practitioners looking to understand more about the existing evidence base on migration and what can be done to improve it…(More)”. (See also: Big Data For Migration Alliance).

Public Value of Data: B2G data-sharing Within the Data Ecosystem of Helsinki


Paper by Vera Djakonoff: “Datafication penetrates all levels of society. In order to harness public value from an expanding pool of private-produced data, there has been growing interest in facilitating business-to-government (B2G) data-sharing. This research examines the development of B2G data-sharing within the data ecosystem of the City of Helsinki. The research has identified expectations ecosystem actors have for B2G data-sharing and factors that influence the city’s ability to unlock public value from private-produced data.

The research context is smart cities, with a specific focus on the City of Helsinki. Smart cities are in an advantageous position to develop novel public-private collaborations. Helsinki, on the international stage, stands out as a pioneer in the realm of data-driven smart city development. For this research, nine data ecosystem actors representing the city and companies participated in semi-structured thematic interviews through which their perceptions and experiences were mapped.

The theoretical framework of this research draws from the public value management (PVM) approach in examining the smart city data ecosystem and alignment of diverse interests for a shared purpose. Additionally, the research transcends the examination of the interests in isolation and looks at how technological artefacts shape the social context and interests surrounding them. Here, the focus is on the properties of data as an artefact with anti-rival value-generation potential.

The findings of this research reveal that while ecosystem actors recognise that more value can be drawn from data through collaboration, this is not apparent at the level of individual initiatives and transactions. This research shows that the city’s commitment to and facilitation of a long-term shared sense of direction and purpose among ecosystem actors is central to developing B2G data-sharing for public value outcomes. Here, participatory experimentation is key, promoting an understanding of the value of data and rendering visible the diverse motivations and concerns of ecosystem actors, enabling learning for wise, data-driven development…(More)”.

Elon Musk is now taking applications for data to study X — but only EU risk researchers need apply…


Article by Natasha Lomas: “Lawmakers take note: Elon Musk-owned X appears to have quietly complied with a hard legal requirement in the European Union that requires larger platforms (aka VLOPs) to provide researchers with data access in order to study systemic risks arising from use of their services — risks such as disinformation, child safety issues, gender-based violence and mental heath concerns.

X (or Twitter as it was still called at the time) was designated a VLOP under the EU’s Digital Services Act (DSA) back in April after the bloc’s regulators confirmed it meets their criteria for an extra layer of rules to kick in that are intended to drive algorithmic accountability via applying transparency measures on larger platforms.

Researchers intending to study systemic risks in the EU now appear to at least be able to apply for access to study X’s data by accessing a web form through a button which appears at the bottom of this page on its developer platform. (Note researchers can be based in the EU but don’t have to be to meet the criteria; they just need to intend to study systemic risks in the EU.)…(More)”.

Meta is giving researchers more access to Facebook and Instagram data


Article by Tate Ryan-Mosley: “Meta is releasing a new transparency product called the Meta Content Library and API, according to an announcement from the company today. The new tools will allow select researchers to access publicly available data on Facebook and Instagram in an effort to give a more overarching view of what’s happening on the platforms. 

The move comes as social media companies are facing public and regulatory pressure to increase transparency about how their products—specifically recommendation algorithms—work and what impact they have. Academic researchers have long been calling for better access to data from social media platforms, including Meta. This new library is a step toward increased visibility about what is happening on its platforms and the effect that Meta’s products have on online conversations, politics, and society at large. 

In an interview, Meta’s president of global affairs, Nick Clegg, said the tools “are really quite important” in that they provide, in a lot of ways, “the most comprehensive access to publicly available content across Facebook and Instagram of anything that we’ve built to date.” The Content Library will also help the company meet new regulatory requirements and obligations on data sharing and transparency, as the company notes in a blog post Tuesday

The library and associated API were first released as a beta version several months ago and allow researchers to access near-real-time data about pages, posts, groups, and events on Facebook and creator and business accounts on Instagram, as well as the associated numbers of reactions, shares, comments, and post view counts. While all this data is publicly available—as in, anyone can see public posts, reactions, and comments on Facebook—the new library makes it easier for researchers to search and analyze this content at scale…(More)”.

Policy primer on non-personal data 


Primer by the International Chamber of Commerce: “Non-personal data plays a critical role in providing solutions to global challenges. Unlocking its full potential requires policymakers, businesses, and all other stakeholders to collaborate to construct policy environments that can capitalise on its benefits.  

This report gives insights into the different ways that non-personal data has a positive impact on society, with benefits including, but not limited to: 

  1. Tracking disease outbreaks; 
  2. Facilitating international scientific cooperation; 
  3. Understanding climate-related trends; 
  4.  Improving agricultural practices for increased efficiency; 
  5. Optimising energy consumption; 
  6. Developing evidence-based policy; 
  7. Enhancing cross-border cybersecurity cooperation. 

In addition, businesses of all sizes benefit from the transfer of data across borders, allowing companies to establish and maintain international supply chains and smaller businesses to enter new markets or reduce operating costs. 

Despite these benefits, international flows of non-personal data are frequently limited by restrictions and data localisation measures. A growing patchwork of regulations can also create barriers to realising the potential of non-personal data. This report explores the impact of data flow restrictions including: 

  • Hindering global supply chains; 
  • Limiting the use of AI reliant on large datasets; 
  • Disincentivising data sharing amongst companies; 
  • Preventing companies from analysing the data they hold…(More)”.

Indigenous Peoples and Local Communities Are Using Satellite Data to Fight Deforestation


Article by Katie Reytar, Jessica Webb and Peter Veit: “Indigenous Peoples and local communities hold some of the most pristine and resource-rich lands in the world — areas highly coveted by mining and logging companies and other profiteers.  Land grabs and other threats are especially severe in places where the government does not recognize communities’ land rights, or where anti-deforestation and other laws are weak or poorly enforced. It’s the reason many Indigenous Peoples and local communities often take land monitoring into their own hands — and some are now using digital tools to do it. 

Freely available satellite imagery and data from sites like Global Forest Watch and LandMark provide near-real-time information that tracks deforestation and land degradation. Indigenous and local communities are increasingly using tools like this to gather evidence that deforestation and degradation are happening on their lands, build their case against illegal activities and take legal action to prevent it from continuing.  

Three examples from Suriname, Indonesia and Peru illustrate a growing trend in fighting land rights violations with data…(More)”.

The Time is Now: Establishing a Mutual Commitment Framework (MCF) to Accelerate Data Collaboratives


Article by Stefaan Verhulst, Andrew Schroeder and William Hoffman: “The key to unlocking the value of data lies in responsibly lowering the barriers and shared risks of data access, re-use, and collaboration in the public interest. Data collaboratives, which foster responsible access and re-use of data among diverse stakeholders, provide a solution to these challenges.

Today, however, setting up data collaboratives takes too much time and is prone to multiple delays, hindering our ability to understand and respond swiftly and effectively to urgent global crises. The readiness of data collaboratives during crises faces key obstacles in terms of data use agreements, technical infrastructure, vetted and reproducible methodologies, and a clear understanding of the questions which may be answered more effectively with additional data.

Organizations aiming to create data collaboratives often face additional challenges, as they often lack established operational protocols and practices which can streamline implementation, reduce costs, and save time. New regulations are emerging that should help drive the adoption of standard protocols and processes. In particular, the EU Data Governance Act and the forthcoming Data Act aim to enable responsible data collaboration. Concepts like data spaces and rulebooks seek to build trust and strike a balance between regulation and technological innovation.

This working paper advances the case for creating a Mutual Commitment Framework (MCF) in advance of a crisis that can serve as a necessary and practical means to break through chronic choke points and shorten response times. By accelerating the establishment of operational (and legally cognizable) data collaboratives, duties of care can be defined and a stronger sense of trust, clarity, and purpose can be instilled among participating entities. This structured approach ensures that data sharing and processing are conducted within well-defined, pre-authorized boundaries, thereby lowering shared risks and promoting a conducive environment for collaboration…(More)”.

Private UK health data donated for medical research shared with insurance companies


Article by Shanti Das: “Sensitive health information donated for medical research by half a million UK citizens has been shared with insurance companies despite a pledge that it would not be.

An Observer investigation has found that UK Biobank opened up its vast biomedical database to insurance sector firms several times between 2020 and 2023. The data was provided to insurance consultancy and tech firms for projects to create digital tools that help insurers predict a person’s risk of getting a chronic disease. The findings have raised concerns among geneticists, data privacy experts and campaigners over vetting and ethical checks at Biobank.

Set up in 2006 to help researchers investigating diseases, the database contains millions of blood, saliva and urine samples, collected regularly from about 500,000 adult volunteers – along with medical records, scans, wearable device data and lifestyle information.

Approved researchers around the world can pay £3,000 to £9,000 to access records ranging from medical history and lifestyle information to whole genome sequencing data. The resulting research has yielded major medical discoveries and led to Biobank being considered a “jewel in the crown” of British science.

Biobank said it strictly guarded access to its data, only allowing access by bona fide researchers for health-related projects in the public interest. It said this included researchers of all stripes, whether employed by academic, charitable or commercial organisations – including insurance companies – and that “information about data sharing was clearly set out to participants at the point of recruitment and the initial assessment”.

But evidence gathered by the Observer suggests Biobank did not explicitly tell participants it would share data with insurance companies – and made several public commitments not to do so.

When the project was announced, in 2002, Biobank promised that data would not be given to insurance companies after concerns were raised that it could be used in a discriminatory way, such as by the exclusion of people with a particular genetic makeup from insurance.

In an FAQ section on the Biobank website, participants were told: “Insurance companies will not be allowed access to any individual results nor will they be allowed access to anonymised data.” The statement remained online until February 2006, during which time the Biobank project was subject to public scrutiny and discussed in parliament.

The promise was also reiterated in several public statements by backers of Biobank, who said safeguards would be built in to ensure that “no insurance company or police force or employer will have access”.

This weekend, Biobank said the pledge – made repeatedly over four years – no longer applied. It said the commitment had been made before recruitment formally began in 2007 and that when Biobank volunteers enrolled they were given revised information.

This included leaflets and consent forms that contained a provision that anonymised Biobank data could be shared with private firms for “health-related” research, but did not explicitly mention insurance firms or correct the previous assurances…(More)”

A standardised differential privacy framework for epidemiological modeling with mobile phone data


Paper by Merveille Koissi Savi et al: “During the COVID-19 pandemic, the use of mobile phone data for monitoring human mobility patterns has become increasingly common, both to study the impact of travel restrictions on population movement and epidemiological modeling. Despite the importance of these data, the use of location information to guide public policy can raise issues of privacy and ethical use. Studies have shown that simple aggregation does not protect the privacy of an individual, and there are no universal standards for aggregation that guarantee anonymity. Newer methods, such as differential privacy, can provide statistically verifiable protection against identifiability but have been largely untested as inputs for compartment models used in infectious disease epidemiology. Our study examines the application of differential privacy as an anonymisation tool in epidemiological models, studying the impact of adding quantifiable statistical noise to mobile phone-based location data on the bias of ten common epidemiological metrics. We find that many epidemiological metrics are preserved and remain close to their non-private values when the true noise state is less than 20, in a count transition matrix, which corresponds to a privacy-less parameter ϵ = 0.05 per release. We show that differential privacy offers a robust approach to preserving individual privacy in mobility data while providing useful population-level insights for public health. Importantly, we have built a modular software pipeline to facilitate the replication and expansion of our framework…(More)”.