Data to the rescue: how humanitarian aid NGOs should collect information based on the GDPR


Paper by Theodora Gazi: “Data collection is valuable before, during and after interventions in order to increase the effectiveness of humanitarian projects. Although the General Data Protection Regulation (GDPR) sets forth rules for the processing of personal data, its implementation by humanitarian aid actors is crucial and presents challenges. Failure to comply poses severe risks both to data subjects and to the actor’s reputation. This article provides insights into the implementation of the guiding principles of the GDPR, the legal bases for data processing, data subjects’ rights and data sharing during the provision of humanitarian assistance…(More)”

The economics of Business-to-Government data sharing


Paper by Bertin Martens and Nestor Duch-Brown: “Data and information are fundamental inputs for effective evidence-based policy making and the provision of public services. In recent years, some private firms have been collecting large amounts of data which, were they available to governments, could greatly improve their capacity to take better policy decisions and to increase social welfare. Business-to-Government (B2G) data sharing can result in substantial benefits for society. It can save governments costs by allowing them to benefit from the use of data collected by businesses without having to collect the same data again. Moreover, it can support the production of new and innovative outputs based on the shared data by different users. Finally, the data available to government may give only an incomplete or even biased picture, while aggregating complementary datasets shared by different parties (including businesses) may result in improved policies with strong social welfare benefits.


The examples assembled by the High-Level Expert Group on B2G data sharing show that most current B2G data transactions remain one-off experimental pilot projects that do not seem to be sustainable over time. Overall, the volume of B2G operations still seems to be relatively small and clearly sub-optimal from a social welfare perspective. The market does not seem to scale in line with the economic potential for welfare gains in society. There are likely to be significant potential economic benefits from additional B2G data sharing operations. These could be enabled by measures that improve their governance conditions and thereby help increase the overall number of transactions. To design such measures, it is important to understand the nature of the current barriers to B2G data sharing operations. In this paper, we focus on the most important barriers from an economic perspective: (a) monopolistic data markets, (b) high transaction costs and perceived risks in data sharing and (c) a lack of incentives for private firms to contribute to the production of public benefits. The following reflections are mainly conceptual, since there is currently little quantitative empirical evidence on the different aspects of B2G transactions.

  • Monopolistic data markets. Some firms, such as big tech companies, may be in a privileged position as the exclusive providers of the type of data that a public body seeks to access. This position enables them to charge a price for the data well beyond a reasonable rate of return on costs (a standard formalization of this markup appears after this list). While a monopolistic market is still a functioning market, the resulting price may lead to some governments being unable or unwilling to purchase the data, and may therefore cause social welfare losses. Nonetheless, monopolistic pricing may still be justified from an innovation perspective: it strengthens incentives to invest in more and better data collection systems and thereby increases the supply of data in the long run. In some cases, the data seller may be in a position to price-discriminate between commercial buyers and a public body, charging a lower price to the latter since the data would not be used for commercial purposes.
  • High transaction costs and perceived risks. An important barrier to data sharing comes from the ex-ante costs of finding a suitable data sharing partner, negotiating a contractual arrangement, and re-formatting and cleaning the data, among others. Potentially interested public bodies may not be aware of available datasets, or may not be in a position to handle them or to understand their advantages and disadvantages. There may also be ex-post risks related to uncertainties in the quality and/or usefulness of the data, the technical implementation of the data sharing deal, ensuring compliance with the agreed conditions, the risk of data leaks to unauthorized third parties and exposure of personal and confidential data.
  • Lack of incentives. Firms may be reluctant to share data with governments because doing so might have a negative impact on them. This could be due to suspicions that the data delivered might be used to implement market regulations and to enforce competition rules that could negatively affect firms’ profits. Moreover, if firms share data with government under preferential conditions, they may have difficulties justifying the foregone profit to shareholders, since the benefits generated by better policies or public services fuelled by the private data will accrue to society as a whole and are often difficult to express in monetary terms. Finally, firms might fear being put at a competitive disadvantage if they provide data to public bodies – perhaps under preferential conditions – and their competitors do not.
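
As a standard textbook formalization of the monopoly-pricing barrier (my illustration, not taken from the paper), a profit-maximizing monopolist sets the markup over marginal cost according to the Lerner condition:

```latex
% Lerner condition: the monopolist's markup over marginal cost is the
% inverse of the (absolute) price elasticity of demand for the data.
\frac{P - MC}{P} = \frac{1}{|\varepsilon_d|}
\qquad\Longrightarrow\qquad
P = \frac{MC}{1 - 1/|\varepsilon_d|}
```

For instance, with a marginal cost of 10 for supplying the data and commercial demand elasticity |ε_d| = 2, the monopolist charges 20; under third-degree price discrimination, a public body with more elastic demand, say |ε_d| = 4, would pay roughly 13.3, consistent with the lower non-commercial price described in the first bullet.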

Several mechanisms could be designed to remove the barriers that may be holding back B2G data sharing initiatives. One would be to provide stronger incentives for the data supplier firm to engage in this type of transaction. These incentives can be direct, i.e., monetary, or indirect, i.e., reputational (e.g. as part of corporate social responsibility programmes). Another would be to ensure the data transfer by making the transaction mandatory, with fair cost compensation. An intermediate way would be based on solutions that facilitate voluntary B2G operations without mandating them, for example by reducing the transaction costs and perceived risks for the data supplier, e.g. by setting up trusted data intermediary platforms or appropriate contractual provisions. A possible EU governance framework for B2G data sharing operations could cover these options….(More)”.

Computational social science: Obstacles and opportunities


Paper by David M. J. Lazer et al: “The field of computational social science (CSS) has exploded in prominence over the past decade, with thousands of papers published using observational data, experimental designs, and large-scale simulations that were once unfeasible or unavailable to researchers. These studies have greatly improved our understanding of important phenomena, ranging from social inequality to the spread of infectious diseases. The institutions supporting CSS in the academy have also grown substantially, as evidenced by the proliferation of conferences, workshops, and summer schools across the globe, across disciplines, and across sources of data. But the field has also fallen short in important ways. Many institutional structures around the field—including research ethics, pedagogy, and data infrastructure—are still nascent. We suggest opportunities to address these issues, especially in improving the alignment between the organization of the 20th-century university and the intellectual requirements of the field….(More)”.

The Razor’s Edge: Liberalizing the Digital Surveillance Ecosystem


Report by CNAS: “The COVID-19 pandemic is accelerating global trends in digital surveillance. Public health imperatives, combined with opportunism by autocratic regimes and authoritarian-leaning leaders, are expanding personal data collection and surveillance. This tendency toward increased surveillance is taking shape differently in repressive regimes, open societies, and the nation-states in between.

China, run by the Chinese Communist Party (CCP), is leading the world in using technology to enforce social control, monitor populations, and influence behavior. Maximizing this control depends in part on data aggregation and a growing capacity to link the digital and physical worlds in real time, where online offenses result in swift repercussions. Further, China is increasing investments in surveillance technology and attempting to shape the patterns of the technology’s global use through the export of authoritarian norms, values, and governance practices. For example, China champions its own technology standards to the rest of the world, while simultaneously peddling legislative models abroad that facilitate state access to personal data. Today, the COVID-19 pandemic offers China and other authoritarian nations the opportunity to test and expand their existing surveillance powers internally, as well as to make these more extensive measures permanent.

Global swing states are already exhibiting troubling trends in their use of digital surveillance, including establishing centralized, government-held databases and trading surveillance practices with authoritarian regimes. Amid the pandemic, swing states like India seem to be taking cues from autocratic regimes by mandating the download of government-enabled contact-tracing applications. Yet, for now, these swing states appear responsive to their citizenry and sensitive to public agitation over privacy concerns.

Open societies and democracies can exhibit surveillance trends similar to those of authoritarian regimes and swing states, including the expansion of digital surveillance in the name of public safety and growing private sector capabilities to collect and analyze data on individuals. Yet these trends toward greater surveillance still occur within the context of pluralistic, open societies that feature ongoing debates about the limits of surveillance. However, the pandemic stands to shift the debate in these countries from skepticism over personal data collection to wider acceptance. Thus far, the spectrum of responses to public surveillance reflects the diversity of democracies’ citizenry and processes….(More)”.

Bringing Structure and Design to Data Governance


Report by John Wilbanks et al: “Before COVID-19 took over the world, the Governance team at Sage Bionetworks had started working on an analysis of data governance structures and systems to be published as a “green paper” in late 2020. Today we’re happy to publicly release that paper, Mechanisms to Govern Responsible Sharing of Open Data: A Progress Report.

In the paper, we provide a landscape analysis of models of governance for open data sharing based on our observations in the biomedical sciences. We offer an overview of those observations and point to areas where we think this work can expand to support open data sharing outside the sciences.

The central argument of this paper is that the “right” system of governance is determined by first understanding the nature of the collaborative activities intended. These activities map to types of governance structures, which in turn can be built out of standardized parts — what we call governance design patterns. In this way, governance for data science can be easy to build, follow key laws and ethics regimes, and enable innovative models of collaboration. We provide an initial survey of structures and design patterns, as well as examples of how we leverage this approach to rapidly build out ethics-centered governance in biomedical research.

While there is no one-size-fits-all solution, we argue for learning from ongoing data science collaborations and building on existing standards and tools. And in so doing, we argue for data governance as a discipline worthy of expertise, attention, standards, and innovation.

We chose to call this report a “green paper” in recognition of its maturity and coverage: it’s a snapshot of our data governance ecosystem in biomedical research, not the world of all data governance, and the entire field of data governance is in its infancy. We have licensed the paper under CC-BY 4.0 and published it on GitHub via Manubot in hopes that the broader data governance community might fill in holes we left, correct mistakes we made, add references, toolkits, and reference implementations, and generally treat this as a framework for talking about how we share data…(More)”.

The open source movement takes on climate data


Article by Heather Clancy: “…many companies are moving to disclose “climate risk,” although far fewer are moving to actually minimize it. And as those tasked with preparing those reports can attest, the process of gathering the data for them is frustrating and complex, especially as the level of detail desired and required by investors becomes deeper.

That pain point was the inspiration for a new climate data project launched this week that will be spearheaded by the Linux Foundation, the nonprofit host organization for thousands of the most influential open source software and data initiatives in the world such as GitHub. The foundation is central to the evolution of the Linux software that runs in the back offices of most major financial services firms. 

There are four powerful founding members for the new group, the LF Climate Finance Foundation (LFCF): Insurance and asset management company Allianz, cloud software giants Amazon and Microsoft, and data intelligence powerhouse S&P Global. The foundation’s “planning team” includes World Wide Fund for Nature (WWF), Ceres and the Sustainability Accounting Standards Board (SASB).

The group’s intention is to collaborate on an open source project called the OS-Climate platform, which will include economic and physical risk scenarios that investors, regulators, companies, financial analysts and others can use for their analysis. 

The idea is to create a “public service utility” where certain types of climate data can be accessed easily, then combined with other, more proprietary information that someone might be using for risk analysis, according to Truman Semans, CEO of OS-Climate, who was instrumental in getting the effort off the ground. “There are a whole lot of initiatives out there that address pieces of the puzzle, but no unified platform to allow those to interoperate,” he told me.

Why does this matter? It helps to understand the history of open source software, which many powerful software companies, notably Microsoft, once abhorred because they were worried about the financial hit to their intellectual property. Flash forward to today, and the open source software movement, “staffed” by literally millions of software developers, is credited with accelerating the creation of common system-level elements so that companies can focus their own resources on solving problems directly related to their business.

In short, this budding effort could make the right data available more quickly, so that businesses — particularly financial institutions — can make better informed decisions.

Or, as Microsoft’s chief intellectual property counsel, Jennifer Yokoyama, observed in the announcement press release: “Addressing climate issues in a meaningful way requires people and organizations to have access to data to better understand the impact of their actions. Opening up and sharing our contribution of significant and relevant sustainability data through the LF Climate Finance Foundation will help advance the financial modeling and understanding of climate change impact — an important step in affecting political change. We’re excited to collaborate with the other founding members and hope additional organizations will join.”…(More)”

UK’s National Data Strategy


DCMS (UK): “…With the increasing ascendance of data, it has become ever-more important that the government removes the unnecessary barriers that prevent businesses and organisations from accessing such information.

The importance of data sharing was demonstrated during the first few months of the coronavirus pandemic, when government departments, local authorities, charities and the private sector came together to provide essential services. One notable example is the Vulnerable Person Service, which in a very short space of time enabled secure data-sharing across the public and private sectors to provide millions of food deliveries and access to priority supermarket delivery slots for clinically extremely vulnerable people.

Aggregation of data from different sources can also lead to new insights that otherwise would not have been possible. For example, the Connected Health Cities project anonymises and links data from different health and social care services, providing new insights into the way services are used.
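
The strategy does not describe the mechanics of such anonymise-and-link pipelines, but a common building block is keyed pseudonymisation: each service replaces its raw identifiers with a keyed hash so that records can be joined across datasets without the identifiers themselves ever being shared. A minimal sketch follows (the key handling and identifiers are hypothetical assumptions for illustration, not the Connected Health Cities implementation):

```python
import hmac
import hashlib

# Hypothetical shared secret; in practice this would be managed by a
# trusted third party and never released to the data consumers.
LINKAGE_KEY = b"replace-with-a-securely-managed-key"

def pseudonymise(identifier: str) -> str:
    """Derive a stable pseudonym from a raw identifier.

    HMAC-SHA256 with a secret key maps the same input to the same output,
    so datasets processed with the same key can be joined on the pseudonym
    while the raw identifier is never shared.
    """
    return hmac.new(LINKAGE_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Two services pseudonymise their records independently...
health_records = {pseudonymise("943-476-5919"): {"admissions": 2}}
social_care_records = {pseudonymise("943-476-5919"): {"home_visits": 5}}

# ...and the linked view joins on pseudonyms, not identifiers.
linked = {
    pid: {**record, **social_care_records.get(pid, {})}
    for pid, record in health_records.items()
}
print(linked)
```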

Vitally, data sharing can also fuel growth and innovation. For new and innovative organisations, increasing data availability will mean that they, too, will be able to gain better insights from their work – from charities able to pool beneficiary data to better evaluate the effectiveness of interventions, to new entrants able to access new markets. Often this happens as part of commercial arrangements; in other instances government has sought to intervene where there are clear consumer benefits, such as in relation to Open Banking and Smart Data. Government has also invested in the research and development of new mechanisms for better data sharing, such as the Office for AI and Innovate UK’s partnership with the Open Data Institute to explore data trusts.

However, our call for evidence, along with engagement with stakeholders, has identified a range of barriers to data availability, including:

  • a culture of risk aversion
  • issues with current licensing regulations
  • market barriers to greater re-use, including data hoarding and differential market power
  • inconsistent formatting of public sector data
  • issues pertaining to the discoverability of data
  • privacy and security concerns
  • the benefits of increased data sharing not always accruing to the organisation that incurs the costs of collecting and maintaining the data

This is a complex environment, and heavy-handed intervention may have the unwanted effect of reducing incentives to collect, maintain and share data for the benefit of the UK. It is clear that any way forward must be carefully considered to avoid unintended negative consequences. There is a balance to be struck between maintaining appropriate commercial incentives to collect data, while ensuring that data can be used widely for the benefit of the UK. For personal data, we must also take account of the balance between individual rights and public benefit.

This is a new issue for all digital economies, one that has come to the fore as data has become a significant modern economic asset. Our approach will take account of those incentives, and consider how innovation can overcome perceived barriers to availability. For example, access can be limited to users with specific characteristics, by licence or regulator accreditation; data can be shared within a collaborating group of organisations; there may also be value in creating and sharing synthetic data to support research and innovation, as well as in other privacy-enhancing technologies and techniques….(More)”.
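
To make the synthetic-data idea concrete, here is a minimal sketch (my illustration, not a technique specified by the strategy) that fits an independent-marginals model to a sensitive table and samples a shareable synthetic version. Real deployments would model correlations between columns and add formal guarantees such as differential privacy:

```python
import random

# Toy "sensitive" records; in practice this would be real microdata.
records = [
    {"age_band": "18-30", "region": "North", "used_service": True},
    {"age_band": "31-50", "region": "South", "used_service": False},
    {"age_band": "31-50", "region": "North", "used_service": True},
    {"age_band": "51+",   "region": "South", "used_service": False},
]

def fit_marginals(rows):
    """Collect the observed values of each column (independent-marginals model)."""
    marginals = {}
    for row in rows:
        for col, val in row.items():
            marginals.setdefault(col, []).append(val)
    return marginals

def sample_synthetic(marginals, n):
    """Draw each column independently from its observed distribution."""
    return [
        {col: random.choice(values) for col, values in marginals.items()}
        for _ in range(n)
    ]

# Synthetic rows mimic the column distributions without copying records,
# though a naive model like this can still leak information by chance.
synthetic = sample_synthetic(fit_marginals(records), n=100)
print(synthetic[:3])
```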

The Pandemic Is No Excuse to Surveil Students


Zeynep Tufekci in The Atlantic: “In Michigan, a small liberal-arts college is requiring students to install an app called Aura, which tracks their location in real time, before they come to campus. Oakland University, also in Michigan, announced a mandatory wearable that would track symptoms, but, facing a student-led petition, then said it would be optional. The University of Missouri, too, has an app that tracks when students enter and exit classrooms. This practice is spreading: In an attempt to open during the pandemic, many universities and colleges around the country are forcing students to download location-tracking apps, sometimes as a condition of enrollment. Many of these apps function via Bluetooth sensors or Wi-Fi networks. When students enter a classroom, their phone informs a sensor that’s been installed in the room, or the app checks the Wi-Fi networks nearby to determine the phone’s location.
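
The beacon mechanism the article describes amounts to very little code on the phone’s side. A minimal sketch (the identifiers and room mapping are invented for illustration; this is not the code of Aura or any named app):

```python
# Hypothetical mapping from installed classroom beacons to rooms.
BEACON_TO_ROOM = {
    "beacon-7f3a": "Science Hall 101",
    "beacon-22c9": "Library East Wing",
}

def rooms_seen(scanned_beacon_ids):
    """Map beacon IDs picked up by a phone's Bluetooth scan to room names."""
    return [BEACON_TO_ROOM[b] for b in scanned_beacon_ids if b in BEACON_TO_ROOM]

# One scan near a classroom sensor is enough to log a student's presence.
print(rooms_seen(["beacon-7f3a", "unknown-beacon"]))  # ['Science Hall 101']
```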

As a university professor, I’ve seen surveillance like this before. Many of these apps replicate the tracking system sometimes installed on the phones of student athletes, for whom it is often mandatory. That system tells us a lot about what we can expect with these apps.

There is a widespread charade in the United States that university athletes, especially those who play high-profile sports such as football and basketball, are just students who happen to be playing sports as amateurs “in their free time.” The reality is that these college athletes in high-level sports, who are aggressively recruited by schools, bring prestige and financial resources to universities, under a regime that requires them to train like professional athletes despite their lack of salary. However, making the most of one’s college education and training at that level are virtually incompatible, simply because the day is 24 hours long and the body, even that of a young, healthy athlete, can only take so much when training so hard. Worse, many of these athletes are minority students, specifically Black men, who were underserved during their whole K–12 education and faced the same challenge then as they do now: Train hard in hopes of a scholarship and try to study with what little time is left, often despite being enrolled in schools with mediocre resources. Many of them arrive at college with an athletic scholarship but not enough academic preparation compared with their peers who went to better schools and could also concentrate on schooling….(More)”

Prioritizing COVID-19 tests based on participatory surveillance and spatial scanning


Paper by O.B. Leal-Neto et al: “Participatory surveillance has shown promising results from its conception to its application in several public health events. The use of a collaborative information pathway provides a rapid way to collect data on symptomatic individuals in the territory, complementing traditional health surveillance systems. In Brazil, this methodology has been used at the national level since 2014 during mass gathering events, which are of great importance for monitoring public health emergencies.

With the occurrence of the COVID-19 pandemic – and the limitation of the main non-pharmaceutical interventions for epidemic control, in this case testing and social isolation – added to the challenge of existing underreporting of cases and delayed notifications, there is demand for alternative sources of up-to-date information to complement the current system for disease surveillance. Several studies have demonstrated the benefits of participatory surveillance in coping with COVID-19, reinforcing the opportunity to modernize the way health surveillance is carried out. Additionally, spatial scanning techniques have been used to understand syndromic scenarios, investigate outbreaks, and analyze epidemiological risk, constituting relevant tools for health management. While there are limitations in the quality of traditional health systems, the data generated by participatory surveillance can be combined with traditional techniques to clarify epidemiological risks that demand urgent decision-making. Moreover, with limited testing available, the identification of priority areas for intervention is an important activity in the early response to public health emergencies. This study aimed to describe and analyze priority areas for COVID-19 testing, combining data from participatory surveillance and traditional surveillance for respiratory syndromes….(More)”.
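
The spatial scanning the paper builds on can be illustrated with a Kulldorff-style scan statistic: score each candidate zone by the log-likelihood ratio of its observed symptom count against the count expected from its population share, and prioritize testing in the highest-scoring zones. A minimal sketch with invented data (the published method evaluates many overlapping circular zones rather than single districts):

```python
import math

# Toy participatory-surveillance data: (district, symptomatic reports, population).
districts = [
    ("Centro", 40, 10_000),
    ("Norte",  12,  9_000),
    ("Sul",     8, 11_000),
    ("Leste",  30,  5_000),
]

total_cases = sum(c for _, c, _ in districts)
total_pop = sum(p for _, _, p in districts)

def scan_llr(cases: int, pop: int) -> float:
    """Kulldorff log-likelihood ratio for a candidate zone (Poisson model).

    Compares the observed count inside the zone with the count expected if
    risk were uniform across the population; zones at or below expectation
    score zero.
    """
    expected = total_cases * pop / total_pop
    if cases <= expected:
        return 0.0
    outside = total_cases - cases
    expected_outside = total_cases - expected
    inside_term = cases * math.log(cases / expected)
    outside_term = outside * math.log(outside / expected_outside) if outside else 0.0
    return inside_term + outside_term

# Rank districts as candidate zones; top scores suggest priority testing areas.
for name, cases, pop in sorted(districts, key=lambda d: scan_llr(d[1], d[2]), reverse=True):
    print(f"{name}: LLR={scan_llr(cases, pop):.2f}")
```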

How Competition Impacts Data Privacy


Paper by Aline Blankertz: “A small number of large digital platforms increasingly shape the space for most online interactions around the globe and they often act with hardly any constraint from competing services. The lack of competition puts those platforms in a powerful position that may allow them to exploit consumers and offer them limited choice. Privacy is increasingly considered one area in which the lack of competition may create harm. Because of these concerns, governments and other institutions are developing proposals to expand the scope for competition authorities to intervene to limit the power of the large platforms and to revive competition.  


The first case that has explicitly addressed anticompetitive harm to privacy is the German Bundeskartellamt’s case against Facebook, in which the authority argues that imposing bad privacy terms can amount to an abuse of dominance. Since that case started in 2016, more cases have dealt with the link between competition and privacy. For example, the proposed Google/Fitbit merger has raised concerns about sensitive health data being merged with existing Google profiles, and Apple is under scrutiny for not sharing certain personal data while using it for its own services.

However, addressing bad privacy outcomes through competition policy is effective only if those outcomes are caused, at least partly, by a lack of competition. Six distinct mechanisms can be distinguished through which competition may affect privacy, as summarized in Table 1. These mechanisms constitute different hypotheses through which less competition may influence privacy outcomes and lead either to worse privacy in different ways (mechanisms 1-5) or even better privacy (mechanism 6). The table also summarizes the available evidence on whether and to what extent the hypothesized effects are present in actual markets….(More)”.