Align or fail: How economics shape successful data sharing


Blog by Federico Bartolomucci: “…The conceptual distinctions between different data sharing models are mostly based on one fundamental element: the economic nature of data and its value. 

Open data projects operate under the assumption that data is a non-rival (i.e. it can be used by multiple people at the same time) and a non-excludable asset (i.e. anyone can use it, similar to a public good like roads or the air we breathe). This means that data can be shared with everyone, for any use, without losing its market and competitive value. The Humanitarian Data Exchange platform is a great example, allowing organizations to share over 19,000 open data sets on all aspects of humanitarian response.

Data collaboratives treat data as an excludable asset, one that some actors may be excluded from accessing (i.e. a ‘club good’, like a movie theater), and therefore share it only among a restricted pool of actors. At the same time, they overcome the rival nature of data in this set-up by linking its use to a specific purpose. These work best by giving the actors a voice in choosing the purpose for which the data will be used, and through specific agreements and governance bodies that ensure that those contributing data will not have their competitive position harmed, thereby incentivizing them to engage. A good example of this is the California Data Collaborative, which uses data from different actors in the water sector to develop high-level analysis on water distribution to guide policy, planning, and operations for water districts in the state of California. 

Data ecosystems work by activating market mechanisms around data exchange to overcome reluctance to share data, rather than relying solely on its purpose of use. This means that actors can choose to share their data in exchange for compensation, be it monetary or in alternate forms such as other data. In this way, the compensation balances the potential loss of competitive advantage created by the sharing of a rival asset, as well as the costs and risks of sharing. The Enershare initiative aims to establish a marketplace utilizing blockchain and smart contracts to facilitate data exchange in the energy sector. The platform is based on a compensation system, which can be non-monetary, for exchanging assets and resources related to data (such as datasets, algorithms, and models) with energy assets and services (like heating system maintenance or the transfer of surplus locally self-produced energy).

These different models of data sharing have different operational implications…(More)”.

On Fables and Nuanced Charts


Column by Spencer Greenberg and Amber Dawn Ace: “In 1994, the U.S. Congress passed the largest crime bill in U.S. history, called the Violent Crime Control and Law Enforcement Act. The bill allocated billions of dollars to build more prisons and hire 100,000 new police officers, among other things. In the years following the bill’s passage, violent crime rates in the U.S. dropped drastically, from around 750 offenses per 100,000 people in 1990 to under 400 in 2018.

[Chart: U.S. crime rates over time. The data and annotation are real, but the implied story is not. Credit: Authors.]

But can we infer, as this chart seems to ask us to, that the bill caused the drop in crime?

As it turns out, this chart wasn’t put together by sociologists or political scientists who’ve studied violent crime. Rather, we—a mathematician and a writer—devised it to make a point: Although charts seem to reflect reality, they often convey narratives that are misleading or entirely false.

Upon seeing that violent crime dipped after 1990, we looked up major events that happened right around that time—selecting one, the 1994 Crime Bill, and slapping it on the graph. There are other events we could have stuck on the graph just as easily that would likely have invited you to construct a completely different causal story. In other words, the bill and the data in the graph are real, but the story is manufactured.

Perhaps the 1994 Crime Bill really did cause the drop in violent crime, or perhaps the causality goes the other way: the spike in violent crime motivated politicians to pass the act in the first place. (Note that the act was passed slightly after the violent crime rate peaked!) 

Charts are a concise way not only to show data but also to tell a story. Such stories, however, reflect the interpretations of a chart’s creators and are often accepted by the viewer without skepticism. As Noah Smith and many others have argued, charts contain hidden assumptions that can drastically change the story they tell…(More)”.
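
The annotation trick the authors describe is easy to reproduce. Below is a minimal sketch in Python: only the 1990 and 2018 values come from the excerpt above, the intermediate points are illustrative placeholders rather than real FBI figures, and the single annotate call is the entire ‘story-telling’ move.

    # Minimal sketch: annotating a time series to imply a causal story.
    # Only the 1990 and 2018 values are taken from the excerpt; the points
    # in between are illustrative placeholders, not real FBI data.
    import matplotlib.pyplot as plt

    years = [1990, 1994, 2000, 2010, 2018]
    violent_crime_rate = [750, 710, 505, 440, 390]  # offenses per 100,000 (illustrative)

    fig, ax = plt.subplots()
    ax.plot(years, violent_crime_rate, marker="o")

    # The "narrative" step: pick an event near the turning point and pin it on.
    ax.axvline(1994, linestyle="--", color="grey")
    ax.annotate("1994 Violent Crime Control\nand Law Enforcement Act",
                xy=(1994, 710), xytext=(2000, 650),
                arrowprops={"arrowstyle": "->"})

    ax.set_xlabel("Year")
    ax.set_ylabel("Violent crime rate (per 100,000)")
    ax.set_title("Correlation dressed up as causation")
    plt.show()

Swapping the annotated event for any other early-1990s milestone would produce an equally persuasive, equally unsupported chart.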

Toward a citizen science framework for public policy evaluation


Paper by Giovanni Esposito et al: “This study pioneers the use of citizen science in evaluating Freedom of Information laws, with a focus on Belgium, where since its 1994 enactment, Freedom of Information’s effectiveness has remained largely unexamined. Utilizing participatory methods, it engages citizens in assessing transparency policies, significantly contributing to public policy evaluation methodology. The research identifies regional differences in Freedom of Information implementation across Belgian municipalities, highlighting that larger municipalities handle requests more effectively, while administrations generally show reluctance to respond to requests from individuals perceived as knowledgeable. This phenomenon reflects a broader European caution toward well-informed requesters. By integrating citizen science, this study not only advances our understanding of Freedom of Information law effectiveness in Belgium but also advocates for a more inclusive, collaborative approach to policy evaluation. It addresses the gap in researchers’ experience with citizen science, showcasing its vast potential to enhance participatory governance and policy evaluation…(More)”.

Revisiting the ‘Research Parasite’ Debate in the Age of AI


Article by C. Brandon Ogbunu: “A 2016 editorial published in the New England Journal of Medicine lamented the existence of “research parasites,” those who pick over the data of others rather than generating new data themselves. The article touched on the ethics and appropriateness of this practice. The most charitable interpretation of the argument centered around the hard work and effort that goes into the generation of new data, which costs millions of research dollars and takes countless person-hours. Whatever the merits of that argument, the editorial and its associated arguments were widely criticized.

Given recent advances in AI, revisiting the research parasite debate offers a new perspective on the ethics of sharing and data democracy. It is ironic that the critics of research parasites might have made a sound argument — but for the wrong setting, aimed at the wrong target, at the wrong time. Specifically, the large language models, or LLMs, that underlie generative AI tools such as OpenAI’s ChatGPT, have an ethical challenge in how they parasitize freely available data. These discussions bring up new conversations about data security that may undermine, or at least complicate, efforts at openness and data democratization.

The backlash to that 2016 editorial was swift and violent. Many arguments centered around the anti-science spirit of the message. For example, meta-analysis – which re-analyzes data from a selection of studies – is a critical practice that should be encouraged. Many groundbreaking discoveries about the natural world and human health have come from this practice, including new pictures of the molecular causes of depression and schizophrenia. Further, the central criticisms of research parasitism undermine the ethical goals of data sharing and ambitions for open science, where scientists and citizen-scientists can benefit from access to data. This differs from the status quo in 2016, when data published in many of the top journals of the world were locked behind a paywall, illegible, poorly labeled, or difficult to use. This remains largely true in 2024…(More)”.

Private sector trust in data sharing: enablers in the European Union


Paper by Jaime Bernal: “Enabling private sector trust stands as a critical policy challenge for the success of the EU Data Governance Act and Data Act in promoting data sharing to address societal challenges. This paper attributes the widespread trust deficit to the unmanageable uncertainty that arises from businesses’ limited usage control to protect their interests in the face of unacceptable perceived risks. For example, a firm may hesitate to share its data with others in case it is leaked and falls into the hands of business competitors. To illustrate this impasse, competition, privacy, and reputational risks are introduced, respectively, in the context of three suboptimal approaches to data sharing: data marketplaces, data collaboratives, and data philanthropy. The paper proceeds by analyzing seven trust-enabling mechanisms comprising technological, legal, and organizational elements to balance trust, risk, and control, and by assessing their capacity to operate in a fair, equitable, and transparent manner. Finally, the paper examines the regulatory context in the EU and the advantages and limitations of voluntary and mandatory data sharing, concluding that an approach that effectively balances the two should be pursued…(More)”.

The Art of Uncertainty


Book by David Spiegelhalter: “We live in a world where uncertainty is inevitable. How should we deal with what we don’t know? And what role do chance, luck and coincidence play in our lives?

David Spiegelhalter has spent his career dissecting data in order to understand risks and assess the chances of what might happen in the future. In The Art of Uncertainty, he gives readers a window onto how we can all do this better.

In engaging, crystal-clear prose, he takes us through the principles of probability, showing how it can help us think more analytically about everything from medical advice to pandemics and climate change forecasts, and explores how we can update our beliefs about the future in the face of constantly changing experience. Along the way, he explains why roughly 40% of football results come down to luck rather than talent, how the National Risk Register assesses near-term risks to the United Kingdom, and why we can be so confident that two properly shuffled packs of cards have never, ever been in the exact same order.

Drawing on a wide range of captivating real-world examples, this is an essential guide to navigating uncertainty while also having the humility to admit what we do not know…(More)”.
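
The shuffled-deck claim lends itself to a quick back-of-the-envelope check. The sketch below uses deliberately generous, made-up assumptions about how many shuffles have ever happened (they are not figures from the book); even so, the expected number of repeated orderings comes out vanishingly small.

    # Rough sanity check of the shuffled-deck claim.
    # Assumptions (not from the book): 10 billion people each shuffling one
    # deck per second for 14 billion years -- a wild overestimate of all
    # shuffles ever performed.
    from math import factorial

    orderings = factorial(52)  # ~8.07e67 possible deck orders
    shuffles = 10**10 * 60 * 60 * 24 * 365 * 14 * 10**9  # ~4.4e27

    # Birthday-style bound: expected number of colliding pairs among n
    # uniform draws from N outcomes is roughly n * (n - 1) / (2 * N).
    expected_repeats = shuffles * (shuffles - 1) / (2 * orderings)

    print(f"52! ≈ {orderings:.2e}")
    print(f"Generous shuffle count ≈ {shuffles:.2e}")
    print(f"Expected repeated orderings ≈ {expected_repeats:.2e}")  # effectively zero

Even under these absurdly generous assumptions, the expected number of repeats is on the order of 10^-13, i.e. effectively zero.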

Collaboration in Healthcare: Implications of Data Sharing for Secondary Use in the European Union


Paper by Fanni Kertesz: “The European healthcare sector is transforming toward patient-centred and value-based healthcare delivery. The European Health Data Space (EHDS) Regulation aims to unlock the potential of health data by establishing a single market for its primary and secondary use. This paper examines the legal challenges associated with the secondary use of health data within the EHDS and offers recommendations for improvement. Key issues include the compatibility between the EHDS and the General Data Protection Regulation (GDPR), barriers to cross-border data sharing, and intellectual property concerns. Resolving these challenges is essential for realising the full potential of health data and advancing healthcare research and innovation within the EU…(More)”.

Definitions, digital, and distance: on AI and policymaking


Article by Gavin Freeguard: “Our first question is not so much ‘to what extent can AI improve public policymaking?’, but rather ‘what is currently wrong with policymaking?’, and then, ‘is AI able to help?’.

Ask those in and around policymaking about the problems and you’ll get a list likely to include:

  • the practice not having changed in decades (or centuries)
  • it being an opaque ‘dark art’ with little transparency
  • defaulting to easily accessible stakeholders and evidence
  • a separation between policy and delivery (and digital and other disciplines), and failure to recognise the need for agility and feedback as opposed to distinct stages
  • the challenges in measuring or evaluating the impact of policy interventions and understanding what works, with a lack of awareness, let alone sharing, of case studies elsewhere
  • difficulties in sharing data
  • the siloed nature of government complicating cross-departmental working
  • policy asks often being dictated by politics, with electoral cycles leading to short-termism, ministerial churn changing priorities and personal style, events prompting rushed reactions, or political priorities dictating ‘policy-based evidence making’
  • a rush to answers before understanding the problem
  • definitional issues about what policy actually is, making it hard to get a hold of or to develop professional expertise.

If we’re defining ‘policy’ and the problem, we also need to define ‘AI’, or at least acknowledge that we are not only talking about new, shiny generative AI, but a world of other techniques for automating processes and analysing data that have been used in government for years.

So is ‘AI’ able to help? It could support us to make better use of a wider range of data more quickly; but it could privilege that which is easier to measure, strip data of vital context, and embed biases and historical assumptions. It could ‘make decisions more transparent (perhaps through capturing digital records of the process behind them, or by visualising the data that underpins a decision)’; or make them more opaque with ‘black-box’ algorithms, and distract from overcoming the very human cultural problems around greater openness. It could help synthesise submissions or generate ideas to brainstorm; or fail to compensate for deficiencies in underlying government knowledge infrastructure, and generate gibberish. It could be a tempting silver bullet for better policy; or it could paper over the cracks, while underlying technical, organisational and cultural plumbing goes unfixed. It could have real value in some areas, or cause harms in others…(More)”.

Geographies of missing data: Spatializing counterdata production against feminicide


Paper by Catherine D’Ignazio et al: “Feminicide is the gender-related killing of cisgender and transgender women and girls. It reflects patriarchal and racialized systems of oppression and reveals how territories and socio-economic landscapes configure everyday gender-related violence. In recent decades, many grassroots data production initiatives have emerged with the aim of monitoring this extreme but invisibilized phenomenon. We bridge scholarship in feminist and information geographies with data feminism to examine the ways in which space, broadly defined, shapes the counterdata production strategies of feminicide data activists. Drawing on a qualitative study of 33 monitoring efforts led by civil society organizations across 15 countries, primarily in Latin America, we provide a conceptual framework for examining the spatial dimensions of data activism. We show how there are striking transnational patterns related to where feminicide goes unrecorded, resulting in geographies of missing data. In response to these omissions, activists deploy multiple spatialized strategies to make these geographies visible, to situate and contextualize each case of feminicide, to reclaim databases as spaces for memory and witnessing, and to build transnational networks of solidarity. In this sense, we argue that data activism about feminicide constitutes a space of resistance and resignification of everyday forms of gender-related violence…(More)”.

On Slicks and Satellites: An Open Source Guide to Marine Oil Spill Detection


Article by Wim Zwijnenburg: “The sheer scale of ocean oil pollution is staggering. In Europe, a suspected 3,000 major illegal oil dumps take place annually, with an estimated 15,000 to 60,000 tonnes of oil ending up in the North Sea. In the Mediterranean, figures provided by the Regional Marine Pollution Emergency Response Centre suggest there are 1,500 to 2,000 oil spills every year.

The impact of any single oil spill on a marine or coastal ecosystem can be devastating and long-lasting. Animals such as birds, turtles, dolphins and otters can suffer from ingesting or inhaling oil, as well as getting stuck in the slick. The loss of water and soil quality can be toxic to both flora and fauna. Heavy metals enter the food chain, poisoning everything from plankton to shellfish, which in turn affects the livelihoods of coastal communities dependent on fishing and tourism.

However, with a wealth of open source earth observation tools at our fingertips, it’s possible for us to identify and monitor these spills during such environmental disasters, highlight at-risk areas, and even hold perpetrators accountable. …

There are several different types of remote sensing sensors we can use for collecting data about the Earth’s surface. In this article we’ll focus on two: optical and radar sensors. 

Optical imagery captures the broad light spectrum reflected from the Earth, also known as passive remote sensing. In contrast, Synthetic Aperture Radar (SAR) uses active remote sensing, sending radio waves down to the Earth’s surface and capturing them as they are reflected back. Any change in the reflection can indicate a change on the ground, which can then be investigated. For more background, see Bellingcat contributor Ollie Ballinger’s Remote Sensing for OSINT Guide…(More)”.
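
As a rough illustration of the kind of change SAR can reveal: oil slicks dampen the small surface waves that normally scatter radar energy back to the sensor, so they tend to show up as unusually dark patches in the backscatter. The snippet below is a hypothetical, minimal sketch of that ‘dark spot’ screening step on an already-loaded backscatter array; it is not the workflow from the article, and a real analysis would add calibration, land and low-wind masking, and manual verification, since look-alikes such as algal blooms and calm water produce similar dark patches.

    # Hypothetical sketch of "dark spot" screening in SAR backscatter.
    # Assumes `backscatter_db` is a 2-D NumPy array of calibrated sigma0
    # values in decibels; obtaining it (e.g. from Sentinel-1 scenes) is
    # outside the scope of this sketch.
    import numpy as np

    def dark_spot_mask(backscatter_db: np.ndarray, offset_db: float = 6.0) -> np.ndarray:
        """Flag pixels much darker than the scene's typical sea surface."""
        scene_median = np.nanmedian(backscatter_db)
        return backscatter_db < (scene_median - offset_db)

    # Illustrative 'scene': background around -8 dB with a darker patch near -18 dB.
    rng = np.random.default_rng(0)
    scene = rng.normal(-8.0, 1.0, size=(200, 200))
    scene[80:120, 60:140] = rng.normal(-18.0, 1.0, size=(40, 80))

    mask = dark_spot_mask(scene)
    print(f"Potential slick pixels: {mask.sum()} of {mask.size}")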