Align or fail: How economics shape successful data sharing


Blog by Federico Bartolomucci: “…The conceptual distinctions between different data sharing models are mostly based on one fundamental element: the economic nature of data and its value. 

Open data projects operate under the assumption that data is a non-rival (i.e. it can be used by multiple people at the same time) and a non-excludable asset (i.e. anyone can use it, similar to a public good like roads or the air we breathe). This means that data can be shared with everyone, for any use, without losing its market and competitive value. The Humanitarian Data Exchange platform is a great example: it allows organizations to share over 19,000 open data sets covering all aspects of humanitarian response.

Data collaboratives treat data as an excludable asset from which some people may be excluded (i.e. a ‘club good’, like a movie theater) and therefore share it only among a restricted pool of actors. At the same time, they overcome the rival nature of data in this setup by linking its use to a specific purpose. These work best by giving the actors a voice in choosing the purpose for which the data will be used, and through specific agreements and governance bodies that ensure that those contributing data will not have their competitive position harmed, thereby incentivizing them to engage. A good example of this is the California Data Collaborative, which uses data from different actors in the water sector to develop high-level analysis on water distribution to guide policy, planning, and operations for water districts in the state of California.

Data ecosystems work by activating market mechanisms around data exchange to overcome reluctance to share data, rather than relying solely on its purpose of use. This means that actors can choose to share their data in exchange for compensation, be it monetary or in alternate forms such as other data. In this way, the compensation balances the potential loss of competitive advantage created by the sharing of a rival asset, as well as the costs and risks of sharing. The Enershare initiative aims to establish a marketplace utilizing blockchain and smart contracts to facilitate data exchange in the energy sector. The platform is based on a compensation system, which can be non-monetary, for exchanging assets and resources related to data (such as datasets, algorithms, and models) with energy assets and services (like heating system maintenance or the transfer of surplus locally self-produced energy).

These different models of data sharing have different operational implications…(More)”.

On Fables and Nuanced Charts


Column by Spencer Greenberg and Amber Dawn Ace: “In 1994, the U.S. Congress passed the largest crime bill in U.S. history, called the Violent Crime Control and Law Enforcement Act. The bill allocated billions of dollars to build more prisons and hire 100,000 new police officers, among other things. In the years following the bill’s passage, violent crime rates in the U.S. dropped drastically, from around 750 offenses per 100,000 people in 1990 to under 400 in 2018.

[Chart: U.S. crime rates over time. The data and annotation are real, but the implied story is not. Credit: Authors.]

But can we infer, as this chart seems to ask us to, that the bill caused the drop in crime?

As it turns out, this chart wasn’t put together by sociologists or political scientists who’ve studied violent crime. Rather, we—a mathematician and a writer—devised it to make a point: Although charts seem to reflect reality, they often convey narratives that are misleading or entirely false.

Upon seeing that violent crime dipped after 1990, we looked up major events that happened right around that time—selecting one, the 1994 Crime Bill, and slapping it on the graph. There are other events we could have stuck on the graph just as easily that would likely have invited you to construct a completely different causal story. In other words, the bill and the data in the graph are real, but the story is manufactured.
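As an illustrative sketch (not the column's actual code), a chart like this can be assembled in a few lines: plot a trend, then annotate whichever conveniently timed event suits the story. The values below are rough placeholders, not the actual U.S. crime series.

```python
# A minimal sketch of how a chart like this is assembled: plot a time series,
# then annotate any conveniently timed event. The values below are illustrative
# placeholders, not the real violent-crime data.
import matplotlib.pyplot as plt

years = list(range(1985, 2019))
# Placeholder trend: rises to an early-1990s peak, then declines steadily.
violent_crime_rate = [600 + 30 * (y - 1985) if y <= 1991 else 780 - 14 * (y - 1991)
                      for y in years]

fig, ax = plt.subplots()
ax.plot(years, violent_crime_rate)
ax.axvline(1994, linestyle="--")
ax.annotate("1994 Crime Bill passed", xy=(1994, 738), xytext=(1999, 760),
            arrowprops=dict(arrowstyle="->"))
ax.set_xlabel("Year")
ax.set_ylabel("Violent crimes per 100,000 people")
ax.set_title("Any nearby event can be made to look causal")
plt.show()
```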

Perhaps the 1994 Crime Bill really did cause the drop in violent crime, or perhaps the causality goes the other way: the spike in violent crime motivated politicians to pass the act in the first place. (Note that the act was passed slightly after the violent crime rate peaked!) 

Charts are a concise way not only to show data but also to tell a story. Such stories, however, reflect the interpretations of a chart’s creators and are often accepted by the viewer without skepticism. As Noah Smith and many others have argued, charts contain hidden assumptions that can drastically change the story they tell…(More)”.

On Slicks and Satellites: An Open Source Guide to Marine Oil Spill Detection


Article by Wim Zwijnenburg: “The sheer scale of ocean oil pollution is staggering. In Europe, some 3,000 major illegal oil dumps are suspected to take place annually, with an estimated 15,000 to 60,000 tonnes of oil ending up in the North Sea. In the Mediterranean, figures provided by the Regional Marine Pollution Emergency Response Centre suggest there are 1,500 to 2,000 oil spills every year.

The impact of any single oil spill on a marine or coastal ecosystem can be devastating and long-lasting. Animals such as birds, turtles, dolphins and otters can suffer from ingesting or inhaling oil, as well as getting stuck in the slick. The resulting degradation of water and soil quality can be toxic to both flora and fauna. Heavy metals enter the food chain, poisoning everything from plankton to shellfish, which in turn affects the livelihoods of coastal communities dependent on fishing and tourism.

However, with a wealth of open source earth observation tools at our fingertips, it’s possible for us to identify and monitor these spills during such environmental disasters, highlight at-risk areas, and even hold perpetrators accountable. …

There are several different types of remote sensing sensors we can use for collecting data about the Earth’s surface. In this article we’ll focus on two: optical and radar sensors. 

Optical imagery captures the broad light spectrum reflected from the Earth, also known as passive remote sensing. In contrast, Synthetic Aperture Radar (SAR) uses active remote sensing, sending radio waves down to the Earth’s surface and capturing them as they are reflected back. Any change in the reflection can indicate a change on the ground, which can then be investigated. For more background, see Bellingcat contributor Ollie Ballinger’s Remote Sensing for OSINT Guide…(More)”.
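As a rough illustration of how SAR data can be screened for slicks (a sketch under stated assumptions, not the article’s workflow): oil dampens the small capillary waves that normally scatter radar energy back to the sensor, so slicks show up as dark, low-backscatter patches. The file name and threshold below are hypothetical placeholders, and a real analysis would also mask land and rule out look-alikes such as calm water or algal blooms.

```python
# Flag unusually dark (low-backscatter) pixels in a SAR scene as potential
# oil-slick candidates. Assumes a single-band Sentinel-1 backscatter GeoTIFF
# in decibels; the file name and threshold are illustrative placeholders.
import numpy as np
import rasterio

SLICK_THRESHOLD_DB = -22.0  # hypothetical cut-off; real work needs calibration

with rasterio.open("sentinel1_vv_backscatter_db.tif") as src:
    backscatter_db = src.read(1).astype(float)
    nodata = src.nodata

valid = np.ones_like(backscatter_db, dtype=bool)
if nodata is not None:
    valid &= backscatter_db != nodata

# Candidate slick pixels: unusually low backscatter over open water.
candidates = valid & (backscatter_db < SLICK_THRESHOLD_DB)

share = 100 * candidates.sum() / max(valid.sum(), 1)
print(f"Potential slick pixels: {candidates.sum():,} ({share:.2f}% of valid pixels)")
```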

Building LLMs for the social sector: Emerging pain points


Blog by Edmund Korley: “…One of the sprint’s main tracks focused on using LLMs to enhance the impact and scale of chat services in the social sector.

Six organizations participated, with operations spanning Africa and India. Bandhu empowers India’s blue-collar workers and migrants by connecting them to jobs and affordable housing, helping them take control of their livelihoods and future stability. Digital Green enhances rural farmers’ agency with AI-driven insights to improve agricultural productivity and livelihoods. Jacaranda Health provides mothers in sub-Saharan Africa with essential information and support to improve maternal and newborn health outcomes. Kabakoo equips youth in Francophone Africa with digital skills, fostering self-reliance and economic independence. Noora Health teaches Indian patients and caregivers critical health skills, enhancing their ability to manage care. Udhyam provides micro-entrepreneurs with education, mentorship, and financial support to build sustainable businesses.

These organizations demonstrate diverse ways one can boost human agency: they help people in underserved communities take control of their lives, make more informed choices, and build better futures – and they are piloting AI interventions to scale these efforts…(More)”.

Using internet search data as part of medical research


Blog by Susan Thomas and Matthew Thompson: “…In the UK, almost 50 million health-related searches are made using Google per year. Globally, there are hundreds of millions of health-related searches every day. And, of course, people are doing these searches in real-time, looking for answers to their concerns in the moment. It’s also possible that, even if people aren’t noticing and searching about changes to their health, their behaviour is changing. Maybe they are searching more at night because they are having difficulty sleeping, or maybe they are spending more (or less) time online. Maybe an individual’s search history could actually be really useful for researchers. This realisation has led medical researchers to start to explore whether individuals’ online search activity could help provide those subtle, almost unnoticeable signals that point to the beginning of a serious illness.

Our recent review found that 23 studies have been published so far that have done exactly this. These studies suggest that the online search activity of people later diagnosed with a variety of conditions, ranging from pancreatic cancer and stroke to mood disorders, differed from that of people who did not have one of these conditions.

One of these studies was published by researchers at Imperial College London, who used online search activity to identify signals of gynaecological malignancies in women. They found that women with malignant (e.g. ovarian cancer) and benign conditions had different search patterns up to two months prior to a GP referral.

Pause for a moment, and think about what this could mean. Ovarian cancer is one of the most devastating cancers women get. It’s desperately hard to detect early – and yet there are signals of this cancer visible in women’s internet searches months before diagnosis?…(More)”.

The Imperial Origins of Big Data


Blog and book by Asheesh Kapur Siddique: “We live in a moment of massive transformation in the nature of information. In 2020, according to one report, users of the Internet created 64.2 zettabytes of data, a quantity greater than the “number of detectable stars in the cosmos,” a colossal increase whose origins can be traced to the emergence of the World Wide Web in 1993.1 Facilitated by technologies like satellites, smartphones, and artificial intelligence, the scale and speed of data creation seem like they may only balloon over the rest of our lifetimes—and with them, the problem of how to govern ourselves in relation to the inequalities and opportunities that the explosion of data creates.

But while much about our era of big data is indeed revolutionary, the political questions that it raises—How should information be used? Who should control it? And how should it be preserved?—are ones with which societies have long grappled. These questions attained a particular importance in Europe from the eleventh century due to a technological change no less significant than the ones we are witnessing today: the introduction of paper into Europe. Initially invented in China, paper travelled to Europe via the conduit of Islam around the eleventh century after the Moors conquered Spain. Over the twelfth, thirteenth, and fourteenth centuries, paper emerged as the fundamental substrate which politicians, merchants, and scholars relied on to record and circulate information in governance, commerce, and learning. At the same time, governing institutions sought to preserve and control the spread of written information through the creation of archives: repositories where they collected, organized, and stored documents.

The expansion of European polities overseas from the late fifteenth century onward saw governments massively scale up their use of paper—and confront the challenge of controlling its dissemination across thousands of miles of ocean and land. These pressures were felt particularly acutely in what eventually became the largest empire in world history, the British empire. As people from the British Isles, beginning in the early seventeenth century, fought, traded, and settled their way to power in the Atlantic world and South Asia, administrators faced the problem of how to govern both their emigrating subjects and the non-British peoples with whom they interacted. This meant collecting information about their behavior through the technology of paper. Just as we struggle to organize, search, and control our email boxes, text messages, and app notifications, so too did these early moderns confront the attendant challenges of developing practices of collection and storage to manage the resulting information overload. And despite the best efforts of states and companies to control information, it constantly escaped their grasp, falling into the hands of their opponents and rivals who deployed it to challenge and contest ruling powers.

The history of the early modern information state offers no simple or straightforward answers to the questions that data raises for us today. But it does remind us of a crucial truth, all too readily obscured by the deluge of popular narratives glorifying technological innovation: that questions of data are inherently questions about politics—about who gets to collect, control, and use information, and the ends to which information should be put. We should resist any effort to insulate data governance from democratic processes—and having an informed perspective on the politics of data requires that we attend not just to its present, but also to its past…(More)”.

The Power of Supercitizens


Blog by Brian Klaas: “Lurking among us, there are a group of hidden heroes, people who routinely devote significant amounts of their time, energy, and talent to making our communities better. These are the devoted, do-gooding, elite one percent. Most, but not all, are volunteers.1 All are selfless altruists. They, the supercitizens, provide some of the stickiness in the social glue that holds us together.2

What if I told you that there’s this little trick you can do that makes your community stronger, helps other people, and makes you happier and live longer? Well, it exists, there’s ample evidence it works, and best of all, it’s free.

Recently published research shows a convincing causal link between these supercitizens—devoted, regular volunteers—and social cohesion. While such an umbrella term means a million different things, these researchers focused on two UK-based surveys that analyzed three facets of social cohesion, measured through eight questions (respondents answered on a five-point scale, ranging from strongly disagree to strongly agree; a rough scoring sketch follows the list below). They were:


Neighboring

  • ‘If I needed advice about something I could go to someone in my neighborhood’;
  • ‘I borrow things and exchange favors with my neighbors’; and
  • ‘I regularly stop and talk with people in my neighborhood’

Psychological sense of community

  • ‘I feel like I belong to this neighborhood’;
  • ‘The friendships and associations I have with other people in my neighborhood mean a lot to me’;
  • ‘I would be willing to work together with others on something to improve my neighborhood’; and
  • ‘I think of myself as similar to the people that live in this neighborhood’

Attraction to the neighborhood

  • ‘I plan to remain a resident of this neighborhood for a number of years’
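As a rough sketch (not the researchers’ actual scoring method) of how such items could be turned into per-facet scores, one might simply average the 1–5 responses within each facet; the item keys and example respondent below are hypothetical placeholders.

```python
# Average 1-5 Likert responses to the eight items into per-facet
# social-cohesion scores. Item keys and the example respondent are
# hypothetical placeholders, not the surveys' actual variable names.
FACETS = {
    "neighboring": ["advice_from_neighbor", "borrow_exchange_favors", "stop_and_talk"],
    "psychological_sense_of_community": [
        "belong_to_neighborhood", "friendships_mean_a_lot",
        "willing_to_work_together", "similar_to_neighbors",
    ],
    "attraction_to_neighborhood": ["plan_to_remain"],
}

def facet_scores(responses: dict) -> dict:
    """Average the 1-5 responses within each facet of social cohesion."""
    return {
        facet: sum(responses[item] for item in items) / len(items)
        for facet, items in FACETS.items()
    }

# Example respondent (strongly disagree = 1 ... strongly agree = 5):
example = {
    "advice_from_neighbor": 4, "borrow_exchange_favors": 3, "stop_and_talk": 5,
    "belong_to_neighborhood": 4, "friendships_mean_a_lot": 4,
    "willing_to_work_together": 5, "similar_to_neighbors": 3,
    "plan_to_remain": 5,
}
print(facet_scores(example))  # e.g. {'neighboring': 4.0, ...}
```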

While these questions only tap into some specific components of social cohesion, high levels of these ingredients are likely to produce a reliable recipe for a healthy local community. (Social cohesion differs from social capital, popularized by Robert Putnam and his book, Bowling Alone. Social capital tends to focus on links between individuals and groups—are you a joiner or more of a loner?—whereas cohesion refers to a more diffuse sense of community, belonging, and neighborliness)…(More)”.

The Power of Volunteers: Remote Mapping Gaza and Strategies in Conflict Areas


Blog by Jessica Pechmann: “…In Gaza, increased conflict since October 2023 has caused a prolonged humanitarian crisis. Understanding the impact of the conflict on buildings has been challenging, since pre-existing datasets from artificial intelligence and machine learning (AI/ML) models and OpenStreetMap (OSM) were not accurate enough to create a full building footprint baseline. The area’s buildings were too dense, and information on the ground was impossible to collect safely. In these hard-to-reach areas, the Humanitarian OpenStreetMap Team’s (HOT) remote and crowdsourced mapping methodology was a good fit for collecting detailed information visible on aerial imagery.

In February 2024, after consultation with humanitarian and UN actors working in Gaza, HOT decided to create a pre-conflict dataset of all building footprints in the area in OSM. HOT’s community of OpenStreetMap volunteers did all the data work, coordinating through HOT’s Tasking Manager. The volunteers made meticulous edits to add missing data and to improve existing data. Due to protection and data quality concerns, only expert volunteer teams were assigned to map and validate the area. As in other areas that are hard to reach due to conflict, HOT balanced the data needs with responsible data practices based on the context.

Comparing AI/ML with human-verified OSM building datasets in conflict zones

AI/ML is becoming an increasingly common and quick way to obtain building footprints across large areas. Sources for automated building footprints range from worldwide datasets by Microsoft or Google to smaller-scale open community-managed tools such as HOT’s new application, fAIr.

Now that HOT volunteers have completely updated and validated all OSM buildings visible in pre-conflict imagery, OSM has 18% more individual buildings in the Gaza Strip than Microsoft’s ML buildings dataset (an estimated 330,079 buildings vs 280,112). However, in contexts where there has not been a coordinated update effort in OSM, the numbers may differ. For example, in Sudan, where there has not been a large organized editing campaign, there are just under 1,500,000 buildings in OSM, compared to over 5,820,000 buildings in Microsoft’s ML data. It is important to note that the ML datasets have not been human-verified and their accuracy is not known. Google Open Buildings has over 26 million building features in Sudan, but on visual inspection, many of these features are noise in the data that the model incorrectly identified as buildings in the uninhabited desert…(More)”.
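As a simple illustration of the kind of dataset comparison described above (a sketch under assumed file names, not HOT’s actual tooling), building counts from an OSM export and an ML-derived footprint layer for the same area of interest can be compared directly:

```python
# Compare building counts between an OSM export and an ML-derived footprint
# dataset for the same area of interest. File names are hypothetical placeholders.
import geopandas as gpd

osm_buildings = gpd.read_file("gaza_osm_buildings.geojson")
ml_buildings = gpd.read_file("gaza_ml_buildings.geojson")

osm_count = len(osm_buildings)
ml_count = len(ml_buildings)

# Relative difference; e.g. 330,079 vs 280,112 comes out to roughly +18%.
pct_more = 100 * (osm_count - ml_count) / ml_count
print(f"OSM: {osm_count:,} buildings, ML: {ml_count:,} buildings "
      f"({pct_more:+.1f}% relative to the ML dataset)")
```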

Under which conditions can civic monitoring be admitted as a source of evidence in courts?


Blog by Anna Berti Suman: “The ‘Sensing for Justice’ (SensJus) research project – running between 2020 and 2023 – explored how people use monitoring technologies or just their senses to gather evidence of environmental issues and claim environmental justice in a variety of fora. Among the other research lines, we looked at successful and failed cases of civic-gathered data introduced in courts. The guiding question was: what are the enabling factors and/or barriers for the introduction of civic evidence in environmental litigation?

Civic environmental monitoring is the use by ordinary people of monitoring devices (e.g., a sensor) or their bare senses (e.g., smell, hearing) to detect environmental issues. It can be regarded as a form of reaction to environmental injustices, a form of political contestation through data, and even a form of collective care. The practice is growing fast, especially thanks to the widespread availability of audio- and video-recording devices in the hands of diverse publics, but also due to increasing public literacy and concern about environmental matters.

Civic monitoring can be a powerful source of evidence for law enforcement, especially when it sheds light on gaps in official information caused by public agencies’ lack of resources to detect environmental wrongdoing. Legal scholars and practitioners, as well as civil society organizations and institutional actors, should pay close attention to the practice and its potential applications.

Among the cases explored for the SensJus project, the Formosa case, Texas, United States, stands out as it sets a key precedent: issued in June 2019, the landmark ruling found a Taiwanese petrochemical company liable for violating the US Clean Water Act, mostly on the basis of citizen-collected evidence involving volunteer observations of plastic contamination over years. The contamination could not be proven through existing data held by competent authorities because the company never filed any record of pollution. Our analysis of the case highlights some key determinants of the case’s success…(More)”.

Future-proofing government data


Article by Amy Jones: “Vast amounts of data are fueling innovation and decision-making, and agencies representing the United States government are custodians of some of the largest repositories of data in the world. As one of the world’s largest data creators and consumers, the federal government has made substantial investments in sourcing, curating, and leveraging data across many domains. However, the increasing reliance on artificial intelligence to extract insights and drive efficiencies necessitates a strategic pivot: agencies must evolve their data management practices to identify synthetic data and distinguish it from organic sources, in order to safeguard the integrity and utility of data assets.

AI’s transformative potential is contingent on the availability of high-quality data. Data readiness includes attention to quality, accuracy, completeness, consistency, timeliness, and relevance, at a minimum, and agencies are adopting robust data governance frameworks that enforce data quality standards at every stage of the data lifecycle. This includes implementing advanced data validation techniques, fostering a culture of data stewardship, and leveraging state-of-the-art tools for continuous data quality monitoring…(More)”.
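As a hedged illustration of what such automated data-quality checks might look like in practice (a hypothetical sketch, not the tooling described in the article; column names and thresholds are placeholders):

```python
# Simple ingestion-time data-quality checks: completeness, duplication,
# and timeliness. Column names and thresholds are hypothetical placeholders.
import pandas as pd

def quality_report(df: pd.DataFrame, timestamp_col: str, max_age_days: int = 30) -> dict:
    """Return basic completeness, duplication, and timeliness metrics."""
    completeness = 1.0 - df.isna().mean().mean()   # share of non-missing cells
    duplicate_rate = df.duplicated().mean()        # share of fully duplicated rows
    age_days = (pd.Timestamp.now(tz="UTC") - pd.to_datetime(df[timestamp_col], utc=True)).dt.days
    timeliness = (age_days <= max_age_days).mean() # share of sufficiently recent records
    return {
        "completeness": round(float(completeness), 3),
        "duplicate_rate": round(float(duplicate_rate), 3),
        "timeliness": round(float(timeliness), 3),
    }

# Usage with a small, made-up dataset:
records = pd.DataFrame({
    "facility_id": [1, 2, 2, 4],
    "reading": [10.2, 9.8, 9.8, None],
    "reported_at": ["2024-05-01", "2024-05-02", "2024-05-02", "2023-01-15"],
})
print(quality_report(records, timestamp_col="reported_at"))
```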

AI’s transformative potential is contingent on the availability of high-quality data. Data readiness includes attention to quality, accuracy, completeness, consistency, timeliness and relevance, at a minimum, and agencies are adopting robust data governance frameworks that enforce data quality standards at every stage of the data lifecycle. This includes implementing advanced data validation techniques, fostering a culture of data stewardship, and leveraging state-of-the-art tools for continuous data quality monitoring…(More)”.