The Limitations of Consent as a Legal Basis for Data Processing in the Digital Society


Paper by the Centre for Information Policy Leadership: “Contemporary everyday life is increasingly permeated by digital information, whether by creating, consuming or depending on it. Most of our professional and private lives now rely to a large degree on digital interactions. As a result, access to and the use of data, and in particular personal data, are key elements and drivers of the digital economy and society. This has brought us to a significant inflection point on the issue of legitimising the processing of personal data in the wide range of contexts that are essential to our data-driven, AI-enabled digital products and services. The time has come to seriously reconsider the status of consent as a privileged legal basis and to consider alternatives that are better suited to a wide range of essential data processing contexts. The most prominent among these alternatives are the “legitimate interest” and “contractual necessity” legal bases, which have found equivalents in a number of jurisdictions. One example is Singapore, where revisions to its data protection framework include a legitimate interest exemption…(More)”.

Why probability probably doesn’t exist (but it is useful to act like it does)


Article by David Spiegelhalter: “Life is uncertain. None of us know what is going to happen. We know little of what has happened in the past, or is happening now outside our immediate experience. Uncertainty has been called the ‘conscious awareness of ignorance’ [1] — be it of the weather tomorrow, the next Premier League champions, the climate in 2100 or the identity of our ancient ancestors.

In daily life, we generally express uncertainty in words, saying an event “could”, “might” or “is likely to” happen (or have happened). But uncertain words can be treacherous. When, in 1961, the newly elected US president John F. Kennedy was informed about a CIA-sponsored plan to invade communist Cuba, he commissioned an appraisal from his military top brass. They concluded that the mission had a 30% chance of success — that is, a 70% chance of failure. In the report that reached the president, this was rendered as “a fair chance”. The Bay of Pigs invasion went ahead, and was a fiasco. There are now established scales for converting words of uncertainty into rough numbers. Anyone in the UK intelligence community using the term ‘likely’, for example, should mean a chance of between 55% and 75% (see go.nature.com/3vhu5zc).
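Such scales amount to a simple lookup between phrases and numeric bands. As a minimal sketch of the idea (only the “likely” band follows the UK convention quoted above; every other band here is an invented placeholder, not an official value):

```python
# Illustrative word-to-number scale for expressing uncertainty.
# Only the "likely" band follows the UK intelligence convention cited
# above; the other bands are assumptions made up for this sketch.
UNCERTAINTY_BANDS = {
    "remote chance": (0.00, 0.05),          # assumed, for illustration
    "unlikely": (0.15, 0.25),               # assumed, for illustration
    "realistic possibility": (0.30, 0.50),  # assumed, for illustration
    "likely": (0.55, 0.75),                 # per the UK scale quoted above
    "almost certain": (0.95, 1.00),         # assumed, for illustration
}

def phrase_for(probability: float) -> str:
    """Return the first phrase whose band contains the given probability."""
    for phrase, (low, high) in UNCERTAINTY_BANDS.items():
        if low <= probability <= high:
            return phrase
    return "no agreed phrase"  # real scales deliberately leave gaps between bands

print(phrase_for(0.70))  # -> likely
```

Had the Bay of Pigs appraisal been filtered through such a scale, “a fair chance” would have had to be pinned to a numeric band, and a 30% estimate could not have drifted upward in translation.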

Attempts to put numbers on chance and uncertainty take us into the mathematical realm of probability, which today is used confidently in any number of fields. Open any science journal, for example, and you’ll find papers liberally sprinkled with P values, confidence intervals and possibly Bayesian posterior distributions, all of which are dependent on probability.

And yet, any numerical probability, I will argue — whether in a scientific paper, as part of weather forecasts, predicting the outcome of a sports competition or quantifying a health risk — is not an objective property of the world, but a construction based on personal or collective judgements and (often doubtful) assumptions. Furthermore, in most circumstances, it is not even estimating some underlying ‘true’ quantity. Probability, indeed, can only rarely be said to ‘exist’ at all…(More)”.

Humanitarian Mapping with WhatsApp: Introducing ChatMap


Article by Emilio Mariscal: “…After some exploration, I came up with an idea: what if we could export chat conversations and extract the location data along with the associated messages? The solution would involve a straightforward application where users can upload their exported chats and instantly generate a map displaying all shared locations and messages. No business accounts or complex integrations would be required—just a simple, ready-to-use tool from day one.

ChatMap — chatmap.hotosm.org — is a straightforward mapping solution that leverages WhatsApp, an application used by 2.78 billion people worldwide. Its simplicity and accessibility make it an effective tool for communities with limited technical knowledge. And it even works offline: locations come from the phone’s GPS signal, and the data is sent once the phone regains connectivity.

This solution provides complete independence, as it does not require users to adopt a technology that depends on third-party maintenance. It’s a simple data flow with an equally straightforward script that anyone interested can improve on GitHub.
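As an illustration of how small such a script can be, here is a minimal sketch of the idea: parse an exported chat, pick out shared locations, and attach the next message as a note. The export format varies by WhatsApp version and locale, so the patterns below are assumptions rather than the project’s actual parser:

```python
# Minimal sketch of the ChatMap idea: pair shared locations in a WhatsApp
# chat export with the message that follows them. The line and URL formats
# below are assumptions; real exports vary by app version and locale.
import json
import re

LOCATION_RE = re.compile(r"maps\.google\.com/\?q=(-?\d+\.\d+),(-?\d+\.\d+)")
MESSAGE_RE = re.compile(r"^(?P<ts>[\d/.,: ]+) - (?P<sender>[^:]+): (?P<text>.*)$")

def chat_to_geojson(lines) -> dict:
    """Turn exported chat lines into GeoJSON: one feature per shared location."""
    features = []
    pending = None  # last location seen, waiting for a descriptive message
    for line in lines:
        m = MESSAGE_RE.match(line.strip())
        if not m:
            continue
        loc = LOCATION_RE.search(m["text"])
        if loc:
            lat, lon = float(loc.group(1)), float(loc.group(2))
            pending = {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"sender": m["sender"].strip(), "note": ""},
            }
            features.append(pending)
        elif pending is not None:
            # Treat the next plain message as the note for that location.
            pending["properties"]["note"] = m["text"]
            pending = None
    return {"type": "FeatureCollection", "features": features}

sample = [
    "12/31/23, 14:05 - Daniela: location: https://maps.google.com/?q=6.2442,-75.5812",
    "12/31/23, 14:06 - Daniela: Landslide scar above the footpath, three houses at risk",
]
print(json.dumps(chat_to_geojson(sample), indent=2))
```

The resulting GeoJSON can be dropped straight into a viewer such as uMap.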

We’re already using it! Recently, as part of a community mapping project to assess risks on the slopes of Comuna 8 in Medellín, an area vulnerable to repeated flooding, a group of students and local collectives collaborated with the Humanitarian OpenStreetMap Team (HOT) to map areas affected by landslides and other disaster impacts. This initiative facilitated the identification and characterization of settlements, supporting humanitarian aid efforts.

Photo by Daniela Arbeláez Suárez (source: WhatsApp)

As shown in the picture, the community explored the area on foot, using their phones to take photos and notes, and shared them along with the location. It was incredibly simple!

The data gathered during this activity was transformed into a map 20 minutes later (once the team had access to a Wi-Fi network); the map was then uploaded to our online platform powered by uMap (umap.hotosm.org)…(More)”.

See more at https://umap.hotosm.org/en/map/unaula-mapea-con-whatsapp_38

Innovating with Non-Traditional Data: Recent Use Cases for Unlocking Public Value


Article by Stefaan Verhulst and Adam Zable, on Non-Traditional Data (NTD): “data that is digitally captured (e.g. mobile phone records), mediated (e.g. social media), or observed (e.g. satellite imagery), using new instrumentation mechanisms, often privately held.”

Digitalization and the resulting datafication have introduced a new category of data that, when re-used responsibly, can complement traditional data in addressing public interest questions—from public health to environmental conservation. Unlocking these often privately held datasets through data collaboratives is a key focus of what we have called The Third Wave of Open Data.

To help bridge this gap, we have curated below recent examples of the use of NTD for research and decision-making that were published in the past few months. They are organized into five categories:

  • Health and Well-being;
  • Humanitarian Aid;
  • Environment and Climate;
  • Urban Systems and Mobility; and
  • Economic and Labor Dynamics…(More)”.

It Was the Best of Times, It Was the Worst of Times: The Dual Realities of Data Access in the Age of Generative AI


Article by Stefaan Verhulst: “It was the best of times, it was the worst of times… It was the spring of hope, it was the winter of despair.” –Charles Dickens, A Tale of Two Cities

Charles Dickens’s famous line captures the contradictions of the present moment in the world of data. On the one hand, data has become central to addressing humanity’s most pressing challenges — climate change, healthcare, economic development, public policy, and scientific discovery. On the other hand, despite the unprecedented quantity of data being generated, significant obstacles remain to accessing and reusing it. As our digital ecosystems evolve, including the rapid advances in artificial intelligence, we find ourselves both on the verge of a golden era of open data and at risk of slipping deeper into a restrictive “data winter.”

A Tale of Two Cities by Charles Dickens (1902)

The article explores these two concurrent realities: the challenges posed by growing restrictions on data reuse, and the countervailing potential brought by advancements in privacy-enhancing technologies (PETs), synthetic data, and data commons approaches. It argues that while current trends toward closed data ecosystems threaten innovation, new technologies and frameworks could lead to a “Fourth Wave of Open Data,” potentially ushering in a new era of data accessibility and collaboration…(More)” (First Published in Industry Data for Society Partnership’s (IDSP) 2024 Year in Review).

The AI revolution is running out of data. What can researchers do?


Article by Nicola Jones: “The Internet is a vast ocean of human knowledge, but it isn’t infinite. And artificial intelligence (AI) researchers have nearly sucked it dry.

The past decade of explosive improvement in AI has been driven in large part by making neural networks bigger and training them on ever-more data. This scaling has proved surprisingly effective at making large language models (LLMs) — such as those that power the chatbot ChatGPT — more capable of replicating conversational language and of developing emergent properties such as reasoning. But some specialists say that we are now approaching the limits of scaling. That’s in part because of the ballooning energy requirements for computing. But it’s also because LLM developers are running out of the conventional data sets used to train their models.

A prominent study [1] made headlines this year by putting a number on this problem: researchers at Epoch AI, a virtual research institute, projected that, by around 2028, the typical size of data set used to train an AI model will reach the same size as the total estimated stock of public online text. In other words, AI is likely to run out of training data in about four years’ time (see ‘Running out of data’). At the same time, data owners — such as newspaper publishers — are starting to crack down on how their content can be used, tightening access even more. That’s causing a crisis in the size of the ‘data commons’, says Shayne Longpre, an AI researcher at the Massachusetts Institute of Technology in Cambridge who leads the Data Provenance Initiative, a grass-roots organization that conducts audits of AI data sets.
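The arithmetic behind such a projection is simple compound growth. The sketch below shows only the shape of the calculation; the current size, total stock and growth rate are illustrative assumptions, not figures from the Epoch AI study:

```python
# Back-of-the-envelope version of the projection logic: if training sets
# grow by a constant factor each year, when do they hit the fixed stock of
# public text? All numbers are illustrative assumptions, not Epoch AI's.
import math

current_dataset_tokens = 1.5e13  # assumed size of today's largest training sets
public_text_stock = 3.0e14       # assumed total stock of public online text
annual_growth_factor = 2.2       # assumed yearly growth in training-set size

years_until_exhaustion = (
    math.log(public_text_stock / current_dataset_tokens)
    / math.log(annual_growth_factor)
)
print(f"Stock exhausted in ~{years_until_exhaustion:.1f} years")  # ~3.8
```

Under those assumed numbers the crossover lands roughly four years out, which is the intuition behind the 2028 headline.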

The imminent bottleneck in training data could be starting to pinch. “I strongly suspect that’s already happening,” says Longpre…(More)”.

Running out of data: Chart showing projections of the amount of text data used to train large language models and the amount of available text on the Internet, suggesting that by 2028, developers will be using data sets that match the total amount of text that is available.

My Voice, Your Voice, Our Voice: Attitudes Towards Collective Governance of a Choral AI Dataset


Paper by Jennifer Ding, Eva Jäger, Victoria Ivanova, and Mercedes Bunz: “Data grows in value when joined and combined; likewise the power of voice grows in ensemble. With 15 UK choirs, we explore opportunities for bottom-up data governance of a jointly created Choral AI Dataset. Guided by a survey of chorister attitudes towards generative AI models trained using their data, we explore opportunities to create empowering governance structures that go beyond opt in and opt out. We test the development of novel mechanisms such as a Trusted Data Intermediary (TDI) to enable governance of the dataset amongst the choirs and AI developers. We hope our findings can contribute to growing efforts to advance collective data governance practices and shape a more creative, empowering future for arts communities in the generative AI ecosystem…(More)”.

Can the world’s most successful index get back up the rankings?


Article by James Watson: “You know your ranking model is influential when national governments change policies with the explicit goal of boosting their position on your index. That was the power of the Ease of Doing Business Index (also known as Doing Business) until 2021.

However, the index’s success became its downfall. Some governments set up dedicated teams with an explicit goal of improving the country’s performance on the index. If those teams’ activity was solely focussed on positive policy reform, that would be great; unfortunately, in at least some cases, they were simply trying to game the results.


Index ranking optimisation (aka gaming the results)

To give an example of how that could happen, we need to take a brief detour into the world of qualitative indicators. Bear with me. In many indexes grappling with complex topics, there is a perennial problem of data availability. Imagine you want to measure the number of days it takes to set up a new business (this was one of the indicators in Doing Business). You will find that most of the time the data either doesn’t exist or is rarely updated by governments. Instead, put very simplistically, you’d need to ask a few experts or businesses for their views, and use those to create a numerical score for your index.

This is a valid approach, and it’s used in a lot of studies. Take Transparency International’s long-running Corruption Perceptions Index (CPI). Transparency International goes to great lengths to use robust and comparable data across countries, but measuring actual corruption is not viable — for obvious reasons. So the CPI does something different, and the clue is in the name: it measures people’s perceptions of corruption. It asks local businesses and experts whether they think there’s much bribery, nepotism and other forms of corruption in their country. This foundational input is then bolstered with other data points. The data doesn’t aim to measure corruption; instead, it’s about assessing which countries are more, or less, corrupt. 
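Put very simplistically in code, the move from scattered expert ratings to a single score is a standardise-then-average step. The sketch below illustrates that general technique with invented numbers; it is not the CPI’s actual methodology:

```python
# Illustrative aggregation of perception surveys into a 0-100 country score.
# This mirrors the general technique described above, not the CPI's actual
# methodology: standardise each source, then average across sources.
from statistics import mean, stdev

# Hypothetical raw ratings from three expert sources, each on its own scale.
raw = {
    "source_a": {"Country X": 6.1, "Country Y": 3.2, "Country Z": 8.0},
    "source_b": {"Country X": 55, "Country Y": 30, "Country Z": 78},
    "source_c": {"Country X": 0.58, "Country Y": 0.35, "Country Z": 0.81},
}

def standardise(scores: dict) -> dict:
    """Z-score one source so ratings on different scales become comparable."""
    mu, sigma = mean(scores.values()), stdev(scores.values())
    return {country: (v - mu) / sigma for country, v in scores.items()}

standardised = [standardise(source) for source in raw.values()]
countries = raw["source_a"].keys()
# Average across sources, then rescale so 50 is average (an arbitrary
# presentation choice for this illustration).
index = {c: round(50 + 20 * mean(z[c] for z in standardised), 1) for c in countries}
print(index)
```

In practice an index combines many more sources and adjustment rules, but the core move is the same, which is why choices about which sources to include and how to scale them can shift the rankings.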


This technique can work well, but it got a bit shaky as Doing Business’s fame grew. Some governments that were anxious to move up the rankings started urging the World Bank to tweak the methodology used to assess their ratings, or to use the views of specific experts. The analysts responsible for assessing a country’s scores and data points were put under significant pressure, often facing strong criticism from governments that didn’t agree with their assessments. In the end, an internal review showed that a number of countries’ scores had been improperly manipulated… The criticism must have stung, because the team behind the World Bank’s new Business Ready report has spent three years trying to address those issues. The new methodology handbook lands with a thump at 704 pages…(More)”.

AI could help scale humanitarian responses. But it could also have big downsides


Article by Thalia Beaty: “As the International Rescue Committee copes with dramatic increases in displaced people in recent years, the refugee aid organization has looked for efficiencies wherever it can — including using artificial intelligence.

Since 2015, the IRC has invested in Signpost — a portfolio of mobile apps and social media channels that answer questions in different languages for people in dangerous situations. The Signpost project, which includes many other organizations, has reached 18 million people so far, but the IRC wants to significantly increase its reach by using AI tools — if it can do so safely.

Conflict, climate emergencies and economic hardship have driven up demand for humanitarian assistance, with more than 117 million people forcibly displaced in 2024, according to the United Nations refugee agency. The turn to artificial intelligence technologies is in part driven by the massive gap between needs and resources.

To meet its goal of reaching half of displaced people within three years, the IRC is testing a network of AI chatbots to see if they can increase the capacity of their humanitarian officers and the local organizations that directly serve people through Signpost. For now, the pilot project operates in El Salvador, Kenya, Greece and Italy and responds in 11 languages. It draws on a combination of large language models from some of the biggest technology companies, including OpenAI, Anthropic and Google.
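As a rough sketch of what a multi-provider setup of this kind might look like (the provider functions below are stand-ins for vendor SDK calls; the IRC has not published its implementation):

```python
# Minimal sketch of a multi-provider chatbot layer with fallback, in the
# spirit of the pilot described above. The provider functions are stand-ins:
# real code would call each vendor's SDK. Nothing here reflects the IRC's
# actual implementation.
from typing import Callable

def ask_openai(prompt: str) -> str:      # stand-in for an OpenAI SDK call
    raise RuntimeError("provider unavailable in this sketch")

def ask_anthropic(prompt: str) -> str:   # stand-in for an Anthropic SDK call
    return "Answer drawn from vetted humanitarian guidance."

def ask_google(prompt: str) -> str:      # stand-in for a Google SDK call
    return "Answer drawn from vetted humanitarian guidance."

PROVIDERS: list[Callable[[str], str]] = [ask_openai, ask_anthropic, ask_google]

def answer(prompt: str, providers=PROVIDERS) -> str:
    """Try each provider in turn; fail safe by escalating to a human."""
    for provider in providers:
        try:
            return provider(prompt)
        except RuntimeError:
            continue  # provider down or rate-limited: try the next one
    # In humanitarian contexts, never guess when every provider fails.
    return "We could not answer automatically; connecting you to a caseworker."

print(answer("Where can I find shelter tonight?"))
```

The design choice worth noting is the final fallback: a system that cannot answer should hand off to a person rather than guess.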

The chatbot response system also uses customer service software from Zendesk and receives other support from Google and Cisco Systems.

If they decide the tools work, the IRC wants to extend the technical infrastructure to other nonprofit humanitarian organizations at no cost. They hope to create shared technology resources that less technically focused organizations could use without having to negotiate directly with tech companies or manage the risks of deployment…(More)”.

Privacy guarantees for personal mobility data in humanitarian response


Paper by Nitin Kohli, Emily Aiken & Joshua E. Blumenstock: “Personal mobility data from mobile phones and other sensors are increasingly used to inform policymaking during pandemics, natural disasters, and other humanitarian crises. However, even aggregated mobility traces can reveal private information about individual movements to potentially malicious actors. This paper develops and tests an approach for releasing private mobility data, which provides formal guarantees over the privacy of the underlying subjects. Specifically, we (1) introduce an algorithm for constructing differentially private mobility matrices and derive privacy and accuracy bounds on this algorithm; (2) use real-world data from mobile phone operators in Afghanistan and Rwanda to show how this algorithm can enable the use of private mobility data in two high-stakes policy decisions: pandemic response and the distribution of humanitarian aid; and (3) discuss practical decisions that need to be made when implementing this approach, such as how to optimally balance privacy and accuracy. Taken together, these results can help enable the responsible use of private mobility data in humanitarian response…(More)”.
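For readers unfamiliar with the mechanics, the sketch below shows the textbook version of the core idea: bound each person’s contribution to an origin-destination matrix, then add Laplace noise scaled to that bound. It illustrates the standard Laplace mechanism on assumed toy data, not the paper’s exact algorithm or its accuracy bounds:

```python
# Sketch of the core idea behind differentially private mobility matrices:
# aggregate trips into an origin-destination matrix, then add Laplace noise
# calibrated to each person's maximum contribution. This is the standard
# Laplace mechanism, not the paper's exact algorithm or bounds.
import numpy as np

rng = np.random.default_rng(seed=0)

def private_od_matrix(trips, n_regions, epsilon, max_trips_per_person=1):
    """trips: list of (person_id, origin, destination). Returns a noisy matrix.

    Bounding each person to `max_trips_per_person` caps the L1 sensitivity,
    which is what calibrates the noise scale.
    """
    counted = {}
    od = np.zeros((n_regions, n_regions))
    for person, origin, dest in trips:
        if counted.get(person, 0) >= max_trips_per_person:
            continue  # enforce the per-person contribution bound
        counted[person] = counted.get(person, 0) + 1
        od[origin, dest] += 1
    scale = max_trips_per_person / epsilon  # Laplace scale = sensitivity / epsilon
    return od + rng.laplace(loc=0.0, scale=scale, size=od.shape)

trips = [(1, 0, 2), (2, 0, 2), (3, 1, 0), (3, 1, 2)]  # toy data
print(private_od_matrix(trips, n_regions=3, epsilon=1.0))
```

A larger epsilon means less noise and weaker privacy; choosing that balance is exactly the kind of practical decision the paper discusses.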