3 barriers to successful data collaboratives


Article by Federico Bartolomucci: “Data collaboratives have proliferated in recent years as effective means of promoting the use of data for social good. This type of social partnership involves actors from the private, public, and not-for-profit sectors working together to leverage public or private data to enhance collective capacity to address societal and environmental challenges. The California Data Collaborative, for instance, combines the data of numerous Californian water managers to enhance data-informed policy and decision making.

But, in my years as a researcher studying more than a hundred cases of data collaboratives, I have observed widespread feelings of isolation among collaborating partners due to the absence of success-proven reference models. …Below, I provide an overview of three governance challenges faced by practitioners, as well as recommendations for addressing them. In doing so, I encourage every practitioner embarking on a data collaborative initiative to reflect on these challenges and create ad hoc strategies to address them…

1. Overly relying on grant funding limits a collaborative’s options.

Data collaboratives are typically conceived as not-for-profit projects, relying solely on grant funding from the founding partners. This is the case, for example, with T1D_Index, a global collaboration that seeks to gather data on Type 1 diabetes, raise awareness, and advance research on the topic. Although grant funding schemes work in some cases (as in the case of T1D_Index), relying solely on grant funding makes a data collaborative heavily dependent on the willingness of one or more partners to sustain its activities and hinders its ability to achieve operational and decisional autonomy.

Operational and decisional autonomy indeed appears to be a beneficial condition for a collaborative to develop trust, involve other partners, and continuously adapt its activities and structure to external events—characteristics required for operating in a highly innovative sector.

Hybrid business models that combine grant funding with revenue-generating activities indicate a promising evolutionary path. The simplest way to do this is to monetize data analysis and data stewardship services. The ActNow Coalition, a U.S.-based not-for-profit organization, combines donations with client-funded initiatives in which the team provides data collection, analysis, and visualization services. Offering these types of services generates revenue for the collaborative, and access to them is among the most compelling incentives for partners to join the collaboration.

In studying data collaboratives around the world, I have found two models to be most effective: (1) pay-per-use models, in which collaboration partners can access data-related services on demand (see Civity NL and their project Sniffer Bike), and (2) membership models, in which participation in the collaborative entitles partners to access certain services under predefined conditions (see the California Data Collaborative).

2. Demonstrating impact is key to a collaborative’s survival. 

As partners’ participation in data collaboratives is primarily motivated by a shared social purpose, a collaborative’s ability to demonstrate its efficacy in achieving that purpose is what allows it to defend its raison d’être. Demonstrating impact enables collaboratives to retain existing partners, renew commitments, and recruit new partners…(More)”.

If good data is key to decarbonization, more than half of Asia’s economies are being locked out of progress, this report says


Blog by Ewan Thomson: “If measuring something is the first step towards understanding it, and understanding something is necessary to be able to improve it, then good data is the key to unlocking positive change. This is particularly true in the energy sector as it seeks to decarbonize.

But some countries have a data problem, according to energy think tank Ember and climate solutions enabler Subak’s Asia Data Transparency Report 2023, and this lack of open and reliable power-generation data is holding back the speed of the clean power transition in the region.

Asia is responsible for around 80% of global coal consumption, making it a big contributor to carbon emissions. Progress is being made on reducing these emissions, but without reliable data on power generation, measuring the rate of this progress will be challenging.

These charts show how different Asian economies are faring on data transparency on power generation and what can be done to improve both the quality and quantity of the data.

[Infographic: number of economies by overall transparency score. Over half of Asian economies lack reliable power-sector data, Ember says. Image: Ember]

There are major data gaps in 24 out of the 39 Asian economies covered in the Ember research. This means it is unclear whether the energy needs of the nearly 700 million people in these 24 economies are being met with renewables or fossil fuels…(More)”.

AI Is Tearing Wikipedia Apart


Article by Claire Woodcock: “As generative artificial intelligence continues to permeate all aspects of culture, the people who steward Wikipedia are divided on how best to proceed. 

During a recent community call, it became apparent that there is a community split over whether or not to use large language models to generate content. While some people expressed that tools like OpenAI’s ChatGPT could help with generating and summarizing articles, others remained wary.

The concern is that machine-generated content has to be balanced with extensive human review, and that without it, lesser-known wikis could be overwhelmed with bad content. While AI generators are useful for writing believable, human-like text, they are also prone to including erroneous information, and even citing sources and academic papers that don’t exist. This often results in text summaries that seem accurate but, on closer inspection, are revealed to be completely fabricated.

“The risk for Wikipedia is people could be lowering the quality by throwing in stuff that they haven’t checked,” Bruckman added. “I don’t think there’s anything wrong with using it as a first draft, but every point has to be verified.” 

The Wikimedia Foundation, the nonprofit organization behind the website, is looking into building tools to make it easier for volunteers to identify bot-generated content. Meanwhile, Wikipedia is working to draft a policy that lays out the limits to how volunteers can use large language models to create content.

The current draft policy notes that anyone unfamiliar with the risks of large language models should avoid using them to create Wikipedia content, because it can open the Wikimedia Foundation up to libel suits and copyright violations—both of which the nonprofit gets protections from but the Wikipedia volunteers do not. These large language models also contain implicit biases, which often result in content skewed against marginalized and underrepresented groups of people.

The community is also divided on whether large language models should be allowed to train on Wikipedia content. While open access is a cornerstone of Wikipedia’s design principles, some worry the unrestricted scraping of internet data allows AI companies like OpenAI to exploit the open web to create closed commercial datasets for their models. This is especially a problem if the Wikipedia content itself is AI-generated, creating a feedback loop of potentially biased information, if left unchecked…(More)”.

Mapping the discourse on evidence-based policy, artificial intelligence, and the ethical practice of policy analysis


Paper by Joshua Newman and Michael Mintrom: “Scholarship on evidence-based policy, a subset of the policy analysis literature, largely assumes information is produced and consumed by humans. However, due to the expansion of artificial intelligence in the public sector, debates no longer capture the full range of concerns. Here, we derive a typology of arguments on evidence-based policy that performs two functions: taken separately, the categories serve as directions in which debates may proceed, in light of advances in technology; taken together, the categories act as a set of frames through which the use of evidence in policy making might be understood. Using a case of welfare fraud detection in the Netherlands, we show how the acknowledgement of divergent frames can enable a holistic analysis of evidence use in policy making that considers the ethical issues inherent in automated data processing. We argue that such an analysis will enhance the real-world relevance of the evidence-based policy paradigm…(More)”.

The Ethics of Artificial Intelligence for the Sustainable Development Goals


Book by Francesca Mazzi and Luciano Floridi: “Artificial intelligence (AI) as a general-purpose technology has great potential for advancing the United Nations Sustainable Development Goals (SDGs). However, the AI×SDGs phenomenon is still in its infancy in terms of diffusion, analysis, and empirical evidence. Moreover, a scalable adoption of AI solutions to advance the achievement of the SDGs requires private and public actors to engage in coordinated actions that have been analysed only partially so far. This volume provides the first overview of the AI×SDGs phenomenon and its related challenges and opportunities. The first part of the book adopts a programmatic approach, discussing AI×SDGs at a theoretical level and from the perspectives of different stakeholders. The second part illustrates existing projects and potential new applications…(More)”.

Will A.I. Become the New McKinsey?


Essay by Ted Chiang: “When we talk about artificial intelligence, we rely on metaphor, as we always do when dealing with something new and unfamiliar. Metaphors are, by their nature, imperfect, but we still need to choose them carefully, because bad ones can lead us astray. For example, it’s become very common to compare powerful A.I.s to genies in fairy tales. The metaphor is meant to highlight the difficulty of making powerful entities obey your commands; the computer scientist Stuart Russell has cited the parable of King Midas, who demanded that everything he touched turn into gold, to illustrate the dangers of an A.I. doing what you tell it to do instead of what you want it to do. There are multiple problems with this metaphor, but one of them is that it derives the wrong lessons from the tale to which it refers. The point of the Midas parable is that greed will destroy you, and that the pursuit of wealth will cost you everything that is truly important. If your reading of the parable is that, when you are granted a wish by the gods, you should phrase your wish very, very carefully, then you have missed the point.

So, I would like to propose another metaphor for the risks of artificial intelligence. I suggest that we think about A.I. as a management-consulting firm, along the lines of McKinsey & Company. Firms like McKinsey are hired for a wide variety of reasons, and A.I. systems are used for many reasons, too. But the similarities between McKinsey—a consulting firm that works with ninety per cent of the Fortune 100—and A.I. are also clear. Social-media companies use machine learning to keep users glued to their feeds. In a similar way, Purdue Pharma used McKinsey to figure out how to “turbocharge” sales of OxyContin during the opioid epidemic. Just as A.I. promises to offer managers a cheap replacement for human workers, so McKinsey and similar firms helped normalize the practice of mass layoffs as a way of increasing stock prices and executive compensation, contributing to the destruction of the middle class in America…(More)”.

Data Sharing Between Public and Private Sectors: When Local Governments Seek Information from the Sharing Economy


Paper by the Centre for Information Policy Leadership: “…addresses the growing trend of localities requesting (and sometimes mandating) that data collected by the private sector be shared with the localities themselves. Such requests are generally not in the context of law enforcement or national security matters, but rather are part of an effort to further the public interest or promote a public good.

To the extent such requests are overly broad or not specifically tailored to the stated public interest, CIPL believes that the public sector’s adoption of accountability measures—which CIPL has repeatedly promoted for the private sector—can advance responsible data sharing practices between the two sectors. It can also strengthen the public’s confidence in data-driven initiatives that seek to improve their communities…(More)”.

Spatial data trusts: an emerging governance framework for sharing spatial data


Paper by Nenad Radosevic et al: “Data Trusts are an important emerging approach to enabling the much wider sharing of data from many different sources and for many different purposes, backed by the confidence of clear and unambiguous data governance. Data Trusts combine the technical infrastructure for sharing data with the governance framework of a legal trust. The concept of a data Trust applied specifically to spatial data offers significant opportunities for new and future applications, addressing some longstanding barriers to data sharing, such as location privacy and data sovereignty. This paper introduces and explores the concept of a ‘spatial data Trust’ by identifying and explaining the key functions and characteristics required to underpin a data Trust for spatial data. The work identifies five key features of spatial data Trusts that demand specific attention and connects these features to a history of relevant work in the field, including spatial data infrastructures (SDIs), location privacy, and spatial data quality. The conclusions identify several key strands of research for the future development of this rapidly emerging framework for spatial data sharing…(More)”.

From Fragmentation to Coordination: The Case for an Institutional Mechanism for Cross-Border Data Flows


Report by the World Economic Forum: “Digital transformation of the global economy is bringing markets and people closer. Few conveniences of modern life – from international travel to online shopping to cross-border payments – would exist without the free flow of data.

Yet, impediments to free-flowing data are growing. The “Data Free Flow with Trust (DFFT)” concept is based on the idea that responsible data concerns, such as privacy and security, can be addressed without obstructing international data transfers. Policy-makers, trade negotiators and regulators are actively working on this, and while important progress has been made, an effective and trusted international cooperation mechanism would amplify their progress.

This white paper makes the case for establishing such a mechanism with a permanent secretariat, starting with the Group of Seven (G7) member countries, and ensuring participation of high-level representatives of multiple stakeholder groups, including the private sector, academia and civil society.

This new institution would go beyond short-term fixes and catalyse long-term thinking to operationalize DFFT…(More)”.

Unlocking the Power of Data Refineries for Social Impact


Essay by Jason Saul & Kriss Deiglmeier: “In 2021, US companies generated $2.77 trillion in profits—the largest ever recorded in history. This is a significant increase since 2000, when corporate profits totaled $786 billion. Social progress, on the other hand, shows a very different picture. From 2000 to 2021, progress on the United Nations Sustainable Development Goals has been anemic, registering less than 10 percent growth over 20 years.

What explains this massive split between the corporate and the social sectors? One explanation could be the role of data. In other words, companies are benefiting from a culture of using data to make decisions. Some refer to this as the “data divide”—the increasing gap between the use of data to maximize profit and the use of data to solve social problems…

Our theory is that there is something more systemic going on. Even if nonprofit practitioners and policy makers had the budget, capacity, and cultural appetite to use data, does the data they need even exist in the form they need? We submit that the answer to this question is a resounding no. Usable data doesn’t yet exist for the sector because the sector lacks a fully functioning data ecosystem to create, analyze, and use data at the same level of effectiveness as the commercial sector…(More)”.