The Emergence of National Data Initiatives: Comparing proposals and initiatives in the United Kingdom, Germany, and the United States


Article by Stefaan Verhulst and Roshni Singh: “Governments are increasingly recognizing data as a pivotal asset for driving economic growth, enhancing public service delivery, and fostering research and innovation. This recognition has intensified as policymakers acknowledge that data serves as the foundational element of artificial intelligence (AI) and that advancing AI sovereignty necessitates a robust data ecosystem. However, substantial portions of generated data remain inaccessible or underutilized. In response, several nations are initiating or exploring the launch of comprehensive national data strategies designed to consolidate, manage, and utilize data more effectively and at scale. As these initiatives evolve, discernible patterns in their objectives, governance structures, data-sharing mechanisms, and stakeholder engagement frameworks reveal both shared principles and country-specific approaches.

This blog begins some initial research on the emergence of national data initiatives by examining three of them and exploring their strategic orientations and broader implications. They include…(More)”.

Bad data costs Americans trillions. Let’s fix it with a renewed data strategy


Article by Nick Hart & Suzette Kent: “Over the past five years, the federal government lost $200 billion to $500 billion per year to fraud and improper payments — that’s up to $3,000 taken from every working American’s pocket annually. Since 2003, these preventable losses have totaled an astounding $2.7 trillion. But here’s the good news: We already have the data and technology to eliminate much of this waste in the years ahead. The operational structure and legal authority to put these tools to work protecting taxpayer dollars need to be refreshed and prioritized.

The challenge is straightforward: Government agencies often can’t effectively share and verify basic information before sending payments. For example, federal agencies may not be able to easily check if someone is deceased, verify income or detect duplicate payments across programs…(More)”.
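
To make the data-sharing gap concrete, here is a minimal, purely illustrative sketch of the kind of pre-payment checks the authors describe, written against hypothetical in-memory tables (a payment file, a registry of deceased beneficiaries, and a duplicate check across programs). Real agencies would query authoritative sources rather than local data frames, and none of the names below come from the article.

```python
# Illustrative only: a minimal pre-payment verification pass over hypothetical
# in-memory tables. Real programs would query authoritative registries.
import pandas as pd

payments = pd.DataFrame({
    "payment_id": [101, 102, 103, 104],
    "ssn": ["111-11-1111", "222-22-2222", "111-11-1111", "333-33-3333"],
    "program": ["A", "B", "A", "C"],
    "amount": [1200.0, 800.0, 1200.0, 500.0],
})

# Hypothetical registry of deceased beneficiaries.
deceased = pd.DataFrame({"ssn": ["222-22-2222"]})

# Flag payments addressed to deceased individuals.
payments["to_deceased"] = payments["ssn"].isin(deceased["ssn"])

# Flag potential duplicates: same recipient, program, and amount.
payments["duplicate"] = payments.duplicated(
    subset=["ssn", "program", "amount"], keep="first"
)

flagged = payments[payments["to_deceased"] | payments["duplicate"]]
print(flagged)
```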

Collective Intelligence: The Rise of Swarm Systems and their Impact on Society


Book edited by Uwe Seebacher and Christoph Legat: “Unlock the future of technology with this captivating exploration of swarm intelligence. Dive into the future of autonomous systems, enhanced by cutting-edge multi-agent systems and predictive research. Real-world examples illustrate how these algorithms drive intelligent, coordinated behavior in industries like manufacturing and energy. Discover the innovative Industrial-Disruption-Index (IDI), pioneered by Uwe Seebacher, which predicts industry disruptions using swarm intelligence. Case studies from media to digital imaging offer invaluable insights into the future of industrial life cycles.

Ideal for AI enthusiasts and professionals, this book provides inspiring, actionable insights for the future. It redefines artificial intelligence, showcasing how predictive intelligence can revolutionize group coordination for more efficient and sustainable systems. A crucial chapter highlights the shift from the Green Deal to the Emerald Deal, showing how swarm intelligence addresses societal challenges…(More)”.
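
The book’s case studies are not reproduced in this excerpt, but the basic mechanics of swarm intelligence can be conveyed with a generic particle swarm optimization sketch, in which each particle steers toward its own best-known position and the swarm’s best, so that coordinated search emerges without central control. The code below is a standard textbook illustration under assumed parameter values, not material drawn from the book.

```python
# A generic particle swarm optimization (PSO) sketch, illustrating how simple
# local update rules produce coordinated, swarm-level search behavior.
import numpy as np

def sphere(x):
    # Toy objective: minimize the sum of squares.
    return np.sum(x ** 2, axis=-1)

rng = np.random.default_rng(0)
n_particles, dim, steps = 30, 5, 200
w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, and social weights

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
best_pos = pos.copy()
best_val = sphere(pos)
g_best = best_pos[np.argmin(best_val)]

for _ in range(steps):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_best - pos)
    pos = pos + vel
    val = sphere(pos)
    improved = val < best_val
    best_pos[improved], best_val[improved] = pos[improved], val[improved]
    g_best = best_pos[np.argmin(best_val)]

print("best value found:", best_val.min())
```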

The British state is blind


The Economist: “Britain is a bit bigger than it thought. In 2023 net migration stood at 906,000 people, rather more than the 740,000 previously estimated, according to the Office for National Statistics. It is equivalent to discovering an extra Slough. New numbers for 2022 also arrived. At first the ONS thought net migration stood at 606,000. Now it reckons the figure was 872,000, a difference roughly the size of Stoke-on-Trent, a small English city.

If statistics enable the state to see, then the British government is increasingly short-sighted. Fundamental questions, such as how many people arrive each year, are now tricky to answer. How many people are in work? The answer is fuzzy. Just how big is the backlog of court cases? The Ministry of Justice will not say, because it does not know. Britain is a blind state.

This causes all sorts of problems. The Labour Force Survey, once a gold standard of data collection, now struggles to provide basic figures. At one point the Resolution Foundation, an economic think-tank, reckoned the ONS had underestimated the number of workers by almost 1m since 2019. Even after the ONS rejigged its tally on December 3rd, the discrepancy is still perhaps 500,000, Resolution reckons. Things are so bad that Andrew Bailey, the governor of the Bank of England, makes jokes about the inaccuracy of Britain’s job-market stats in after-dinner speeches—akin to a pilot bursting out of the cockpit mid-flight and asking to borrow a compass, with a chuckle.

Sometimes the sums in question are vast. When the Department for Work and Pensions put out a new survey on household income in the spring, it was missing about £40bn ($51bn) of benefit income, roughly 1.5% of GDP or 13% of all welfare spending. This makes things like calculating the rate of child poverty much harder. Labour MPs want this line to go down. Yet the government has little idea where the line is to begin with.

Even small numbers are hard to count. Britain has a backlog of court cases. How big no one quite knows: the Ministry of Justice has not published any data on it since March. In the summer, concerned about reliability, it held back the numbers (which means the numbers it did publish are probably wrong, says the Institute for Government, another think-tank). And there is no way of tracking someone from charge to court to prison to probation. Justice is meant to be blind, but not to her own conduct…(More)”.

Impact Inversion


Blog by Victor Zhenyi Wang: “The very first project I worked on when I transitioned from commercial data science to development was during the nadir between South Africa’s first two COVID waves. A large international foundation was interested in working with the South African government and a technology non-profit to build an early warning system for COVID. The non-profit operated a WhatsApp based health messaging service that served about 2 million people in South Africa. The platform had run a COVID symptoms questionnaire which the foundation hoped could help the government predict surges in cases.

This kind of data-based “nowcasting” proved a useful tool in a number of other places e.g. some cities in the US. Yet in the context of South Africa, where the National Department of Health was mired in serious capacity constraints, government stakeholders were bearish about the usefulness of such a tool. Nonetheless, since the foundation was interested in funding this project, we went ahead with it anyway. The result was that we pitched this “early warning system” a handful of times to polite public health officials but it was otherwise never used. A classic case of development practitioners rendering problems technical and generating non-solutions that primarily serve the strategic objectives of the funders.
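
For readers unfamiliar with the approach, data-based nowcasting of the kind mentioned here can be illustrated with a toy sketch: regress reported case counts on lagged counts of self-reported symptoms, then apply the fitted model to the latest symptom reports to anticipate a surge. The example below uses synthetic data and a simple linear model; it is an assumption-laden illustration, not the early warning system described in this post.

```python
# Toy nowcasting sketch with synthetic data: use lagged symptom-report counts
# to anticipate reported COVID cases. Not the actual early warning system.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
weeks = 60
symptom_reports = 500 + 300 * np.sin(np.linspace(0, 6, weeks)) + rng.normal(0, 30, weeks)
# Assume cases loosely follow symptom reports with a one-week lag.
cases = 0.4 * np.roll(symptom_reports, 1) + rng.normal(0, 20, weeks)

lag = 1
X = symptom_reports[:-lag].reshape(-1, 1)   # symptom reports at week t
y = cases[lag:]                             # cases at week t+1

model = LinearRegression().fit(X, y)
next_week_estimate = model.predict(np.array([[symptom_reports[-1]]]))
print("nowcast of next week's cases:", round(float(next_week_estimate[0]), 1))
```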

The technology non-profit did however express interest in a different kind of service — what about a language model that helps users get answers to their questions about COVID? The non-profit’s WhatsApp messaging service is menu-based, and they thought that a natural language interface could provide a better experience for users by letting them engage with health content on their own terms. Since we had ample funding from the foundation for the early warning system, we decided to pursue the chatbot project.

The project has since expanded to multiple other services run by the same non-profit, including the largest digital health service in South Africa. The project has won multiple grants and partnerships, including with Google, and has spun out into its own open source library. In many ways, in terms of sheer number of lives affected, this is the most impactful project I have had the privilege of supporting in my career in development, and I am deeply grateful to have been part of the team involved in bringing it into existence.

Yet the truth is, the “impact” of this class of interventions remains unclear. Even though a large randomized controlled trial was done to assess the impact of the WhatsApp service, such an evaluation only captures the performance of the service on outcome variables determined by the non-profit, not whether those outcomes are appropriate. It certainly does not tell us whether the service was the best means available to achieve the ultimate goal of improving the lives of those in communities underserved by health services.

This project, and many others that I have worked on as a data scientist in development, uses an implicit framework for impact which I describe as the design-to-impact pipeline. A technology is designed and developed, then its impact on the world is assessed. There is a strong emphasis on reform, on improving the design, development, and deployment of development technologies. Development practitioners have a broad range of techniques to make sure that the process of creation is ethical and responsible — in some sense, legitimate. With the broad adoption of data-based methods of program evaluation, e.g. randomized controlled trials, we might even make knowledge claims that an intervention truly ought to bring certain benefits to communities in which the intervention is placed. This view imagines that technologies, once this process is completed, are simply unleashed onto the world, and that their impact is simply what was assessed ex ante. An industry of monitoring and evaluation surrounds their subsequent deployment; the relative success of interventions depends on the performance of benchmark indicators…(More)”.

Online consent: how much do we need to know?


Paper by Bartlomiej Chomanski & Lode Lauwaert: “When you visit a website and click a button that says, ‘I agree to these terms’—do you really agree? Many scholars who consider this question (Solove 2013; Barocas & Nissenbaum 2014; Hull 2015; Pascalev 2017; Yeung 2017; Becker 2019; Zuboff 2019; Andreotta et al. 2022; Wolmarans and Vorhoeve 2022) would tend to answer ‘no’—or, at the very least, they would deem your agreement normatively deficient. The reasoning behind that conclusion is in large part driven by the claim that when most people click ‘I agree’ when visiting online websites and platforms, they do not really know what they are agreeing to. Their lack of knowledge about the privacy policy and other terms of the online agreements thus makes their consent problematic in morally salient ways.

We argue that this prevailing view is wrong. Uninformed consent to online terms and conditions (what we will call, for short, ‘online consent’) is less ethically problematic than many scholars suppose. Indeed, we argue that uninformed online consent preceded by the legitimate exercise of the right not to know (RNTK, to be explained below) is prima facie valid and does not appear normatively deficient in other ways, despite being uninformed.

The paper proceeds as follows. In Sect. 2, we make more precise the concept of online consent and summarize the case against it, as presented in the literature. In Sect. 3 we explain the arguments for the RNTK in bioethics and show that analogous reasoning leads to endorsing the RNTK in online contexts. In Sect. 4, we demonstrate that the appeal to the RNTK helps defuse the critics’ arguments against online consent. Section 5 concludes: online consent is valid (with caveats, to be explored in what follows)…(More)”

Data for Better Governance: Building Government Analytics Ecosystems in Latin America and the Caribbean


Report by the World Bank: “Governments in Latin America and the Caribbean face significant development challenges, including insufficient economic growth, inflation, and institutional weaknesses. Overcoming these issues requires identifying systemic obstacles through data-driven diagnostics and equipping public officials with the skills to implement effective solutions.

Although public administrations in the region often have access to valuable data, they frequently fall short in analyzing it to inform decisions. The cost of this gap is substantial: inefficiencies in procurement, misdirected transfers, and poorly managed human resources result in an estimated waste of 4% of GDP, equivalent to 17% of all public spending.

The report “Data for Better Governance: Building Government Analytics Ecosystems in Latin America and the Caribbean” outlines a roadmap for developing government analytics, focusing on key enablers such as data infrastructure and analytical capacity, and offers actionable strategies for improvement…(More)”.

An Open Source Python Library for Anonymizing Sensitive Data


Paper by Judith Sáinz-Pardo Díaz & Álvaro López García: “Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet in compliance with strict data protection regulations. Consequently, researchers need to rely on proven methods that allow them to anonymize their data without sharing it with third parties. To this end, this paper presents the implementation of a Python library for the anonymization of sensitive tabular data. This framework provides users with a wide range of anonymization methods that can be applied to a given dataset by specifying the set of identifiers, quasi-identifiers, generalization hierarchies and allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented following best practices for integration and continuous development, as well as the use of workflows to test code coverage based on unit and functional tests…(More)”.
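
The excerpt does not reproduce the library’s API, so the sketch below only illustrates the kinds of inputs it describes (direct identifiers to suppress, quasi-identifiers, a generalization hierarchy, a sensitive attribute, and a target level of anonymity) using a hand-rolled k-anonymity check in pandas. The function names and signatures are illustrative assumptions, not the paper’s actual interface.

```python
# Illustrative sketch of k-anonymity-style generalization in pandas; the
# library described in the paper exposes its own API, which is not shown here.
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cai", "Dee"],          # direct identifier
    "age": [34, 36, 52, 55],                        # quasi-identifier
    "zip": ["39001", "39002", "28001", "28002"],    # quasi-identifier
    "diagnosis": ["flu", "asthma", "flu", "covid"], # sensitive attribute
})

quasi_identifiers = ["age", "zip"]

def generalize(data: pd.DataFrame) -> pd.DataFrame:
    """Apply one level of a simple generalization hierarchy."""
    out = data.drop(columns=["name"])                        # suppress identifiers
    out["age"] = (out["age"] // 10 * 10).astype(str) + "s"   # 34 -> "30s"
    out["zip"] = out["zip"].str[:3] + "**"                   # 39001 -> "390**"
    return out

def k_anonymity(data: pd.DataFrame, qi: list[str]) -> int:
    """Smallest equivalence-class size over the quasi-identifiers."""
    return int(data.groupby(qi).size().min())

anonymized = generalize(df)
print(anonymized)
print("k =", k_anonymity(anonymized, quasi_identifiers))
```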

Civic Engagement & Policymaking Toolkit


About: “This toolkit serves as a guide for science centers and museums and other science engagement organizations to thoughtfully identify and implement ways to nurture civic experiences across their work or deepen ongoing civic initiatives for meaningful change within their communities…

This toolkit outlines a Community Science Approach, Civic Engagement & Policymaking, where science and technology are factors in collective civic action and policy decisions to meet community goals. It includes:

  • Guidance for your team on how to get started with this work,
  • An overview of what Civic Engagement & Policymaking as a Community Science Approach can entail,
  • Descriptions of four roles your organization can play to authentically engage with communities on civic priorities,
  • Examples of real collaborations between science engagement organizations and their partners that advance community priorities,
  • Tools, guides, and other resources to help you prepare for new civic engagement efforts and/or expand or deepen existing civic engagement efforts…(More)”.

Informality in Policymaking


Book edited by Lindsey Garner-Knapp, Joanna Mason, Tamara Mulherin and E. Lianne Visser: “Public policy actors spend considerable time writing policy, advising politicians, eliciting stakeholder views on policy concerns, and implementing initiatives. Yet, they also ‘hang out’ chatting at coffee machines, discuss developments in the hallway walking from one meeting to another, or wander outside to carparks for a quick word and to avoid prying eyes. Rather than interrogating the rules and procedures which govern how policies are made, this volume asks readers to begin with the informal as a concept and extend this to what people do, how they relate to each other, and how this matters.

Emerging from a desire to enquire into the lived experience of policy professionals, and to conceptualise afresh the informal in the making of public policy, Informality in Policymaking explores how informality manifests in different contexts, spaces, places, and policy arenas, and the implications of this. Including nine empirical chapters, this volume presents studies from around the world and across policy domains spanning the rural and urban, and the local to the supranational. The chapters employ interdisciplinary approaches and integrate creative elements, such as drawings of hand gestures and fieldwork photographs, in conjunction with ethnographic ‘thick descriptions’.

In unveiling the realities of how policy is made, this deeply meaningful and thoughtfully constructed collection argues that the formal is only part of the story of policymaking, and thus only part of the solutions it seeks to create. Informality in Policymaking will be of interest to researchers and policymakers alike…(More)”.