Paper by Burcu Kilic: “When Chinese start-up DeepSeek released R1 in January 2025, the groundbreaking open-source artificial intelligence (AI) model rocked the tech industry as a more cost-effective alternative to models running on more advanced chips. The launch coincided with industrial policy gaining popularity as a strategic tool for governments aiming to build AI capacity and competitiveness. Once dismissed under neoliberal economic frameworks, industrial policy is making a strong comeback with more governments worldwide embracing it to build digital public infrastructure and foster local AI ecosystems. This paper examines how the national innovation system framework can guide AI industrial policy to foster innovation and reduce reliance on dominant tech companies…(More)”.
DOGE comes for the data wonks
The Economist: “For nearly three decades the federal government has painstakingly surveyed tens of thousands of Americans each year about their health. Door-knockers collect data on the financial toll of chronic conditions like obesity and asthma, and probe the exact doses of medications sufferers take. The result, known as the Medical Expenditure Panel Survey (MEPS), is the single most comprehensive, nationally representative portrait of American health care, a balkanised and unwieldy $5trn industry that accounts for some 17% of GDP.
MEPS is part of a largely hidden infrastructure of government statistics collection now in the crosshairs of the Department of Government Efficiency (DOGE). In mid-March officials at a unit of the Department of Health and Human Services (HHS) that runs the survey told employees that DOGE had slated them for an 80-90% reduction in staff and that this would “not be a negotiation”. Since then scores of researchers have taken voluntary buyouts. Those left behind worry about the integrity of MEPS. “Very unclear whether or how we can put on MEPS” with roughly half of the staff leaving, one said. On March 27th, the health secretary, Robert F. Kennedy junior, announced an overall reduction of 10,000 personnel at the department, in addition to those who took buyouts.
There are scores of underpublicised government surveys like MEPS that document trends in everything from house prices to the amount of lead in people’s blood. Many provide standard-setting datasets and insights into the world’s largest economy that the private sector has no incentive to replicate.
Even so, America’s system of statistics research is overly analogue and needs modernising. “Using surveys as the main source of information is just not working” because it is too slow and suffers from declining rates of participation, says Julia Lane, an economist at New York University. In a world where the economy shifts by the day, the lags in traditional surveys—whose results can take weeks or even years to refine and publish—are unsatisfactory. One practical reform DOGE might encourage is better integration of administrative data such as tax records and social-security filings which often capture the entire population and are collected as a matter of course.
As in so many other areas, however, DOGE’s sledgehammer is more likely to cause harm than to achieve improvements. And for all its clunkiness, America’s current system manages a spectacular feat. From Inuits in remote corners of Alaska to Spanish-speakers in the Bronx, it measures the country and its inhabitants remarkably well, given that the population is highly diverse and spread out over 4m square miles. Each month surveys from the federal government reach about 1.5m people, a number roughly equivalent to the population of Hawaii or West Virginia…(More)”.
Researching data discomfort: The case of Statistics Norway’s quest for billing data
Paper by Lisa Reutter: “National statistics offices are increasingly exploring the possibilities of utilizing new data sources to position themselves in emerging data markets. In 2022, Statistics Norway announced that the national agency will require the biggest grocers in Norway to hand over all collected billing data to produce consumer behavior statistics which had previously been produced by other sampling methods. An online article discussing this proposal sparked a surprisingly (at least to Statistics Norway) high level of interest among readers, many of whom expressed concerns about this intended change in data practice. This paper focuses on the multifaceted online discussions of the proposal, as these enable us to study citizens’ reactions and feelings towards increased data collection and emerging public-private data flows in a Nordic context. Through an explorative empirical analysis of comment sections, this paper investigates what is discussed by commenters and reflects upon why this case sparked so much interest among citizens in the first place. It therefore contributes to the growing literature of citizens’ voices in data-driven administration and to a wider discussion on how to research public feeling towards datafication. I argue that this presents an interesting case of discomfort voiced by citizens, which demonstrates the contested nature of data practices among citizens–and their ability to regard data as deeply intertwined with power and politics. This case also reminds researchers to pay attention to seemingly benign and small changes in administration beyond artificial intelligence…(More)”
Oxford Intersections: AI in Society
Series edited by Philipp Hacker: “…provides an interdisciplinary corpus for understanding artificial intelligence (AI) as a global phenomenon that transcends geographical and disciplinary boundaries. Edited by a consortium of experts hailing from diverse academic traditions and regions, the 11 edited and curated sections provide a holistic view of AI’s societal impact. Critically, the work goes beyond the often Eurocentric or U.S.-centric perspectives that dominate the discourse, offering nuanced analyses that encompass the implications of AI for a range of regions of the world. Taken together, the sections of this work seek to move beyond the state of the art in three specific respects. First, they venture decisively beyond existing research efforts to develop a comprehensive account and framework for the rapidly growing importance of AI in virtually all sectors of society. Going beyond a mere mapping exercise, the curated sections assess opportunities, critically discuss risks, and offer solutions to the manifold challenges AI harbors in various societal contexts, from individual labor to global business, law and governance, and interpersonal relationships. Second, the work tackles specific societal and regulatory challenges triggered by the advent of AI and, more specifically, large generative AI models and foundation models, such as ChatGPT or GPT-4, which have so far received limited attention in the literature, particularly in monographs or edited volumes. Third, the novelty of the project is underscored by its decidedly interdisciplinary perspective: each section, whether covering Conflict; Culture, Art, and Knowledge Work; Relationships; or Personhood—among others—will draw on various strands of knowledge and research, crossing disciplinary boundaries and uniting perspectives most appropriate for the context at hand…(More)”.
Legal frictions for data openness
Paper by Ramya Chandrasekhar: “…investigates legal entanglements of re-use, when data and content from the open web is used to train foundation AI models. Based on conversations with AI researchers and practitioners, an online workshop, and legal analysis of a repository of 41 legal disputes relating to copyright and data protection, this report highlights tensions between legal imaginations of data flows and computational processes involved in training foundation models.
To realise the promise of the open web as open for all, this report argues that efforts oriented solely towards techno-legal openness of training datasets are not enough. Techno-legal openness of datasets facilitates easy re-use of data. But certain well-resourced actors like Big Tech are able to take advantage of data flows on the open web to train proprietary foundation models, while giving little to no value back to either the maintenance of shared informational resources or communities of commoners. At the same time, open licenses no longer accommodate changing community preferences of sharing and re-use of data and content.
In addition to techno-legal openness of training datasets, there is a need for certain limits on the extractive power of well-resourced actors like Big Tech combined with increased recognition of community data sovereignty. Alternative licensing frameworks, such as the Nwulite Obodo License, Kaitiakitanga Licenses, the Montreal License, the OpenRAIL Licenses, the Open Data Commons License, and the AI2Impact Licenses hold valuable insights in this regard. While these licensing frameworks impose more obligations on re-users and necessitate more collective thinking on interoperability, they are nonetheless necessary for the creation of healthy digital and data commons, to realise the original promise of the open web as open for all…(More)”.
Robotics for Global Development
Report by the Frontier Tech Hub: “Robotics could enable progress on 46% of SDG targets yet this potential remains largely untapped in low and middle-income countries.
While technological developments and new-found applications of artificial intelligence (AI) keep captivating significant attention and investments, using robotics to advance the Sustainable Development Goals (SDGs) is consistently overlooked. This is especially true when the focus moves from aerial robotics (drones) to robotic arms, ground robotics, and aquatic robotics. How might these types of robots accelerate global development in the least developed countries?
We aim to answer this question and inform the UK Foreign, Commonwealth & Development Office’s (FCDO) investment and policy towards robotics in the least developed countries (LDCs). In an emergent space, the UK FCDO has a unique opportunity to position itself as a global leader in leveraging robotics technology to accelerate sustainable development outcomes…(More)”.
Towards a set of Universal data principles
Paper by Steve MacFeely, Angela Me, Friederike Schueuer, Joseph Costanzo, David Passarelli, Malarvizhi Veerappan, and Stefaan Verhulst: “Humanity collects, processes, shares, uses, and reuses a staggering volume of data. These data are the lifeblood of the digital economy; they feed algorithms and artificial intelligence, inform logistics, and shape markets, communication, and politics. Data do not just yield economic benefits; they can also have individual and societal benefits and impacts. Being able to access, process, use, and reuse data is essential for dealing with global challenges, such as managing and protecting the environment, intervening in the event of a pandemic, or responding to a disaster or crisis. While we have made great strides, we have yet to realize the full potential of data, in particular, the potential of data to serve the public good. This will require international cooperation and a globally coordinated approach. Many data governance issues cannot be fully resolved at national level. This paper presents a proposal for a preliminary set of data goals and principles. These goals and principles are envisaged as the normative foundations for an international data governance framework – one that is grounded in human rights and sustainable development. A principles-based approach to data governance helps create common values, and in doing so, helps to change behaviours, mindsets and practices. It can also help create a foundation for the safe use of all types of data and data transactions. The purpose of this paper is to present the preliminary principles to solicit reaction and feedback…(More)”.
Differential Privacy
Open access book by Simson L. Garfinkel: “Differential privacy (DP) is an increasingly popular, though controversial, approach to protecting personal data. DP protects confidential data by introducing carefully calibrated random numbers, called statistical noise, when the data is used. Google, Apple, and Microsoft have all integrated the technology into their software, and the US Census Bureau used DP to protect data collected in the 2020 census. In this book, Simson Garfinkel presents the underlying ideas of DP, and helps explain why DP is needed in today’s information-rich environment, why it was used as the privacy protection mechanism for the 2020 census, and why it is so controversial in some communities.
When DP is used to protect confidential data, like an advertising profile based on the web pages you have viewed with a web browser, the noise makes it impossible for someone to take that profile and reverse engineer, with absolute certainty, the underlying confidential data on which the profile was computed. The book also chronicles the history of DP and describes the key participants and its limitations. Along the way, it also presents a short history of the US Census and other approaches for data protection such as de-identification and k-anonymity…(More)”.
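The idea of "carefully calibrated random numbers" described above can be illustrated with the Laplace mechanism, one widely used way to add DP noise to a count. The sketch below is not taken from the book: the function names, the epsilon value, and the example count of 1,000 are all assumptions chosen for illustration, and real deployments (such as the one used for the 2020 census) are considerably more involved.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centred on 0.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Return a noisy version of `true_count` (hypothetical helper).

    `epsilon` is the privacy-loss parameter: smaller values mean more
    noise and stronger protection. `sensitivity` is the most that one
    person can change the count (1 for a simple head count).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    # Illustrative values: a true count of 1,000 released at epsilon = 0.5.
    print(private_count(1000, epsilon=0.5, rng=rng))
```

Any single released value is off by a random amount, which is what prevents reverse engineering the underlying records with certainty, while averages over many releases still sit close to the true figure. That trade-off between accuracy and protection is governed by epsilon.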
Which Data Do Economists Use to Study Corruption?
World Bank paper: “…examines the data sources and methodologies used in economic research on corruption by analyzing 339 journal articles published in 2022 that include Journal of Economic Literature codes. The paper identifies the most commonly used data types, sources, and geographical foci, as well as whether studies primarily investigate the causes or consequences of corruption. Cross-country composite indicators remain the dominant measure, while single country studies more frequently utilize administrative data. Articles in ranked journals are more likely to employ administrative and experimental data and focus on the causes of corruption. The broader dataset of 882 articles highlights the significant academic interest in corruption across disciplines, particularly in political science and public policy. The findings raise concerns about the limited use of novel data sources and the relative neglect of research on the causes of corruption, underscoring the need for a more integrated approach within the field of economics…(More)”.
Global population data is in crisis – here’s why that matters
Article by Andrew J Tatem and Jessica Espey: “Every day, decisions that affect our lives depend on knowing how many people live where. For example, how many vaccines are needed in a community, where polling stations should be placed for elections or who might be in danger as a hurricane approaches. The answers rely on population data.
But counting people is getting harder.
For centuries, census and household surveys have been the backbone of population knowledge. But we’ve just returned from the UN’s statistical commission meetings in New York, where experts reported that something alarming is happening to population data systems globally.
Census response rates are declining in many countries, resulting in large margins of error. The 2020 US census undercounted America’s Latino population by more than three times the rate of the 2010 census. In Paraguay, the latest census revealed a population one-fifth smaller than previously thought.
South Africa’s 2022 census post-enumeration survey revealed a likely undercount of more than 30%. According to the UN Economic Commission for Africa, undercounts and census delays due to COVID-19, conflict or financial limitations have resulted in an estimated one in three Africans not being counted in the 2020 census round.
When people vanish from data, they vanish from policy. When certain groups are systematically undercounted – often minorities, rural communities or poorer people – they become invisible to policymakers. This translates directly into political underrepresentation and inadequate resource allocation…(More)”.