The Data-Informed City: A Conceptual Framework for Advancing Research and Practice


Paper by Jorrit de Jong, Fernando Fernandez-Monge et al: “Over the last decades, scholars and practitioners have focused their attention on the use of data for improving public action, with a renewed interest in the emergence of big data and artificial intelligence. The potential of data is particularly salient in cities, where vast amounts of data are being generated from traditional and novel sources. Despite this growing interest, there is a need for a conceptual and operational understanding of the beneficial uses of data. This article presents a comprehensive and precise account of how cities can use data to address problems more effectively, efficiently, equitably, and in a more accountable manner. It does so by synthesizing and augmenting current research with empirical evidence derived from original research and learnings from a program designed to strengthen city governments’ data capacity. The framework can be used to support longitudinal and comparative analyses as well as explore questions such as how different uses of data employed at various levels of maturity can yield disparate outcomes. Practitioners can use the framework to identify and prioritize areas in which building data capacity might further the goals of their teams and organizations…(More)

Introducing CC Signals: A New Social Contract for the Age of AI


Creative Commons: “Creative Commons (CC) today announces the public kickoff of the CC signals project, a new preference signals framework designed to increase reciprocity and sustain a creative commons in the age of AI. The development of CC signals represents a major step forward in building a more equitable, sustainable AI ecosystem rooted in shared benefits. This step is the culmination of years of consultation and analysis. As we enter this new phase of work, we are actively seeking input from the public. 

As artificial intelligence (AI) transforms how knowledge is created, shared, and reused, we are at a fork in the road that will define the future of access to knowledge and shared creativity. One path leads to data extraction and the erosion of openness; the other leads to a walled-off internet guarded by paywalls. CC signals offer another way, grounded in the nuanced values of the commons expressed by the collective.

Based on the same principles that gave rise to the CC licenses and tens of billions of works openly licensed online, CC signals will allow dataset holders to signal their preferences for how their content can be reused by machines based on a set of limited but meaningful options shaped in the public interest. They are both a technical and legal tool and a social proposition: a call for a new pact between those who share data and those who use it to train AI models.

“CC signals are designed to sustain the commons in the age of AI,” said Anna Tumadóttir, CEO, Creative Commons. “Just as the CC licenses helped build the open web, we believe CC signals will help shape an open AI ecosystem grounded in reciprocity.”

CC signals recognize that change requires systems-level coordination. They are tools that will be built for machine and human readability, and are flexible across legal, technical, and normative contexts. However, at their core CC signals are anchored in mobilizing the power of the collective. While CC signals may range in enforceability, legally binding in some cases and normative in others, their application will always carry ethical weight that says we give, we take, we give again, and we are all in this together.
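
To make the idea of a machine- and human-readable preference signal concrete, here is a purely illustrative Python sketch of what a dataset holder's declaration might look like. The field names and option values are hypothetical assumptions for this example only; they are not the actual CC signals specification, which is still under public consultation.

```python
import json

# Purely illustrative: the field names and option values below are invented
# assumptions for this sketch, not the actual CC signals specification.
declaration = {
    "signal_version": "draft-example",
    "dataset": "https://example.org/datasets/community-photo-archive",
    "steward": "Example Commons Collective",
    "ai_reuse_preferences": {
        "training_allowed": True,
        "conditions": ["credit", "contribute-back-to-commons"],
    },
}

# One serialization that both humans and crawlers could read.
print(json.dumps(declaration, indent=2))
```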

Now Ready for Feedback 

More information about CC signals and early design decisions are available on the CC website. We are committed to developing CC signals transparently and alongside our partners and community. We are actively seeking public feedback and input over the next few months as we work toward an alpha launch in November 2025….(More)”

Robodebt: When automation fails


Article by Don Moynihan: “From 2016 to 2020, the Australian government operated an automated debt assessment and recovery system, known as “Robodebt,” to recover fraudulent or overpaid welfare benefits. The goal was to save $4.77 billion through debt recovery and reduced public service costs. However, the algorithm and policies at the heart of Robodebt caused wildly inaccurate assessments, and administrative burdens that disproportionately impacted those with the least resources. After a federal court ruled the policy unlawful, the government was forced to terminate Robodebt and agree to a $1.8 billion settlement.

Robodebt is important because it is an example of a costly failure with automation. By automation, I mean the use of data to create digital defaults for decisions. This could involve the use of AI, or it could mean the use of algorithms reading administrative data. Cases like Robodebt serve as canaries in the coalmine for policymakers interested in using AI or algorithms as a means to downsize public services on the hazy notion that automation will pick up the slack. But I think they are missing the very real risks involved.
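
To illustrate how such a "digital default" can misfire, the following minimal Python sketch mimics the income-averaging logic widely reported as the core flaw of Robodebt: annual income from tax records is spread evenly across fortnights and compared with what the recipient declared each fortnight, so a person who earned all of their income in part of the year gets flagged as owing a debt. The figures and threshold are invented for illustration and are not the actual policy parameters.

```python
# Minimal sketch of income-averaging as a "digital default" (illustrative
# numbers only; not the actual Robodebt implementation or policy parameters).

FORTNIGHTS_PER_YEAR = 26
INCOME_FREE_AREA = 150.0  # hypothetical fortnightly earnings threshold

def averaged_debt_flag(annual_income: float, declared_fortnightly: list[float]) -> bool:
    """Flag a 'debt' if averaged income exceeds what the person declared."""
    averaged = annual_income / FORTNIGHTS_PER_YEAR
    return any(averaged > declared + INCOME_FREE_AREA for declared in declared_fortnightly)

# Someone who worked only half the year: earnings in 13 fortnights,
# nothing (and legitimate benefit receipt) in the other 13.
declared = [1200.0] * 13 + [0.0] * 13
annual_income = sum(declared)

# Averaging spreads the working-period income across the whole year, so the
# out-of-work fortnights now look like under-reporting.
print(averaged_debt_flag(annual_income, declared))  # True: a false "debt"
```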

To be clear, the lesson is not “all automation is bad.” Indeed, it offers real benefits in potentially reducing administrative costs and hassles and increasing access to public services (e.g., the use of automated or “ex parte” renewals for Medicaid, which Republicans are considering limiting in their new budget bill). It is this promise that makes automation so attractive to policymakers. But it is also the case that automation can be used to deny access to services, and to put people into digital cages that are burdensome to escape from. This is why we need to learn from cases where it has been deployed.

The experience of Robodebt underlines the dangers of using citizens as lab rats to adopt AI on a broad scale before it has been proven to work. Alongside the parallel Dutch childcare benefits scandal, which brought down the Dutch government, Robodebt provides an extraordinarily rich text to understand how automated decision processes can go wrong.

I recently wrote about Robodebt (with co-authors Morten Hybschmann, Kathryn Gimborys, Scott Loudin, and Will McClellan), both in the journal Perspectives on Public Management and Governance and as a teaching case study at the Better Government Lab...(More)”.

Practitioner perspectives on informing decisions in One Health sectors with predictive models


Paper by Kim M. Pepin: “Every decision a person makes is based on a model. A model is an idea about how a process works based on previous experience, observation, or other data. Models may not be explicit or stated (Johnson-Laird, 2010), but they serve to simplify a complex world. Models vary dramatically from conceptual (idea) to statistical (mathematical expression relating observed data to an assumed process and/or other data) or analytical/computational (quantitative algorithm describing a process). Predictive models of complex systems describe an understanding of how systems work, often in mathematical or statistical terms, using data, knowledge, and/or expert opinion. They provide means for predicting outcomes of interest, studying different management decision impacts, and quantifying decision risk and uncertainty (Berger et al. 2021; Li et al. 2017). They can help decision-makers assimilate how multiple pieces of information determine an outcome of interest about a complex system (Berger et al. 2021; Hemming et al. 2022).
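
As a toy illustration of the statistical end of that spectrum, the Python sketch below (with invented case counts) fits a simple exponential-growth model to early outbreak data and uses a bootstrap to turn data uncertainty into a forecast interval, the kind of decision-relevant risk quantification described above. It is a generic example, not a model from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented early-outbreak case counts, purely for illustration.
weeks = np.arange(8)
cases = np.array([3, 5, 9, 14, 22, 37, 55, 90])

# Statistical model: exponential growth, log(cases) = a + r * week.
log_cases = np.log(cases)

# Bootstrap the fit so the 4-week-ahead forecast carries an uncertainty range.
forecasts = []
for _ in range(2000):
    idx = rng.integers(0, len(weeks), len(weeks))      # resample observations
    r, a = np.polyfit(weeks[idx], log_cases[idx], 1)   # refit growth rate r
    forecasts.append(np.exp(a + r * (weeks[-1] + 4)))

low, mid, high = np.percentile(forecasts, [5, 50, 95])
print(f"4-week-ahead forecast: ~{mid:.0f} cases (90% interval {low:.0f}-{high:.0f})")
```

Swapping the exponential form for a different structure would shift that interval, which is the kind of subjectivity in modeling choices that the excerpt goes on to discuss.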

People rely daily on system-level models to reach objectives. Choosing the fastest route to a destination is one example. Such a decision may be based on either a mental model of the road system developed from previous experience or a traffic prediction mapping application based on mathematical algorithms and current data. Either way, a system-level model has been applied and there is some uncertainty. In contrast, predicting outcomes for new and complex phenomena, such as emerging disease spread, a biological invasion risk (Chen et al. 2023; Elderd et al. 2006; Pepin et al. 2022), or climatic impacts on ecosystems is more uncertain. Here public service decision-makers may turn to mathematical models when expert opinion and experience do not resolve enough uncertainty about decision outcomes. But using models to guide decisions also relies on expert opinion and experience. Also, even technical experts need to make modeling choices regarding model structure and data inputs that have uncertainty (Elderd et al. 2006) and these might not be completely objective decisions (Bedson et al. 2021). Thus, using models for guiding decisions has subjectivity from both the developer and end-user, which can lead to apprehension or lack of trust about using models to inform decisions.

Models may be particularly advantageous to decision-making in One Health sectors, including health of humans, agriculture, wildlife, and the environment (hereafter called One Health sectors) and their interconnectedness (Adisasmito et al. 2022)…(More)”.

The Global A.I. Divide


Article by Adam Satariano and Paul Mozur: “Last month, Sam Altman, the chief executive of the artificial intelligence company OpenAI, donned a helmet, work boots and a luminescent high-visibility vest to visit the construction site of the company’s new data center project in Texas.

Bigger than New York’s Central Park, the estimated $60 billion project, which has its own natural gas plant, will be one of the most powerful computing hubs ever created when completed as soon as next year.

Around the same time as Mr. Altman’s visit to Texas, Nicolás Wolovick, a computer science professor at the National University of Córdoba in Argentina, was running what counts as one of his country’s most advanced A.I. computing hubs. It was in a converted room at the university, where wires snaked between aging A.I. chips and server computers.

“Everything is becoming more split,” Dr. Wolovick said. “We are losing.”

Artificial intelligence has created a new digital divide, fracturing the world between nations with the computing power for building cutting-edge A.I. systems and those without. The split is influencing geopolitics and global economics, creating new dependencies and prompting a desperate rush to not be excluded from a technology race that could reorder economies, drive scientific discovery and change the way that people live and work.

The biggest beneficiaries by far are the United States, China and the European Union. Those regions host more than half of the world’s most powerful data centers, which are used for developing the most complex A.I. systems, according to data compiled by Oxford University researchers. Only 32 countries, or about 16 percent of nations, have these large facilities filled with microchips and computers, giving them what is known in industry parlance as “compute power.”..(More)”.

Library Catalogues as Data: Research, Practice and Usage


Book by Paul Gooding, Melissa Terras, and Sarah Ames: “Through the web of library catalogues, library management systems and myriad digital resources, libraries have become repositories not only for physical and digital information resources but also for enormous amounts of data about the interactions between these resources and their users. Bringing together leading practitioners and academic voices, this book considers library catalogue data as a vital research resource.

The book is divided into four sections, each approaching library catalogues, collections, and records from a different angle: methods for examining such data; the politics of catalogues and library data; their interdisciplinary potential; and practical uses and applications of catalogues as data. Other topics the volume discusses include:

  • Practical routes to preparing library catalogue data for researchers
  • The ethics of library metadata privacy and reuse
  • Data-driven decision making
  • Data quality and collections bias
  • Preserving, resurrecting and restoring data
  • The uses and potential of historical library data
  • The intersection of catalogue data, AI and Large Language Models (LLMs)

This comprehensive book will be an essential read for practitioners in the GLAM sector, particularly those dealing with collections and catalogue data, and LIS academics and students…(More)”

Misinformation by Omission: The Need for More Environmental Transparency in AI


Paper by Sasha Luccioni, Boris Gamazaychikov, Theo Alves da Costa, and Emma Strubell: “In recent years, Artificial Intelligence (AI) models have grown in size and complexity, driving greater demand for computational power and natural resources. In parallel to this trend, transparency around the costs and impacts of these models has decreased, meaning that the users of these technologies have little to no information about their resource demands and subsequent impacts on the environment. Despite this dearth of adequate data, escalating demand for figures quantifying AI’s environmental impacts has led to numerous instances of misinformation evolving from inaccurate or de-contextualized best-effort estimates of greenhouse gas emissions. In this article, we explore pervasive myths and misconceptions shaping public understanding of AI’s environmental impacts, tracing their origins and their spread in both the media and scientific publications. We discuss the importance of data transparency in clarifying misconceptions and mitigating these harms, and conclude with a set of recommendations for how AI developers and policymakers can leverage this information to mitigate negative impacts in the future…(More)”.

ChatGPT Has Already Polluted the Internet So Badly That It’s Hobbling Future AI Development


Article by Frank Landymore: “The rapid rise of ChatGPT — and the cavalcade of competitors’ generative models that followed suit — has polluted the internet with so much useless slop that it’s already kneecapping the development of future AI models.

As the AI-generated data clouds the human creations that these models are so heavily dependent on amalgamating, it becomes inevitable that a greater share of what these so-called intelligences learn from and imitate is itself an ersatz AI creation. 

Repeat this process enough, and AI development begins to resemble a maximalist game of telephone in which not only is the quality of the content being produced diminished, resembling less and less what it’s originally supposed to be replacing, but in which the participants actively become stupider. The industry likes to describe this scenario as AI “model collapse.”
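
A toy numerical sketch of that dynamic (not the article's own analysis): repeatedly fit a simple model to data, then let the next "generation" train only on the previous model's outputs. Estimation error compounds, and the fitted distribution gradually loses the variety of the original data, which is the essence of model collapse.

```python
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: "human" data drawn from a broad distribution.
data = rng.normal(loc=0.0, scale=1.0, size=200)

for generation in range(1, 21):
    # "Train" a simple model (fit a Gaussian) on the current data...
    mu, sigma = data.mean(), data.std()
    # ...then let the next generation learn only from this model's outputs.
    data = rng.normal(loc=mu, scale=sigma, size=200)
    if generation % 5 == 0:
        # The spread tends to drift downward as estimation error compounds,
        # so later generations capture less and less of the original variety.
        print(f"generation {generation:2d}: std = {data.std():.3f}")
```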

As a consequence, the finite amount of data predating ChatGPT’s rise becomes extremely valuable. In a new feature, The Register likens this to the demand for “low-background steel,” or steel that was produced before the detonation of the first nuclear bombs, starting in July 1945 with the US’s Trinity test.

Just as the explosion of AI chatbots has irreversibly polluted the internet, so did the detonation of the atom bomb release radionuclides and other particulates that have seeped into virtually all steel produced thereafter. That makes modern metals unsuitable for use in some highly sensitive scientific and medical equipment. And so, what’s old is new: a major source of low-background steel, even today, is WW1 and WW2 era battleships, including a huge naval fleet that was scuttled by German Admiral Ludwig von Reuter in 1919…(More)”.

National engagement on public trust in data use for single patient record and GP health record published


HTN Article: “A large-scale public engagement report commissioned by NHSE on building and maintaining public trust in data use across health and care has been published, focusing on the approach to creating a single patient record and the secondary use of GP data.

It noted “relief” and “enthusiasm” from participants around not having to repeat their health history when interacting with different parts of the health and care system, and highlighted concerns about data accuracy, privacy, and security.

120 participants were recruited for tier one, with 98 remaining by the end, for 15 hours of deliberation over three days in locations including Liverpool, Leicester, Portsmouth, and South London. Inclusive engagement for tier two recruited 76 people from “seldom heard groups” such as those with health needs or socially marginalised groups for interviews and small group sessions. A nationally representative ten-minute online survey with 2,000 people was also carried out in tier three.

“To start with, the concept of a single patient record was met with relief and enthusiasm across Tier 1 and Tier 2 participants,” according to the report….

When it comes to GP data, participants were “largely unaware” of secondary uses, but initially expressed comfort in the idea of it being used for saving lives, improving care, prevention, and efficiency in delivery of services. Concerns were broadly similar to those about the single patient record: concerns about data breaches, incorrect data, misuse, sensitivity of data being shared, bias against individuals, and the potential for re-identification. Some participants felt GP data should be treated differently because “it is likely to contain more intimate information”, offering greater risk to the individual patient if data were to be misused. Others felt it should be included alongside secondary care data to ensure a “comprehensive dataset”.

Participants were “reassured” overall by safeguards in place such as de-identification, staff training in data handling and security, and data regulation such as GDPR and the Data Protection Act. “There was a widespread feeling among Tier 1 and Tier 2 participants that the current model of the GP being the data controller for both direct care and secondary uses placed too much of a burden on GPs when it came to how data is used for secondary purposes,” findings show. “They wanted to see a new model which would allow for greater consistency of approach, transparency, and accountability.” Tier one participants suggested this could be a move to national or regional decision-making on secondary use. Tier three participants who only engaged with the topic online were “more resistant” to moving away from GPs as sole data controllers, with the report stating: “This greater reluctance to change demonstrates the need for careful communication with the public about this topic as changes are made, and continued involvement of the public.”..(More)”.

Disappearing people: A global demographic data crisis threatens public policy


Article by Jessica M. Espey, Andrew J. Tatem, and Dana R. Thomson: “Every day, decisions that affect our lives—such as where to locate hospitals and how to allocate resources for schools—depend on knowing how many people live where and who they are; for example, their ages, occupations, living conditions, and needs. Such core demographic data in most countries come from a census, a count of the population usually conducted every 10 years. But something alarming is happening to many of these critical data sources. As widely discussed at the United Nations (UN) Statistical Commission meeting in New York in March, fewer countries have managed to complete a census in recent years. And even when they are conducted, censuses have been shown to undercount members of certain groups in important ways. Redressing this predicament requires investment and technological solutions alongside extensive political outreach, citizen engagement, and new partnerships…(More)”