Frontier AI: double-edged sword for public sector


Article by Zeynep Engin: “The power of the latest AI technologies, often referred to as ‘frontier AI’, lies in their ability to automate decision-making by harnessing complex statistical insights from vast amounts of unstructured data, using models that surpass human understanding. The introduction of ChatGPT in late 2022 marked a new era for these technologies, making advanced AI models accessible to a wide range of users, a development poised to permanently reshape how our societies function.

From a public policy perspective, this capacity offers the optimistic potential to enable personalised services at scale, potentially revolutionising healthcare, education, local services, democratic processes, and justice, tailoring them to everyone’s unique needs in a digitally connected society. The ambition is to achieve better outcomes than humanity has managed so far without AI assistance. There is certainly a vast opportunity for improvement, given the current state of global inequity, environmental degradation, polarised societies, and other chronic challenges facing humanity.

However, it is crucial to temper this optimism with recognising the significant risks. In their current trajectories, these technologies are already starting to undermine hard-won democratic gains and civil rights. Integrating AI into public policy and decision-making processes risks exacerbating existing inequalities and unfairness, potentially leading to new, uncontrollable forms of discrimination at unprecedented speed and scale. The environmental impacts, both direct and indirect, could be catastrophic, while the rise of AI-powered personalised misinformation and behavioural manipulation is contributing to increasingly polarised societies.

Steering the direction of AI to be in the public interest requires a deeper understanding of its characteristics and behaviour. To imagine and design new approaches to public policy and decision-making, we first need a comprehensive understanding of what this remarkable technology offers and its potential implications…(More)”.

Policies must be justified by their wellbeing-to-cost ratio


Article by Richard Layard: “…What is its value for money — that is, how much wellbeing does it deliver per (net) pound it costs the government? This benefit/cost ratio (or BCR) should be central to every discussion.

The science exists to produce these numbers and, if the British government were to require them of the spending departments, it would be setting an example of rational government to the whole world.

Such a move would, of course, lead to major changes in priorities. At the London School of Economics we have been calculating the benefits and costs of policies across a whole range of government departments.

In our latest report on value for money, the best policies are those that save the government more money than they cost — for example by getting people back to work. Classic examples of this are treatments for mental health. The NHS Talking Therapies programme now treats 750,000 people a year for anxiety disorders and depression. Half of them recover and the service demonstrably pays for itself. It needs to expand.

But we also need a parallel service for those addicted to alcohol, drugs and gambling. These individuals are more difficult to treat — but the savings if they recover are greater. Again, it will pay for itself. And so will the improved therapy service for children and young people that Labour has promised.

However, most spending policies do cost more than they save. For these it is crucial to measure the benefit/cost ratio, converting the wellbeing benefit into its monetary equivalent. For example, we can evaluate the wellbeing gain to a community of having more police and subsequently less crime. Once this is converted into money, we calculate that the benefit/cost ratio is 12:1 — very high…(More)”.

Data sovereignty for local governments. Considerations and enablers


Report by JRC Data sovereignty for local governments refers to a capacity to control and/or access data, and to foster a digital transformation aligned with societal values and EU Commission political priorities. Data sovereignty clauses are an instrument that local governments may use to compel companies to share data of public interest. Albeit promising, little is known about the peculiarities of this instrument and how it has been implemented so far. This policy brief aims at filling the gap by systematising existing knowledge and providing policy-relevant recommendations for its wider implementation…(More)”.

AI has a democracy problem. Citizens’ assemblies can help.


Article by Jack Stilgoe: “…With AI, beneath all the hype, some companies know that they have a democracy problem. OpenAI admitted as much when they funded a program of pilot projects for what they called “Democratic Inputs to AI.” There have been some interesting efforts to involve the public in rethinking cutting-edge AI. A collaboration between Anthropic, one of OpenAI’s competitors, and the Collective Intelligence Project asked 1000 Americans to help shape what they called “Collective Constitutional AI.” They were asked to vote on statements such as “the AI should not be toxic” and “AI should be interesting,” and they were given the option of adding their own statements (one of the stranger statements reads “AI should not spread Marxist communistic ideology”). Anthropic used these inputs to tweak its “Claude” Large Language Model, which, when tested against standard AI benchmarks, seemed to help mitigate the model’s biases.

In using the word “constitutional,” Anthropic admits that, in making AI systems, they are doing politics by other means. We should welcome the attempt to open up. But, ultimately, these companies are interested in questions of design, not regulation. They would like there to be a societal consensus, a set of human values to which they can “align” their systems. Politics is rarely that neat…(More)”.

Children and Young People’s Participation in Climate Assemblies


Guide by KNOCA: “This guide draws on the experiences and advice of children, young people and adults involved in citizens’ assemblies that have taken place at national, city and community levels across nine countries, highlighting that:

  • Involving children and young people can enrich the intergenerational legitimacy and impact of climate assemblies: adult assembly members are reminded of their responsibilities to younger and future generations, and children and young people feel listened to, valued and taken seriously.
  • Involving children and young people has significant potential to strengthen the future of democracy and climate governance by enhancing democratic and climate literacy within education systems.
  • Children and young people can and should be involved in climate assemblies in different ways. Most importantly, children and young people should be involved from the very beginning of the process to ensure it reflects children and young people’s own ideas.
  • There are practical, ethical and design factors to consider when working with children and young people which can often be positively navigated by taking a child rights-based approach to the conceptualisation, design and delivery of climate assemblies…(More)”.

Breaking the Wall of Digital Heteronomy


Interview with Julia Janssen: “The walls of algorithms increasingly shape your life. Telling what to buy, where to go, what news to believe or songs to listen to. Data helps to navigate the world’s complexity and its endless possibilities. Artificial intelligence promises frictionless experiences, tailored and targeted, seamless and optimized to serve you best. But, at what cost? Frictionlessness comes with obedience. To the machine, the market and your own prophesy.

Mapping the Oblivion researches the influence of data and AI on human autonomy. The installation visualized Netflix’s percentage-based prediction models to provoke questions about to what extent we want to quantify choices. Will you only watch movies that are over 64% to your liking? Dine at restaurants that match your appetite above 76%. Date people with a compatibility rate of 89%? Will you never choose the career you want when there is only a 12% chance you’ll succeed? Do you want to outsmart your intuition with systems you do not understand and follow the map of probabilities and statistics?

Digital heteronomy is a condition in which one is guided by data, governed by AI and ordained by the industry. Homo Sapiens, the knowing being becomes Homo Stultus, the controllable being.

Living a quantified life in a numeric world. Not having to choose, doubt or wonder. Kept safe, risk-free and predictable within algorithmic walls. Exhausted of autonomy, creativity and randomness. Imprisoned in bubbles, profiles and behavioural tribes. Controllable, observable and monetizable.

Breaking the wall of digital heteronomy means taking back control over our data, identity, choices and chances in life. Honouring the unexpected, risk, doubt and having an unknown future. Shattering the power structures created by Big Tech to harvest information and capitalize on unfairness, vulnerabilities and fears. Breaking the wall of digital heteronomy means breaking down a system where profit is more important than people…(More)”.

AI firms must play fair when they use academic data in training


Nature Editorial: “But others are worried about principles such as attribution, the currency by which science operates. Fair attribution is a condition of reuse under CC BY, a commonly used open-access copyright license. In jurisdictions such as the European Union and Japan, there are exemptions to copyright rules that cover factors such as attribution — for text and data mining in research using automated analysis of sources to find patterns, for example. Some scientists see LLM data-scraping for proprietary LLMs as going well beyond what these exemptions were intended to achieve.

In any case, attribution is impossible when a large commercial LLM uses millions of sources to generate a given output. But when developers create AI tools for use in science, a method known as retrieval-augmented generation could help. This technique doesn’t apportion credit to the data that trained the LLM, but does allow the model to cite papers that are relevant to its output, says Lucy Lu Wang, an AI researcher at the University of Washington in Seattle.

Giving researchers the ability to opt out of having their work used in LLM training could also ease their worries. Creators have this right under EU law, but it is tough to enforce in practice, says Yaniv Benhamou, who studies digital law and copyright at the University of Geneva. Firms are devising innovative ways to make it easier. Spawning, a start-up company in Minneapolis, Minnesota, has developed tools to allow creators to opt out of data scraping. Some developers are also getting on board: OpenAI’s Media Manager tool, for example, allows creators to specify how their works can be used by machine-learning algorithms…(More)”.

The Imperial Origins of Big Data


Blog and book by Asheesh Kapur Siddique: “We live in a moment of massive transformation in the nature of information. In 2020, according to one report, users of the Internet created 64.2 zetabytes of data, a quantity greater than the “number of detectable stars in the cosmos,” a colossal increase whose origins can be traced to the emergence of the World Wide Web in 1993.1 Facilitated by technologies like satellites, smartphones, and artificial intelligence, the scale and speed of data creation seems like it may only balloon over the rest of our lifetimes—and with it, the problem of how to govern ourselves in relation to the inequalities and opportunities that the explosion of data creates.

But while much about our era of big data is indeed revolutionary, the political questions that it raises—How should information be used? Who should control it? And how should it be preserved?—are ones with which societies have long grappled. These questions attained a particular importance in Europe from the eleventh century due to a technological change no less significant than the ones we are witnessing today: the introduction of paper into Europe. Initially invented in China, paper travelled to Europe via the conduit of Islam around the eleventh century after the Moors conquered Spain. Over the twelfth, thirteenth, and fourteenth centuries, paper emerged as the fundamental substrate which politicians, merchants, and scholars relied on to record and circulate information in governance, commerce, and learning. At the same time, governing institutions sought to preserve and control the spread of written information through the creation of archives: repositories where they collected, organized, and stored documents.

The expansion of European polities overseas from the late fifteenth century onward saw governments massively scale up their use of paper—and confront the challenge of controlling its dissemination across thousands of miles of ocean and land. These pressures were felt particularly acutely in what eventually became the largest empire in world history, the British empire. As people from the British isles from the early seventeenth century fought, traded, and settled their way to power in the Atlantic world and South Asia, administrators faced the problem of how to govern both their emigrating subjects and the non-British peoples with whom they interacted. This meant collecting information about their behavior through the technology of paper. Just as we struggle to organize, search, and control our email boxes, text messages, and app notifications, so too did these early moderns confront the attendant challenges of developing practices of collection and storage to manage the resulting information overload. And despite the best efforts of states and companies to control information, it constantly escaped their grasp, falling into the hands of their opponents and rivals who deployed it to challenge and contest ruling powers.

The history of the early modern information state offers no simple or straightforward answers to the questions that data raises for us today. But it does remind us of a crucial truth, all too readily obscured by the deluge of popular narratives glorifying technological innovation: that questions of data are inherently questions about politics—about who gets to collect, control, and use information, and the ends to which information should be put. We should resist any effort to insulate data governance from democratic processes—and having an informed perspective on the politics of data requires that we attend not just to its present, but also to its past…(More)”.

Even laypeople use legalese


Paper by Eric Martínez, Francis Mollica and Edward Gibson: “Whereas principles of communicative efficiency and legal doctrine dictate that laws be comprehensible to the common world, empirical evidence suggests legal documents are largely incomprehensible to lawyers and laypeople alike. Here, a corpus analysis (n = 59) million words) first replicated and extended prior work revealing laws to contain strikingly higher rates of complex syntactic structures relative to six baseline genres of English. Next, two preregistered text generation experiments (n = 286) tested two leading hypotheses regarding how these complex structures enter into legal documents in the first place. In line with the magic spell hypothesis, we found people tasked with writing official laws wrote in a more convoluted manner than when tasked with writing unofficial legal texts of equivalent conceptual complexity. Contrary to the copy-and-edit hypothesis, we did not find evidence that people editing a legal document wrote in a more convoluted manner than when writing the same document from scratch. From a cognitive perspective, these results suggest law to be a rare exception to the general tendency in human language toward communicative efficiency. In particular, these findings indicate law’s complexity to be derived from its performativity, whereby low-frequency structures may be inserted to signal law’s authoritative, world-state-altering nature, at the cost of increased processing demands on readers. From a law and policy perspective, these results suggest that the tension between the ubiquity and impenetrability of the law is not an inherent one, and that laws can be simplified without a loss or distortion of communicative content…(More)”.

When A.I.’s Output Is a Threat to A.I. Itself


Article by Aatish Bhatia: “The internet is becoming awash in words and images generated by artificial intelligence.

Sam Altman, OpenAI’s chief executive, wrote in February that the company generated about 100 billion words per day — a million novels’ worth of text, every day, an unknown share of which finds its way onto the internet.

A.I.-generated text may show up as a restaurant review, a dating profile or a social media post. And it may show up as a news article, too: NewsGuard, a group that tracks online misinformation, recently identified over a thousand websites that churn out error-prone A.I.-generated news articles.

In reality, with no foolproof methods to detect this kind of content, much will simply remain undetected.

All this A.I.-generated information can make it harder for us to know what’s real. And it also poses a problem for A.I. companies. As they trawl the web for new data to train their next models on — an increasingly challenging task — they’re likely to ingest some of their own A.I.-generated content, creating an unintentional feedback loop in which what was once the output from one A.I. becomes the input for another.

In the long run, this cycle may pose a threat to A.I. itself. Research has shown that when generative A.I. is trained on a lot of its own output, it can get a lot worse.

Here’s a simple illustration of what happens when an A.I. system is trained on its own output, over and over again:

This is part of a data set of 60,000 handwritten digits.

When we trained an A.I. to mimic those digits, its output looked like this.

This new set was made by an A.I. trained on the previous A.I.-generated digits. What happens if this process continues?

After 20 generations of training new A.I.s on their predecessors’ output, the digits blur and start to erode.

After 30 generations, they converge into a single shape.

While this is a simplified example, it illustrates a problem on the horizon.

Imagine a medical-advice chatbot that lists fewer diseases that match your symptoms, because it was trained on a narrower spectrum of medical knowledge generated by previous chatbots. Or an A.I. history tutor that ingests A.I.-generated propaganda and can no longer separate fact from fiction…(More)”.