AI chatbots do work of civil servants in productivity trial


Article by Paul Seddon: “Documents disclosed to the BBC have shed light on the use of AI-powered chatbot technology within government.

The chatbots have been used to analyse lengthy reports – a job that would normally be done by humans.

The Department for Education, which ran the trial, hopes it could boost productivity across Whitehall.

The PCS civil service union says it does not object to the use of AI – but clear guidelines are needed “so the benefits are shared by workers”.

The latest generation of chatbots, powered by artificial intelligence (AI), can quickly analyse reams of information, including images, to answer questions and summarise long articles.

They are expected to upend working practices across the economy in the coming years, and the government says they will have “significant implications” for the way officials work in future.

The education department ran the eight-week study over the summer under a contract with London-based company Faculty.ai, to test how so-called large language models (LLMs) could be used by officials.

The firm’s researchers used its access to a premium version of ChatGPT, the popular chatbot developed by OpenAI, to analyse draft local skills training plans that had been sent to the department to review.

These plans, drawn up by bodies representing local employers, are meant to influence the training offered by local further education colleges.

Results from the pilot are yet to be published, but documents and emails requested by the BBC under Freedom of Information laws offer an insight into the project’s aims.

According to an internal document setting out the reasons for the study, a chatbot would be used to summarise and compare the “main insights and themes” from the training plans.

The results, which were to be compared with summaries produced by civil servants, would test how Civil Service “productivity” might be improved.

It added that language models could analyse long, unstructured documents “where previously the only other option would be for individuals to read through all the reports”.

But the project’s aims went further, with hopes the chatbot could help provide “useful insights” that could help the department’s skills unit “identify future skills needs across the country”…(More)”.

The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice


Paper by Fernando Delgado, Stephen Yang, Michael Madaio, and Qian Yang: “Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the “participatory turn” in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints…(More)”.

Data Dysphoria: The Governance Challenge Posed by Large Learning Models


Paper by Susan Ariel Aaronson: “Only 8 months have passed since ChatGPT and the large learning model underpinning it took the world by storm. This article focuses on the data supply chain—the data collected and then utilized to train large language models—and the governance challenge it presents to policymakers. These challenges include:

• How web scraping may affect individuals and firms that hold copyrights;
• How web scraping may affect individuals and groups who are supposed to be protected under privacy and personal data protection laws;
• How web scraping revealed the lack of protections for content creators and content providers on open-access websites; and
• How the debate over open- and closed-source LLMs reveals the lack of clear and universal rules to ensure the quality and validity of datasets. As the US National Institute of Standards and Technology explained, many LLMs depend on “large-scale datasets, which can lead to data quality and validity concerns. The difficulty of finding the ‘right’ data may lead AI actors to select datasets based more on accessibility and availability than on suitability… Such decisions could contribute to an environment where the data used in processes is not fully representative of the populations or phenomena that are being modeled, introducing downstream risks” – in short, problems of quality and validity…(More)”.

International Definitions of Artificial Intelligence


Report by IAPP: “Computer scientist John McCarthy coined the term artificial intelligence in 1955, defining it as “the science and engineering of making intelligent machines.” He organized the Dartmouth Summer Research Project on Artificial Intelligence a year later — an event that many consider the birthplace of the field.

In today’s world, the definition of AI has been in continuous evolution, its contours and constraints changing to align with current and perhaps future technological progress and cultural contexts. In fact, most papers and articles are quick to point out the lack of common consensus around the definition of AI. As a resource from British research organization the Ada Lovelace Institute states, “We recognise that the terminology in this area is contested. This is a fast-moving topic, and we expect that terminology will evolve quickly.” The difficulty in defining AI is illustrated by what AI historian Pamela McCorduck called the “odd paradox,” referring to the idea that, as computer scientists find new and innovative solutions, computational techniques once considered AI lose the title as they become common and repetitive.

The indeterminate nature of the term poses particular challenges in the regulatory space. Indeed, in 2017 a New York City Council task force downgraded its mission from regulating the city’s use of automated decision-making systems to merely defining the types of systems subject to regulation, because it could not agree on a workable legal definition of AI.

With this understanding, the following chart provides a snapshot of some of the definitions of AI from various global and sectoral (government, civil society and industry) perspectives. The chart is not an exhaustive list. It allows for cross-contextual comparisons from key players in the AI ecosystem…(More)”

Demographic Parity: Mitigating Biases in Real-World Data


Paper by Orestis Loukas and Ho-Ryun Chung: “Computer-based decision systems are widely used to automate decisions in many aspects of everyday life, which include sensitive areas like hiring, lending and even criminal sentencing. A decision pipeline heavily relies on large volumes of historical real-world data for training its models. However, historical training data often contains gender, racial or other biases which are propagated to the trained models, influencing computer-based decisions. In this work, we propose a robust methodology that guarantees the removal of unwanted biases while maximally preserving classification utility. Our approach can always achieve this in a model-independent way by deriving from real-world data the asymptotic dataset that uniquely encodes demographic parity and realism. As a proof-of-principle, we deduce from public census records such an asymptotic dataset from which synthetic samples can be generated to train well-established classifiers. Benchmarking the generalization capability of these classifiers trained on our synthetic data, we confirm the absence of any explicit or implicit bias in the computer-aided decisions…(More)”.
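
To make the fairness criterion at stake concrete: demographic parity requires a classifier's positive-prediction rate to be (approximately) equal across protected groups. The sketch below is an editorial illustration, not the authors' method or code; the toy census-style columns, the protected attribute "sex", and the use of scikit-learn are all assumptions made for the example.

```python
# Editorial sketch, not the authors' code: measuring the demographic parity
# gap (difference in positive-prediction rates across groups) for a classifier
# trained on toy census-style data. Column names and data are invented.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def demographic_parity_gap(y_pred, groups):
    """Largest difference in positive-prediction rates between any two groups."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

# Toy stand-in for public census records (purely synthetic).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "hours_per_week": rng.integers(10, 60, n),
    "sex": rng.integers(0, 2, n),  # protected attribute, coded 0/1
})
# A biased "historical" label that partly depends on the protected attribute.
df["income_high"] = ((df["age"] + df["hours_per_week"] + 10 * df["sex"]
                      + rng.normal(0, 10, n)) > 85).astype(int)

features = ["age", "hours_per_week", "sex"]
clf = LogisticRegression(max_iter=1000).fit(df[features], df["income_high"])
y_pred = clf.predict(df[features])

print(f"demographic parity gap: {demographic_parity_gap(y_pred, df['sex'].to_numpy()):.3f}")
```

In the paper, by contrast, classifiers are trained on synthetic samples drawn from a parity-encoding asymptotic dataset derived from census records, and the authors report the absence of explicit or implicit bias in the resulting decisions.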

Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence


Paper by Andres Karjus: “The increasing capacities of large language models (LLMs) present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, augmenting and automating qualitative analytic tasks previously typically allocated to human labor. This contribution proposes a systematic mixed methods framework to harness qualitative analytic expertise, machine scalability, and rigorous quantification, with attention to transparency and replicability. Sixteen machine-assisted case studies are showcased as proof of concept. Tasks include linguistic and discourse analysis, lexical semantic change detection, interview analysis, historical event cause inference and text mining, detection of political stance, text and idea reuse, genre composition in literature and film, social network inference, automated lexicography, missing metadata augmentation, and multimodal visual cultural analytics. In contrast to the focus on English in the emerging LLM applicability literature, many examples here deal with scenarios involving smaller languages and historical texts prone to digitization distortions. In all but the most difficult tasks requiring expert knowledge, generative LLMs can demonstrably serve as viable research instruments. LLM (and human) annotations may contain errors and variation, but the agreement rate can and should be accounted for in subsequent statistical modeling; a bootstrapping approach is discussed. The replications among the case studies illustrate how tasks previously requiring potentially months of team effort and complex computational pipelines can now be accomplished by an LLM-assisted scholar in a fraction of the time. Importantly, this approach is not intended to replace, but to augment researcher knowledge and skills. With these opportunities in sight, qualitative expertise and the ability to pose insightful questions have arguably never been more critical…(More)”.
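
As a rough illustration of the point about agreement rates and bootstrapping, the sketch below shows one way to propagate LLM-versus-human annotation disagreement into a downstream estimate. It is an editorial sketch under invented numbers, not the paper's actual pipeline.

```python
# Editorial sketch with invented numbers: propagate LLM annotation uncertainty
# into a downstream estimate via bootstrapping. Not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(42)

# Suppose an LLM labelled 500 texts for a binary property (e.g. stance present),
# and a human-coded subsample agreed with the LLM on 88% of items.
llm_labels = rng.integers(0, 2, 500)
agreement_rate = 0.88

boot_means = []
for _ in range(2000):
    # Resample items with replacement (sampling variability).
    sample = rng.choice(llm_labels, size=llm_labels.size, replace=True)
    # Flip each label with probability equal to the observed disagreement
    # (annotation variability).
    flips = rng.random(sample.size) < (1 - agreement_rate)
    perturbed = np.where(flips, 1 - sample, sample)
    boot_means.append(perturbed.mean())

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"estimated prevalence: {np.mean(boot_means):.3f} "
      f"(95% bootstrap interval {lo:.3f} to {hi:.3f})")
```

Reporting the interval rather than a single point estimate keeps the LLM's imperfect agreement with human coders visible in the final result.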

Missing Persons: The Case of National AI Strategies


Article by Susan Ariel Aaronson and Adam Zable: “Policy makers should inform, consult and involve citizens as part of their efforts to govern data-driven technologies such as artificial intelligence (AI). Although many users rely on AI systems, they do not understand how these systems use their data to make predictions and recommendations that can affect their daily lives. Over time, if they see their data being misused, users may learn to distrust both the systems and how policy makers regulate them. This paper examines whether officials informed and consulted their citizens as they developed a key aspect of AI policy — national AI strategies. Building on a data set of 68 countries and the European Union, the authors used qualitative methods to examine whether, how and when governments engaged with their citizens on their AI strategies and whether they were responsive to public comment, concluding that policy makers are missing an opportunity to build trust in AI by not using this process to involve a broader cross-section of their constituents…(More)”.

These Prisoners Are Training AI


Article by Morgan Meaker: “…Around the world, millions of so-called “clickworkers” train artificial intelligence models, teaching machines the difference between pedestrians and palm trees, or what combination of words describe violence or sexual abuse. Usually these workers are stationed in the global south, where wages are cheap. OpenAI, for example, uses an outsourcing firm that employs clickworkers in Kenya, Uganda, and India. That arrangement works for American companies, operating in the world’s most widely spoken language, English. But there are not a lot of people in the global south who speak Finnish.

That’s why Metroc turned to prison labor. The company gets cheap, Finnish-speaking workers, while the prison system can offer inmates employment that, it says, prepares them for the digital world of work after their release. Using prisoners to train AI creates uneasy parallels with the kind of low-paid and sometimes exploitive labor that has often existed downstream in technology. But in Finland, the project has received widespread support.

“There’s this global idea of what data labor is. And then there’s what happens in Finland, which is very different if you look at it closely,” says Tuukka Lehtiniemi, a researcher at the University of Helsinki, who has been studying data labor in Finnish prisons.

For four months, Marmalade has lived here, in Hämeenlinna prison. The building is modern, with big windows. Colorful artwork tries to enforce a sense of cheeriness on otherwise empty corridors. If it wasn’t for the heavy gray security doors blocking every entry and exit, these rooms could easily belong to a particularly soulless school or university complex.

Finland might be famous for its open prisons—where inmates can work or study in nearby towns—but this is not one of them. Instead, Hämeenlinna is the country’s highest-security institution housing exclusively female inmates. Marmalade has been sentenced to six years. Under privacy rules set by the prison, WIRED is not able to publish Marmalade’s real name, exact age, or any other information that could be used to identify her. But in a country where prisoners serving life terms can apply to be released after 12 years, six years is a heavy sentence. And like the other 100 inmates who live here, she is not allowed to leave…(More)”.

Initial policy considerations for generative artificial intelligence


OECD Report: “Generative artificial intelligence (AI) creates new content in response to prompts, offering transformative potential across multiple sectors such as education, entertainment, healthcare and scientific research. However, these technologies also pose critical societal and policy challenges that policy makers must confront: potential shifts in labour markets, copyright uncertainties, and risks associated with the perpetuation of societal biases and the potential for misuse in the creation of disinformation and manipulated content. Consequences could extend to the spreading of mis- and disinformation, perpetuation of discrimination, distortion of public discourse and markets, and the incitement of violence. Governments recognise the transformative impact of generative AI and are actively working to address these challenges. This paper aims to inform these policy considerations and support decision makers in addressing them…(More)”.

Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality


Paper by Fabrizio Dell’Acqua et al.: “The public release of Large Language Models (LLMs) has sparked tremendous interest in how humans will use Artificial Intelligence (AI) to accomplish a variety of tasks. In our study conducted with Boston Consulting Group, a global management consulting firm, we examine the performance implications of AI on realistic, complex, and knowledge-intensive tasks. The pre-registered experiment involved 758 consultants comprising about 7% of the individual contributor-level consultants at the company. After establishing a performance baseline on a similar task, subjects were randomly assigned to one of three conditions: no AI access, GPT-4 AI access, or GPT-4 AI access with a prompt engineering overview. We suggest that the capabilities of AI create a “jagged technological frontier” where some tasks are easily done by AI, while others, though seemingly similar in difficulty level, are outside the current capability of AI. For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities, consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group). Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores. For a task selected to be outside the frontier, however, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI. Further, our analysis shows the emergence of two distinctive patterns of successful AI use by humans along a spectrum of human-AI integration. One set of consultants acted as “Centaurs,” like the mythical half-horse/half-human creature, dividing and delegating their solution-creation activities to the AI or to themselves. Another set of consultants acted more like “Cyborgs,” completely integrating their task flow with the AI and continually interacting with the technology…(More)”.