Training LLMs to Draft Replies to Parliamentary Questions

Blog by Watson Chua: “In Singapore, the government is answerable to Parliament and Members of Parliament (MPs) may raise queries to any Minister on any matter in his portfolio. These questions can be answered orally during the Parliament sitting or through a written reply. Regardless of the medium, public servants in the ministries must gather materials to answer the question and prepare a response.

Generative AI and Large Language Models (LLMs) have already been applied to help public servants do this more effectively and efficiently. For example, Pair Search (publicly accessible) and the Hansard Analysis Tool (only accessible to public servants) help public servants search for relevant information in past Parliamentary Sittings relevant to the question and synthesise a response to it.

The existing systems draft the responses using prompt engineering and Retrieval Augmented Generation (RAG). To recap, RAG consists of two main parts:

  • Retriever: A search engine that finds documents relevant to the question
  • Generator: A text generation model (LLM) that takes in the instruction, the question, and the search results from the retriever to respond to the question
A typical RAG system. Illustration by Hrishi Olickel, taken from here.

Using a pre-trained instruction-tuned LLM like GPT-4o, the generator can usually generate a good response. However, it might not be exactly what is desired in terms of verbosity, style and writing prose, and additional human post-processing might be needed. Extensive prompt engineering or few-shot learning can be done to mold the response at the expense of incurring higher costs from using additional tokens in the prompt…(More)”

Automating public services

Report by Anna Dent: “…Public bodies, under financial stress and looking for effective solutions, are at risk of jumping on the automation bandwagon without critically assessing whether it’s actually appropriate for their needs, and whether the potential benefits outweigh the risks. To realise the benefits of automation and minimise problems for communities and public bodies themselves, a clear-eyed approach which really gets to grips with the risks is needed. 

The temptation to introduce automation to tackle complex social challenges is strong; they are often deep-rooted and expensive to deal with, and can have life-long implications for individuals and communities. But precisely because of their complex nature they are not the best fit for rules-based automated processes, which may fail to deliver what they set out to achieve. 

Bias is increasingly recognised as a critical challenge with automation in the public sector. Bias can be introduced through training data, and can occur when automated tools are disproportionately used on a particular community. In either case, the effectiveness of the tool or process is undermined, and citizens are at risk of discrimination, unfair targeting and exclusion from services. 

Automated tools and processes rely on huge amounts of data; in public services this will often mean personal information and data about us and our lives which we may or may not feel comfortable being used. Balancing everyone’s right to privacy with the desire for efficiency and better outcomes is rarely straightforward, and if done badly can lead to a breakdown in trust…(More)”.

The Tech Coup

Book by Marietje Schaake: “Over the past decades, under the cover of “innovation,” technology companies have successfully resisted regulation and have even begun to seize power from governments themselves. Facial recognition firms track citizens for police surveillance. Cryptocurrency has wiped out the personal savings of millions and threatens the stability of the global financial system. Spyware companies sell digital intelligence tools to anyone who can afford them. This new reality—where unregulated technology has become a forceful instrument for autocrats around the world—is terrible news for democracies and citizens.
In The Tech Coup, Marietje Schaake offers a behind-the-scenes account of how technology companies crept into nearly every corner of our lives and our governments. She takes us beyond the headlines to high-stakes meetings with human rights defenders, business leaders, computer scientists, and politicians to show how technologies—from social media to artificial intelligence—have gone from being heralded as utopian to undermining the pillars of our democracies. To reverse this existential power imbalance, Schaake outlines game-changing solutions to empower elected officials and citizens alike. Democratic leaders can—and must—resist the influence of corporate lobbying and reinvent themselves as dynamic, flexible guardians of our digital world.

Drawing on her experiences in the halls of the European Parliament and among Silicon Valley insiders, Schaake offers a frightening look at our modern tech-obsessed world—and a clear-eyed view of how democracies can build a better future before it is too late…(More)”.

Community consent: neither a ceiling nor a floor

Article by Jasmine McNealy: “The 23andMe breach and the Golden State Killer case are two of the more “flashy” cases, but questions of consent, especially the consent of all of those affected by biodata collection and analysis in more mundane or routine health and medical research projects, are just as important. The communities of people affected have expectations about their privacy and the possible impacts of inferences that could be made about them in data processing systems. Researchers must, then, acquire community consent when attempting to work with networked biodata. 

Several benefits of community consent exist, especially for marginalized and vulnerable populations. These benefits include:

  • Ensuring that information about the research project spreads throughout the community,
  • Removing potential barriers that might be created by resistance from community members,
  • Alleviating the possible concerns of individuals about the perspectives of community leaders, and 
  • Allowing the recruitment of participants using methods most salient to the community.

But community consent does not replace individual consent and limits exist for both community and individual consent. Therefore, within the context of a biorepository, understanding whether community consent might be a ceiling or a floor requires examining governance and autonomy…(More)”.

Digitally Invisible: How the Internet is Creating the New Underclass

Book by Nicol Turner Lee: “President Joe Biden has repeatedly said that the United States would close the digital divide under his leadership. However, the divide still affects people and communities across the country. The complex and persistent reality is that millions of residents live in digital deserts, and many more face disproportionate difficulties when it comes to getting and staying online, especially people of color, seniors, rural residents, and farmers in remote areas.

Economic and health disparities are worsening in rural communities without available internet access. Students living in urban digital deserts with little technology exposure are ill prepared to compete for emerging occupations. Even seniors struggle to navigate the aging process without access to online information and remote care.

In this book, Nicol Turner Lee, a leading expert on the American digital divide, uses personal stories from individuals around the country to show how the emerging digital underclass is navigating the spiraling online economy, while sharing their joys and hopes for an equitable and just future.

Turner Lee argues that achieving digital equity is crucial for the future of America’s global competitiveness and requires radical responses to offset the unintended consequences of increasing digitization. In the end, “Digitally Invisible” proposes a pathway to more equitable access to existing and emerging technologies, while encouraging readers to weigh in on this shared goal…(More)”.

The Data That Powers A.I. Is Disappearing Fast

Article by Kevin Roose: “For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.

Now, that data is drying up.

Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group.

The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an “emerging crisis in consent,” as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt.

The study also found that as much as 45 percent of the data in one set, C4, had been restricted by websites’ terms of service.

“We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities,” said Shayne Longpre, the study’s lead author, in an interview.

Data is the main ingredient in today’s generative A.I. systems, which are fed billions of examples of text, images and videos. Much of that data is scraped from public websites by researchers and compiled in large data sets, which can be downloaded and freely used, or supplemented with data from other sources…(More)”.

The Five Stages Of AI Grief

Essay by Benjamin Bratton: “Alignment” toward “human-centered AI” are just words representing our hopes and fears related to how AI feels like it is out of control — but also to the idea that complex technologies were never under human control to begin with. For reasons more political than perceptive, some insist that “AI” is not even “real,” that it is just math or just an ideological construction of capitalism turning itself into a naturalized fact. Some critics are clearly very angry at the all-too-real prospects of pervasive machine intelligence. Others recognize the reality of AI but are convinced it is something that can be controlled by legislative sessions, policy papers and community workshops. This does not ameliorate the depression felt by still others, who foresee existential catastrophe.

All these reactions may confuse those who see the evolution of machine intelligence, and the artificialization of intelligence itself, as an overdetermined consequence of deeper developments. What to make of these responses?

Sigmund Freud used the term “Copernican” to describe modern decenterings of the human from a place of intuitive privilege. After Nicolaus Copernicus and Charles Darwin, he nominated psychoanalysis as the third such revolution. He also characterized the response to such decenterings as “traumas.”

Trauma brings grief. This is normal. In her 1969 book, “On Death and Dying,” the Swiss psychiatrist Elizabeth Kübler-Ross identified the “five stages of grief”: denial, anger, bargaining, depression and acceptance. Perhaps Copernican Traumas are no different…(More)”.

The Department of Everything

Article by Stephen Akey: “How do you find the life expectancy of a California condor? Google it. Or the gross national product of Morocco? Google it. Or the final resting place of Tom Paine? Google it. There was a time, however—not all that long ago—when you couldn’t Google it or ask Siri or whatever cyber equivalent comes next. You had to do it the hard way—by consulting reference books, indexes, catalogs, almanacs, statistical abstracts, and myriad other printed sources. Or you could save yourself all that time and trouble by taking the easiest available shortcut: You could call me.

From 1984 to 1988, I worked in the Telephone Reference Division of the Brooklyn Public Library. My seven or eight colleagues and I spent the days (and nights) answering exactly such questions. Our callers were as various as New York City itself: copyeditors, fact checkers, game show aspirants, journalists, bill collectors, bet settlers, police detectives, students and teachers, the idly curious, the lonely and loquacious, the park bench crazies, the nervously apprehensive. (This last category comprised many anxious patients about to undergo surgery who called us for background checks on their doctors.) There were telephone reference divisions in libraries all over the country, but this being New York City, we were an unusually large one with an unusually heavy volume of calls. And if I may say so, we were one of the best. More than one caller told me that we were a legend in the world of New York magazine publishing…(More)”.

Precision public health in the era of genomics and big data

Paper by Megan C. Roberts et al: “Precision public health (PPH) considers the interplay between genetics, lifestyle and the environment to improve disease prevention, diagnosis and treatment on a population level—thereby delivering the right interventions to the right populations at the right time. In this Review, we explore the concept of PPH as the next generation of public health. We discuss the historical context of using individual-level data in public health interventions and examine recent advancements in how data from human and pathogen genomics and social, behavioral and environmental research, as well as artificial intelligence, have transformed public health. Real-world examples of PPH are discussed, emphasizing how these approaches are becoming a mainstay in public health, as well as outstanding challenges in their development, implementation and sustainability. Data sciences, ethical, legal and social implications research, capacity building, equity research and implementation science will have a crucial role in realizing the potential for ‘precision’ to enhance traditional public health approaches…(More)”.

Integrating Artificial Intelligence into Citizens’ Assemblies: Benefits, Concerns and Future Pathways

Paper by Sammy McKinney: “Interest in how Artificial Intelligence (AI) could be used within citizens’ assemblies (CAs) is emerging amongst scholars and practitioners alike. In this paper, I make four contributions at the intersection of these burgeoning fields. First, I propose an analytical framework to guide evaluations of the benefits and limitations of AI applications in CAs. Second, I map out eleven ways that AI, especially large language models (LLMs), could be used across a CAs full lifecycle. This introduces novel ideas for AI integration into the literature and synthesises existing proposals to provide the most detailed analytical breakdown of AI applications in CAs to date. Third, drawing on relevant literature, four key informant interviews, and the Global Assembly on the Ecological and Climate crisis as a case study, I apply my analytical framework to assess the desirability of each application. This provides insight into how AI could be deployed to address existing  challenges facing CAs today as well as the concerns that arise with AI integration. Fourth, bringing my analyses together, I argue that AI integration into CAs brings the potential to enhance their democratic quality and institutional capacity, but realising this requires the deliberative community to proceed cautiously, effectively navigate challenging trade-offs, and mitigate important concerns that arise with AI integration. Ultimately, this paper provides a foundation that can guide future research concerning AI integration into CAs and other forms of democratic innovation…(More)”.