The Risks of Empowering “Citizen Data Scientists”


Article by Reid Blackman and Tamara Sipes: “Until recently, the prevailing understanding of artificial intelligence (AI) and its subset machine learning (ML) was that expert data scientists and AI engineers were the only people who could push AI strategy and implementation forward. That was a reasonable view. After all, data science generally, and AI in particular, is a technical field requiring, among other things, expertise that takes many years of education and training to acquire.

Fast forward to today, however, and the conventional wisdom is rapidly changing. The advent of “auto-ML” — software that provides methods and processes for creating machine learning code — has led to calls to “democratize” data science and AI. The idea is that these tools enable organizations to invite and leverage non-data scientists — say, domain data experts, team members very familiar with the business processes, or heads of various business units — to propel their AI efforts.
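To make “auto-ML” concrete, here is a minimal sketch of the kind of workflow these tools enable, using the open-source FLAML library purely as an illustrative stand-in (the article names no specific tool): the user supplies labeled data, a task, and a time budget, and the library searches candidate models and hyperparameters automatically.

```python
# Minimal auto-ML sketch (assumes the open-source FLAML library as an
# illustrative stand-in; the article does not name specific tools).
from flaml import AutoML
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = AutoML()
# The user states only the task and a time budget; the library searches
# candidate models and hyperparameters within that budget.
automl.fit(X_train=X_train, y_train=y_train,
           task="classification", time_budget=60)

print(automl.best_estimator)                      # e.g. "lgbm"
print(accuracy_score(y_test, automl.predict(X_test)))
```

This is exactly the abstraction that invites non-data-scientists in: the hard choices (model family, hyperparameters, validation) sit behind one call, which is both the promise and, as the authors argue, the risk.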

In theory, making data science and AI more accessible to non-data scientists (including technologists who are not data scientists) can make a lot of business sense. Centralized and siloed data science units can fail to appreciate the vast array of data the organization has and the business problems that it can solve, particularly in multinational organizations with hundreds or thousands of business units distributed across several continents. Moreover, those in the weeds of business units know the data they have and the problems they’re trying to solve, and can, with training, see how that data can be leveraged to solve those problems. The opportunities are significant.

In short, with great business insight, augmented with auto-ML, can come great analytic responsibility. At the same time, we cannot forget that data science and AI are, in fact, very difficult, and there’s a very long journey from having data to solving a problem. In this article, we’ll lay out the pros and cons of integrating citizen data scientists into your AI strategy and suggest methods for optimizing success and minimizing risks…(More)”.

Anonymization: The imperfect science of using data while preserving privacy


Paper by Andrea Gadotti et al: “Information about us, our actions, and our preferences is created at scale through surveys or scientific studies or as a result of our interaction with digital devices such as smartphones and fitness trackers. The ability to safely share and analyze such data is key for scientific and societal progress. Anonymization is considered by scientists and policy-makers as one of the main ways to share data while minimizing privacy risks. In this review, we offer a pragmatic perspective on the modern literature on privacy attacks and anonymization techniques. We discuss traditional de-identification techniques and their strong limitations in the age of big data. We then turn our attention to modern approaches to share anonymous aggregate data, such as data query systems, synthetic data, and differential privacy. We find that, although no perfect solution exists, applying modern techniques while auditing their guarantees against attacks is the best approach to safely use and share data today…(More)”.
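As a flavor of the modern approaches the review surveys, the sketch below shows differential privacy’s Laplace mechanism on a simple count query (an illustration under assumptions, not code from the paper): noise with scale sensitivity/ε is added so that any single person’s presence or absence changes the released answer’s distribution only slightly.

```python
import numpy as np

def dp_count(data, predicate, epsilon):
    """Differentially private count via the Laplace mechanism.
    A count query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1/epsilon."""
    true_count = sum(1 for row in data if predicate(row))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical survey rows, used only to exercise the function.
rows = [{"age": 34, "smoker": True},
        {"age": 51, "smoker": False},
        {"age": 29, "smoker": True}]
print(dp_count(rows, lambda r: r["smoker"], epsilon=0.5))
```

Smaller ε means stronger privacy but noisier answers, the kind of trade-off behind the review’s conclusion that no perfect solution exists.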

Policy fit for the future: the Australian Government Futures primer


Primer by Will Hartigan and Arthur Horobin: “Futures is a systematic exploration of probable, possible and preferable future developments to inform present-day policy, strategy and decision-making. It uses multiple plausible scenarios of the future to anticipate and make sense of disruptive change. It is also known as strategic foresight...

This primer provides an overview of Futures methodologies and their practical application to policy development and advice. It is a first step for policy teams and officers interested in Futures: providing you with a range of flexible tools, ideas and advice you can adapt to your own policy challenges and environments.

This primer was developed by the Policy Projects and Taskforce Office in the Department of the Prime Minister and Cabinet. We have drawn on expertise from inside and outside of government, including through our project partners, the Futures Hub at the National Security College at the Australian National University.

This primer has been written by policy officers, for policy officers, with a focus on practical and tested approaches that can support you to create policy fit for the future…(More)”.

Training LLMs to Draft Replies to Parliamentary Questions


Blog by Watson Chua: “In Singapore, the government is answerable to Parliament and Members of Parliament (MPs) may raise queries to any Minister on any matter in his portfolio. These questions can be answered orally during the Parliament sitting or through a written reply. Regardless of the medium, public servants in the ministries must gather materials to answer the question and prepare a response.

Generative AI and Large Language Models (LLMs) have already been applied to help public servants do this more effectively and efficiently. For example, Pair Search (publicly accessible) and the Hansard Analysis Tool (only accessible to public servants) help public servants search past Parliamentary Sittings for information relevant to the question and synthesise a response to it.

The existing systems draft the responses using prompt engineering and Retrieval Augmented Generation (RAG). To recap, RAG consists of two main parts:

  • Retriever: A search engine that finds documents relevant to the question
  • Generator: A text generation model (LLM) that takes in the instruction, the question, and the search results from the retriever to respond to the question
[Figure: A typical RAG system. Illustration by Hrishi Olickel.]
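To make the two parts concrete, here is a minimal, self-contained RAG sketch. It is illustrative only: the toy documents, the TF-IDF retriever, and the prompt template are assumptions made for this example, not the production system described in the post.

```python
# Minimal RAG sketch: a toy retriever plus prompt assembly for the generator.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Hansard extract: written reply on MRT reliability statistics.",
    "Hansard extract: oral reply on public housing wait times.",
    "Hansard extract: written reply on healthcare subsidy coverage.",
]

def retrieve(question, docs, k=2):
    """Retriever: rank documents by TF-IDF cosine similarity to the question."""
    vectorizer = TfidfVectorizer().fit(docs + [question])
    scores = cosine_similarity(vectorizer.transform([question]),
                               vectorizer.transform(docs))[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(question, passages):
    """Generator input: instruction + retrieved context + the question.
    In a real system this prompt is sent to an LLM such as GPT-4o."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Draft a Parliamentary reply using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nReply:")

question = "What are the latest MRT reliability figures?"
print(build_prompt(question, retrieve(question, documents)))
```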

Using a pre-trained instruction-tuned LLM like GPT-4o, the generator can usually produce a good response. However, it might not be exactly what is desired in terms of verbosity, style and prose, and additional human post-processing might be needed. Extensive prompt engineering or few-shot learning can mold the response, at the expense of the higher costs incurred by the additional tokens in the prompt…(More)”
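As a hedged illustration of the few-shot approach just mentioned (the system instruction and exemplar reply below are invented placeholders, not the ministries’ actual prompts), the exemplars are prepended to the request, which molds style and verbosity; because they are re-sent on every call, this is where the extra token cost comes from.

```python
# Few-shot prompting sketch with the OpenAI Python client (v1 API).
# The instruction and exemplars below are placeholders for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

few_shot = [
    {"role": "user",
     "content": "Question: What is the waiting time for new public flats?"},
    {"role": "assistant",
     "content": "Mr Speaker, the median waiting time is <figure from sources>."},
]

# The exemplars travel with every request, so prompt tokens (and cost)
# grow with the number and length of the examples.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system",
               "content": "Draft replies in formal Parliamentary style."}]
             + few_shot
             + [{"role": "user",
                 "content": "Question: What were rail reliability figures in 2024?"}],
)
print(response.choices[0].message.content)
```

Fine-tuning, the approach in the post’s title, instead bakes the desired style into the model weights, avoiding this per-call overhead.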

Automating public services


Report by Anna Dent: “…Public bodies, under financial stress and looking for effective solutions, are at risk of jumping on the automation bandwagon without critically assessing whether it’s actually appropriate for their needs, and whether the potential benefits outweigh the risks. To realise the benefits of automation and minimise problems for communities and public bodies themselves, a clear-eyed approach which really gets to grips with the risks is needed. 

The temptation to introduce automation to tackle complex social challenges is strong; they are often deep-rooted and expensive to deal with, and can have life-long implications for individuals and communities. But precisely because of their complex nature they are not the best fit for rules-based automated processes, which may fail to deliver what they set out to achieve. 

Bias is increasingly recognised as a critical challenge with automation in the public sector. Bias can be introduced through training data, and can occur when automated tools are disproportionately used on a particular community. In either case, the effectiveness of the tool or process is undermined, and citizens are at risk of discrimination, unfair targeting and exclusion from services. 

Automated tools and processes rely on huge amounts of data; in public services this will often mean personal information and data about us and our lives that we may or may not feel comfortable having used. Balancing everyone’s right to privacy with the desire for efficiency and better outcomes is rarely straightforward, and if done badly can lead to a breakdown in trust…(More)”.

The double-edged sword of AI in education


Article by Rose Luckin: “Artificial intelligence (AI) could revolutionize education as profoundly as the internet has already revolutionized our lives. However, our experience with commercial internet platforms gives us pause. Consider how social media algorithms, designed to maximize engagement and ad revenue, have inadvertently promoted divisive content and misinformation, a development at odds with educational goals.

Like the commercialization of the internet, the AI consumerization trend, driven by massive investments across sectors, prioritizes profit over societal and educational benefits. This focus on monetization risks overshadowing crucial considerations about AI’s integration into educational contexts.

The consumerization of AI in education is a double-edged sword. While increasing accessibility, it could also undermine fundamental educational principles and reshape students’ attitudes toward learning. We must advocate for a thoughtful, education-centric approach to AI development that enhances, rather than replaces, human intelligence and recognises the value of effort in learning.

As generative AI systems for education emerge, technical experts and policymakers have a unique opportunity to ensure their design supports the interests of learners and educators.

Risk 1: Overestimating AI’s intelligence

In essence, learning is not merely an individual cognitive process but a deeply social endeavor, intricately linked to cultural context, language development, and the dynamic relationship between practical experience and theoretical knowledge…(More)”.

The Tech Coup


Book by Marietje Schaake: “Over the past decades, under the cover of “innovation,” technology companies have successfully resisted regulation and have even begun to seize power from governments themselves. Facial recognition firms track citizens for police surveillance. Cryptocurrency has wiped out the personal savings of millions and threatens the stability of the global financial system. Spyware companies sell digital intelligence tools to anyone who can afford them. This new reality—where unregulated technology has become a forceful instrument for autocrats around the world—is terrible news for democracies and citizens.

In The Tech Coup, Marietje Schaake offers a behind-the-scenes account of how technology companies crept into nearly every corner of our lives and our governments. She takes us beyond the headlines to high-stakes meetings with human rights defenders, business leaders, computer scientists, and politicians to show how technologies—from social media to artificial intelligence—have gone from being heralded as utopian to undermining the pillars of our democracies. To reverse this existential power imbalance, Schaake outlines game-changing solutions to empower elected officials and citizens alike. Democratic leaders can—and must—resist the influence of corporate lobbying and reinvent themselves as dynamic, flexible guardians of our digital world.

Drawing on her experiences in the halls of the European Parliament and among Silicon Valley insiders, Schaake offers a frightening look at our modern tech-obsessed world—and a clear-eyed view of how democracies can build a better future before it is too late…(More)”.

AI mass surveillance at Paris Olympics


Article by Anne Toomey McKenna: “The 2024 Paris Olympics is drawing the eyes of the world as thousands of athletes and support personnel and hundreds of thousands of visitors from around the globe converge in France. It’s not just the eyes of the world that will be watching. Artificial intelligence systems will be watching, too.

Government and private companies will be using advanced AI tools and other surveillance tech to conduct pervasive and persistent surveillance before, during and after the Games. The Olympic world stage and international crowds pose increased security risks so significant that in recent years authorities and critics have described the Olympics as the “world’s largest security operation outside of war.”

The French government, hand in hand with the private tech sector, has harnessed that legitimate need for increased security as grounds to deploy technologically advanced surveillance and data gathering tools. Its surveillance plans to meet those risks, including controversial use of experimental AI video surveillance, are so extensive that the country had to change its laws to make the planned surveillance legal.

The plan goes beyond new AI video surveillance systems. According to news reports, the prime minister’s office has negotiated a classified provisional decree that permits the government to significantly ramp up traditional, surreptitious surveillance and information-gathering tools for the duration of the Games. These include wiretapping; collecting geolocation, communications and computer data; and capturing greater amounts of visual and audio data…(More)”.

The impact of data portability on user empowerment, innovation, and competition


OECD Note: “Data portability enhances access to and sharing of data across digital services and platforms. It can empower users to play a more active role in the re-use of their data and can help stimulate competition and innovation by fostering interoperability while reducing switching costs and lock-in effects. However, the effectiveness of data portability in enhancing competition depends on the terms and conditions of data transfer and the extent to which competitors can make use of the data effectively. Additionally, there are potential downsides: data portability measures may unintentionally stifle competition in fast-evolving markets where interoperability requirements can disproportionately burden SMEs and start-ups. Data portability can also increase digital security and privacy risks by enabling data transfers to multiple destinations. This note presents the following five dimensions essential for designing and implementing data portability frameworks: sectoral scope; beneficiaries; type of data; legal obligations; and operational modality…(More)”.

Community consent: neither a ceiling nor a floor


Article by Jasmine McNealy: “The 23andMe breach and the Golden State Killer case are two of the more “flashy” cases, but questions of consent, especially the consent of all of those affected by biodata collection and analysis in more mundane or routine health and medical research projects, are just as important. The communities of people affected have expectations about their privacy and the possible impacts of inferences that could be made about them in data processing systems. Researchers must, then, acquire community consent when attempting to work with networked biodata. 

Several benefits of community consent exist, especially for marginalized and vulnerable populations. These benefits include:

  • Ensuring that information about the research project spreads throughout the community,
  • Removing potential barriers that might be created by resistance from community members,
  • Alleviating the possible concerns of individuals about the perspectives of community leaders, and 
  • Allowing the recruitment of participants using methods most salient to the community.

But community consent does not replace individual consent and limits exist for both community and individual consent. Therefore, within the context of a biorepository, understanding whether community consent might be a ceiling or a floor requires examining governance and autonomy…(More)”.