Use of large language models as a scalable approach to understanding public health discourse


Paper by Laura Espinosa and Marcel Salathé: “Online public health discourse is becoming more and more important in shaping public health dynamics. Large Language Models (LLMs) offer a scalable solution for analysing the vast amounts of unstructured text found on online platforms. Here, we explore the effectiveness of LLMs, including GPT models and open-source alternatives, for extracting public stances towards vaccination from social media posts. Using an expert-annotated dataset of social media posts related to vaccination, we applied various LLMs and a rule-based sentiment analysis tool to classify the stance towards vaccination. We assessed the accuracy of these methods through comparisons with expert annotations and annotations obtained through crowdsourcing. Our results demonstrate that few-shot prompting of best-in-class LLMs is the best-performing method, and that all alternatives carry significant risks of substantial misclassification. The study highlights the potential of LLMs as a scalable tool for public health professionals to quickly gauge public opinion on health policies and interventions, offering an efficient alternative to traditional data analysis methods. With the continuous advancement in LLM development, the integration of these models into public health surveillance systems could substantially improve our ability to monitor and respond to changing public health attitudes…(More)”.
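To make the few-shot setup concrete, here is a minimal sketch of stance classification via an LLM API. It is illustrative only: the prompt wording, label set, and model name are assumptions for the example, not the authors’ exact protocol.

```python
# Hedged sketch: few-shot stance classification of vaccination posts.
# The prompt, labels, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FEW_SHOT_PROMPT = """Classify the stance of each post towards vaccination
as POSITIVE, NEGATIVE, or NEUTRAL.

Post: "Got my booster today, grateful for science!"
Stance: POSITIVE

Post: "No way I'm letting them inject that into my kids."
Stance: NEGATIVE

Post: "The clinic on Main St. offers flu shots on Tuesdays."
Stance: NEUTRAL

Post: "{post}"
Stance:"""

def classify_stance(post: str, model: str = "gpt-4o") -> str:
    """Return the model's one-word stance label for a social media post."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": FEW_SHOT_PROMPT.format(post=post)}],
        temperature=0,  # keep labels as deterministic as the API allows
    )
    return response.choices[0].message.content.strip()

print(classify_stance("Vaccines saved my grandmother's life."))  # POSITIVE
```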

The use of AI for improving energy security


RAND report: “Electricity systems around the world are under pressure due to aging infrastructure, rising demand for electricity and the need to decarbonise energy supplies at pace. Artificial intelligence (AI) applications have potential to help address these pressures and increase overall energy security. For example, AI applications can reduce peak demand through demand response, improve the efficiency of wind farms and facilitate the integration of large numbers of electric vehicles into the power grid. However, the widespread deployment of AI applications could also come with heightened cybersecurity risks, the risk of unexplained or unexpected actions, or supplier dependency and vendor lock-in. The speed at which AI is developing means many of these opportunities and risks are not yet well understood.

The aim of this study was to provide insight into the state of AI applications for the power grid and the associated risks and opportunities. Researchers conducted a focused scan of the scientific literature to find examples of relevant AI applications in the United States, the European Union, China and the United Kingdom…(More)”.

DIGIntegrity


A government’s toolkit to disrupt corruption through data-based technologies.

Blog by Camilo Cetina: “The Lava Jato corruption scandal exposed a number of Brazilian government officials in 2016, including the then-president of the Brazilian Chamber of Deputies, and further investigations have implicated other organisations in a way that reveals a worrying phenomenon worldwide: corruption is mutating into complex forms of organized crime.

For corruption networks to thrive and prey on public funds, they need to capture government officials. Furthermore, the progressive digitalization of economies and telecommunications expands corruption networks’ capacity to operate transnationally, making it easier for them to devise new cooperation mechanisms (for example, mobilizing illicit cash through a church) and to accumulate huge profits from transnational operations. It simultaneously increases their ability to reorganize and hide within the vast amounts of data underlying the digital platforms used to move money around the world.

However, at the same time, data-based technologies can contribute significantly to addressing the challenges revealed by recent corruption cases such as Lava Jato, Odebrecht, the Panama Papers or the Pandora Papers. The new report DIGIntegrity, the executive summary of which was recently published by CAF — Development Bank of Latin America, highlights how anti-corruption policies become more effective when they target specific datasets which are then reused through digital platforms to prevent, detect and investigate corruption networks.

The report explains how growing digitalization, accompanied by the globalization of the economy, is having a twofold effect on governments’ integrity agendas. On the one hand, globalization and technology provide unprecedented opportunities for corruption to grow, facilitating the concealment of illicit flows of money and hindering jurisdictional capacities for detection and punishment. On the other hand, systemic improvements in governance and collective action are being achieved thanks to new technologies that help provide automated services and make public management processes more visible through open data and increasingly open public records. There are “integrity dividends” to be derived from the growing digitalization of governments and the increasingly intensive use of data intelligence to prevent corruption….(More).”

Deep Fake


By Emil Verhulst

/diːp feɪk/

An image or recording that has been convincingly altered and manipulated to misrepresent someone as doing or saying something that was not actually done or said (Merriam-Webster).

The term “deepfake” was coined in late 2017 by a Reddit user who “shared pornographic videos that used open source face-swapping technology.” Since then, the term has expanded to include any harmful alteration or manipulation of digital media – from audio to landscapes. For example, researchers have applied AI techniques to modify aerial imagery, which could potentially lead governments astray or spread false information.

“Adversaries may use fake or manipulated information to impact our understanding of the world,” says a spokesperson for the National Geospatial-Intelligence Agency, the part of the Pentagon that oversees the collection, analysis, and distribution of geospatial information.

Audio can also be deepfaked. In 2019, a mysterious case emerged involving a UK-based energy company and its Germany-based parent company. The CEO of the UK energy company received a call from his boss, or at least from someone he believed was his boss. This “boss” told him to send roughly €220,000 to a supplier in Hungary:

“The €220,000 was moved to Mexico and channeled to other accounts, and the energy firm—which was not identified—reported the incident to its insurance company, Euler Hermes Group SA. An official with Euler Hermes said the thieves used artificial intelligence to create a deepfake of the German executive’s voice.”

This incident, among others, indicates a rise in crime associated with deep fakes. Deep fakes can alter our perception of reality. They can prove especially dangerous in precarious social and political climates, in which false information can incite violence or hate speech online.

So what is the science behind deep fakes?

Deep fakes are usually created using Generative Adversarial Networks, or GANs, a technique from the subfield of AI known as machine learning (ML). Machine learning uses computer systems that learn without following explicit instructions, instead using statistics and algorithms to find patterns in data. To create a deep fake, two ML models called neural networks work in opposition: a generator produces fake data (videos, images, audio, etc.) that imitates the real training data (usually video or audio of the person being impersonated), while a discriminator tries to tell the counterfeit data from the real. The two networks compete over many training iterations until the generated output is practically indistinguishable from the real data.
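The adversarial loop described above can be sketched in a few dozen lines. The following is a minimal, self-contained PyTorch example on toy numeric data; it is an assumption-laden illustration of the GAN idea, not the far larger, specialised networks used by actual deepfake tools.

```python
# Minimal GAN sketch on toy 2-D data (illustrative only).
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2  # toy sizes chosen for illustration

# Generator: maps random noise to synthetic "data".
generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, data_dim),
)
# Discriminator: outputs the probability that a sample is real.
discriminator = nn.Sequential(
    nn.Linear(data_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def real_batch(n: int = 64) -> torch.Tensor:
    # Stand-in for genuine media: samples from a shifted Gaussian.
    return torch.randn(n, data_dim) + 3.0

for step in range(2000):
    # 1) Train the discriminator: label real samples 1, fakes 0.
    real = real_batch()
    fake = generator(torch.randn(real.size(0), latent_dim)).detach()
    d_loss = (loss_fn(discriminator(real), torch.ones(real.size(0), 1))
              + loss_fn(discriminator(fake), torch.zeros(real.size(0), 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the generator to fool the discriminator (targets flipped to 1).
    fake = generator(torch.randn(64, latent_dim))
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
# After training, generator samples should be hard to tell from real_batch().
```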

Deep fakes will only become more prevalent in the coming years. Because they pose a threat to journalism, online speech, and internet safety, we must remain vigilant about the information we consume online.

Digital Twins


/ˈdɪʤɪtl twɪnz/

A digital representation of a physical asset which can be used to monitor, visualise, predict and make decisions about it (Open Data Institute).

Digital twin technologies are driven by sensors that collect data in real time, enabling a digital representation of a physical process or product. Digital twins can help businesses or decision-makers maintain, optimize, or monitor physical assets, providing specific insights into their health and performance. A traffic model, for example, can be used to monitor and manage real-time pedestrian and road traffic in a city. Energy companies such as General Electric and Chevron use digital twins to monitor wind turbines. Digital twins can also help decision-makers at the state and local level better plan infrastructure or monitor city assets. In Sustainable Cities: Big Data, Artificial Intelligence and the Rise of Green, “Cy-phy” Cities, Claudio Scardovi describes how cities can create digital twins, leveraging data and AI, to test strategies for increasing sustainability, inclusivity, and resilience:

“Global cities are facing an almost unprecedented challenge of change. As they re-emerge from the Covid 19 pandemic and get ready to face climate change and other, potentially existential threats, they need to look for new ways to support wealth and wellbeing creation […] New digital technologies could be used to design digital and physical twins of cities that are able to feed into each other to optimize their working and ability to create new wealth and wellbeing.” 

The UK National Infrastructure Commission created a framework to support the development of digital twins. Similarly, many European countries encourage urban digital twin initiatives:

“Urban digital twins are a virtual representation of a city’s physical assets, using data, data analytics and machine learning to help simulate models that can be updated and changed (real-time) as their physical equivalents change. […] In terms of rationale, they can bring cost efficiencies, operational efficiencies, better crisis management, more openness and better informed decision-making, more participatory governance or better urban planning.”

Sometimes, however, digital twins fail to accurately reflect real-world developments, leading users to make poor decisions. Researchers Fei Tao and Qinglin Qi, in “Make more digital twins,” describe data challenges in digital twin technologies, such as inconsistencies in data types and scattered ownership:

“Missing or erroneous data can distort results and obscure faults. The wobbling of a wind turbine, say, would be missed if vibration sensors fail. Beijing-based power company BKC Technology struggled to work out that an oil leak was causing a steam turbine to overheat. It turned out that lubricant levels were missing from its digital twin.”
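The failure mode in the quote is easy to see in miniature. Below is a hedged Python sketch of a turbine twin that updates from streamed sensor readings; the class, field names, and alert threshold are hypothetical, invented for illustration rather than taken from any vendor’s system.

```python
# Toy digital twin of a wind turbine (all names and thresholds invented).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TurbineTwin:
    turbine_id: str
    rpm: Optional[float] = None
    vibration_mm_s: Optional[float] = None  # None = sensor not reporting
    alerts: list = field(default_factory=list)

    def ingest(self, reading: dict) -> None:
        """Update the twin's state from one sensor message."""
        self.rpm = reading.get("rpm", self.rpm)
        self.vibration_mm_s = reading.get("vibration_mm_s")  # may be absent

    def assess(self) -> None:
        """Derive health insights, surfacing missing data explicitly."""
        if self.vibration_mm_s is None:
            self.alerts.append("vibration sensor silent: wobble undetectable")
        elif self.vibration_mm_s > 7.0:  # threshold is a made-up example
            self.alerts.append(f"excess vibration: {self.vibration_mm_s} mm/s")

twin = TurbineTwin("WT-042")
twin.ingest({"rpm": 14.2})  # vibration reading missing this cycle
twin.assess()
print(twin.alerts)  # -> ['vibration sensor silent: wobble undetectable']
```

A twin that treated a missing reading as “no problem” would, like the steam turbine above, quietly hide the very fault it was built to catch.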

The uptake of digital twins requires both public and private sector collaboration and improved data infrastructures. As the Open Data Institute describes, digital twins depend on a culture of openness: “open data, open culture, open standards and collaborative models that build trust, reduce cost, and create more value.”

Digital Vigilantism


/ˈdɪʤɪtl ˈvɪʤɪləntɪz(ə)m/

A process where citizens are collectively offended by other citizen activity and respond through coordinated retaliation on digital media, including mobile devices and social media platforms (Daniel Trottier, 2017). 

Following the storming of the US Capitol on January 6, 2021, Washington’s Metropolitan Police Department (MPD) released an open call for help identifying rioters. The attack was heavily documented through live stream footage and photos posted to social media; thousands of citizens mobilized to parse through this media and help identify perpetrators for prosecution. For example, researchers at the University of Toronto’s Citizen Lab presented photo and video evidence of potential suspects to the FBI, without posting any names publicly.

This was not the first time citizens have organized to identify individuals involved in a harmful act. In 2013, after the Boston Marathon bombing, members of the public used Reddit and other platforms to conduct a parallel investigation, sharing and searching for information to uncover key leads. In both cases, these amateur investigations had mixed results, with many uninvolved persons shamed and harassed.

Acts of digital vigilantism, also referred to as ‘e-vigilantism,’ ‘cyber vigilantism,’ or ‘digilantism’ (Wehmhoener, 2010), are not always directed at matters of national security—these crowdsourced investigations occur as a result of general moral outrage from citizens who seek to dispense justice to groups or individuals they believe have committed an improper act. Often, these allegations are the product of conspiracy theories, rumors, and a general miasma of distrust. After a video circulated online in 2020 of a cyclist assaulting two children on a bike trail, digital vigilantes sought out and subsequently misidentified the perpetrator. Over the following weeks, this innocent party received threatening messages from angry internet sleuths, who circulated his personal information across social media – including his address.

Digital vigilantism occurs through the sharing of data or information on digital platforms, especially social media. Johnny Nhan, Laura Huey, and Ryan Broll, in Digilantism: An Analysis of Crowdsourcing and the Boston Marathon Bombings, describe the Reddit community that organized following the Boston Marathon bombing:

“Although some posters focused on technical aspects of the crime in order to identify the perpetrators and understand their motives, others sought a different route. These posters were more interested in discussing whether the attacks were linked to an organized violent extremist group or were instead the work of a so-called ‘lone wolf’ actor. Although different in content from other forms of speculation offered online, these posts similarly were phrased in ways that suggested the poster had some deeper knowledge and/or experience of the field of violent extremism.”

As described above, those partaking in these crowdsourced investigations have a range of motivations—some well-intentioned, others not. In addition, this crowdsourcing can slow down official investigations by bombarding authorities with unhelpful and false information.

The fallout from digital vigilantism can also affect targets in a number of ways—from wrongful shaming and harassment online to death threats lasting several weeks. Daniel Trottier, in the paper “Denunciation and doxing: towards a conceptual model of digital vigilantism,” warns of the social harms caused by digital vigilantism:

“Denunciation may provoke other forms of mediated and embodied activities, including harassment and bullying, threats, and physical violence, often overlapping with gendered persecution and racism. As for longer-term outcomes, researchers can also consider how the reputation and broader social standing of the target and participants are understood and expressed both in news reports as well as accounts by participants […] They may consider references to detrimental life events for targets, for example, an inability to sustain employment, being excommunicated from their community, in addition to physical interventions.”

While many are wary of the illicit behavior that digital vigilantism sanctions, from online harassment to mob organizing that leads to physical acts of violence, others acknowledge its collective intelligence practices and their profound impact on societal participation. James Walsh notes:

“[S]uch a transformation in societal participation led to a shift from a deputisation to an autonomization paradigm, referring to the voluntary, or self-appointed, involvement of citizens in the regulatory gatekeeping network. This refers to grassroot mobilisation, rather than governments mobilising the public, with groups of citizens spontaneously aligning themselves with authorities’ aims and objectives.”

Techlash


/tɛklæʃ/

A strong and widespread negative reaction to the growing power and influence of large technology companies, particularly those based in Silicon Valley (The Oxford English Dictionary).

Once promised to be society’s great equalizer, communications technology is now facing a backlash over its role in spreading disinformation, breaching privacy, limiting pluralism, and undermining democracy. Adrian Wooldridge predicted this phenomenon in 2013, arguing in The Economist that “the tech elite will join bankers and oilmen in public demonology.”

A 2020 survey conducted by the Knight Foundation and Gallup bears out Wooldridge’s prediction. The survey finds significant negative sentiment towards tech companies, driven by concerns about how they handle personal data, the spread of misinformation on social media platforms, and their growing influence and power in politics and in the daily lives of their consumers, among other factors. Some additional findings from the survey include:

  • 74% of Americans are very concerned about the spread of misinformation on the internet.
  • 68% are very concerned about the privacy of personal data stored by internet and technology companies, and 56% are very concerned about hate speech and other abusive or threatening language online.
  • 77% of Americans say major internet and technology companies like Facebook, Google, Amazon, and Apple have too much power.

A majority of tech experts surveyed by Pew Research Center predict “that humans’ use of technology will weaken democracy between now and 2030 due to the speed and scope of reality distortion, the decline of journalism, and the impact of surveillance capitalism.” What underpins this assertion is the argument that:

“The misuse of digital technology to manipulate and weaponize facts affects people’s trust in institutions and each other. That ebbing of trust affects people’s views about whether democratic processes and institutions designed to empower citizens are working.”

However, an overwhelming number of these experts also foresee “significant social and civic innovation between now and 2030 to try to address emerging issues.” One conclusion of the study is that:

“The explosion of data generated by people, gadgetry and environmental sensors will affect the level of social and civic innovation in several potential directions. They argue that the existence of the growing trove of data – and people’s knowledge about its collection – will focus more attention on privacy issues and possibly affect people’s norms and behaviors. In addition, some say the way the data is analyzed will draw more scrutiny of the performance of algorithms and artificial intelligence systems, especially around issues related to whether the outcomes of data use are fair and explainable.”

Many would argue that technology is simply a tool, one with the ability to improve lives when used well. The task at hand is to address these problems in an effective and legitimate manner and to ensure that 21st-century tools deliver on their promise to help society progress.

Data Activism


/ˈdeɪtə ˈæktɪˌvɪzəm/

New social practices enabled by data and technology which aim to create political change (Milan and Gutiérrez).

The large-scale generation of data that has occurred over the past decade has given rise to data activism, defined by Stefania Milan and Miren Gutiérrez, scholars in technology and society at the University of Amsterdam and University of Deusto, as “new social practices rooted in technology and data.” These authors further discuss this term, arguing:

“Data activism indicates social practices that take a critical approach to big data. Examples include the collective mapping and geo-referencing of the messages of victims of natural disasters in order to facilitate disaster relief operations, or the elaboration of open government data for advocacy and campaigning. But data activism also embraces tactics of resistance to massive data collection by private companies and governments, such as the encryption of private communication, or obfuscation tactics that put sand into the data collection machine.”

Milan and Gutiérrez further elaborate on these two forms of data activism in their paper “Technopolitics in the Age of Big Data.” Here, they argue all data activism is either proactive or reactive. They state:

“We identify two forms of data activism: proactive data activism, whereby citizens take advantage of the possibilities offered by big data infrastructure for advocacy and social change, and reactive data activism, namely grassroots efforts aimed at resisting massive data collection and protecting users from malicious snooping.”

An example of reactive data activism comes from Media Action Grassroots Network, a network of social justice organizations based in the United States. This network provides digital security training to grassroots activists working on racial justice issues.

An example of proactive data activism is discussed in “Data witnessing: attending to injustice with data in Amnesty International’s Decoders project.” There, author Jonathan Gray, a critical data scholar, examines “what digital data practices at Amnesty International’s Decoders initiative can add to the understanding of witnessing.” According to Gray, witnessing is a concept that has been used in law, religion, and media, among other fields, to explore the construction of evidence and experience. In the paper, Gray examines four data witnessing projects:

“(i) witnessing historical abuses with structured data from digitised documents; (ii) witnessing the destruction of villages with satellite imagery and machine learning; (iii) witnessing environmental injustice with company reports and photographs; and (iv) witnessing online abuse through the classification of Twitter data. These projects illustrate the configuration of experimental apparatuses for witnessing injustices with data.”

More recently, proactive data activism has produced several notable examples. Civil rights activists in Zanesville, Ohio used data to demonstrate the inequitable access to clean water between predominantly white communities and Black communities. A collection of activists, organizers, and mathematicians formed Data 4 Black Lives to promote justice for Black communities through data and data science. Finally, to hold their government accountable for its COVID-19 case data, Indonesian activists created a platform where citizens can independently report COVID-19 cases.

Multisolving


/ˌmʌltiˈsɑlvɪŋ/

The pooling of expertise, funding, and political will to solve multiple problems with a single investment of time and money (Sawin, 2018).

Elizabeth Sawin, Co-Director of Climate Interactive, a not-for-profit energy and environment think tank, wrote an article on multisolving in Stanford Social Innovation Review (SSIR) after a year-long study of the approach’s implementation for climate and health. Defined as a way of solving multiple problems with a single investment of time and money, the multisolving approach brings together stakeholders from different sectors and disciplines to tackle public issues in a cost-efficient manner.

In the article, Sawin provides examples of multisolving that have been implemented in countries across the globe:

In Japan, manufacturing facilities use “green curtains”—living panels of climbing plants—to clean the air, provide vegetables for company cafeterias, and reduce energy use for cooling. A walk-to-school program in the United Kingdom fights a decline in childhood physical activity while reducing traffic congestion and greenhouse gas emissions from transportation. A food-gleaning program staffed by young volunteers and families facing food insecurity in Spain addresses food waste, hunger, and a desire for sustainability.

A Climate Interactive report provides three principles and three practices that can help stakeholders develop a multisolving strategy. In the SSIR article, Sawin summarizes the principles in three points. First, she argues that a solution must serve everyone in a system, without exception. Second, she suggests that multisolvers must recognize that problems are multifaceted and that multisolving provides solutions to multiple facets of a big issue. Third, Sawin posits that experimentation and learning are key to measuring the success of multisolving.

Further in the article, Sawin also outlines three good multisolving practices. First, she identifies openness to collaboration with actors from different sectors or groups in a society as a critical ingredient in developing a multisolving strategy. Second, Sawin stresses the importance of learning, documenting, and improving to ensure optimal benefits of multisolving for the public. Finally, she argues that communicating the benefits of multisolving to various stakeholders can help generate buy-in for a multisolving project.

In concluding the article, Sawin writes: “[n]one of these multisolving principles or tools, on their own, are revolutionary. They need no new apps or state-of-the-art techniques to work. What makes multisolving unique is that it weaves together these principles and practices in a way that builds over time to create big results.”

Informational Autocrats


/ˌɪnfərˈmeɪʃənəl ˈɔtəˌkræts/

Rulers who control and manipulate information in order to maintain power (Guriev and Treisman, 2019).

Sergei Guriev (Professor of Economics, Sciences Po, Paris) and Daniel Treisman (Professor of Political Science, University of California, Los Angeles) detail in their paper, “Informational Autocrats,” a new, more surreptitious type of authoritarian leader. The authors write:

“In this article, we document the changing characteristics of authoritarian states worldwide. Using newly collected data, we show that recent autocrats employ violent repression and impose official ideologies far less often than their predecessors. They also appear more prone to conceal rather than to publicize cases of state brutality. Analyzing texts of leaders’ speeches, we show that “informational autocrats” favor a rhetoric of economic performance and provision of public services that resembles that of democratic leaders far more than it does the discourse of threats and fear embraced by old-style dictators. Authoritarian leaders are increasingly mimicking democracy by holding elections and, where necessary, falsifying the results.”

Today, informational autocrats often employ “cyber troops” to spread disinformation. They specifically target and take advantage of the “uninformed masses” in order to advance their interests. Guriev and Treisman further argue:

“A key element in our theory of informational autocracy is the gap in political knowledge between the “informed elite” and the general public. While the elite accurately observes the limitations of an incompetent incumbent, the public is susceptible to the ruler’s propaganda. Using individual-level data from the Gallup World Poll, we show that such a gap does indeed exist in many authoritarian states today. Unlike in democracies, where the highly educated are more likely than others to approve of their government, in authoritarian states the highly educated tend to be more critical. The highly educated are also more aware of media censorship than their less-schooled compatriots.”

Separately, Andrea Kendall-Taylor, Erica Frantz, and Joseph Wright, writing in Foreign Affairs, echo this argument:

“Dictatorships can also use new technologies to shape public perception of the regime and its legitimacy. Automated accounts (or “bots”) on social media can amplify influence campaigns and produce a flurry of distracting or misleading posts that crowd out opponents’ messaging.”

Additionally:

“Digital tools might even help regimes make themselves appear less repressive and more responsive to their citizens. In some cases, authoritarian regimes have deployed new technologies to mimic components of democracy, such as participation and deliberation.”

The globalization of ideas and technological advances have created a hostile environment for traditional, overt dictatorship. At the same time, this combination has been exploited by informational autocrats to advance their own interests. Promoting accountability across all sectors, for example through open government data and algorithmic transparency, can help counter such efforts to control and manipulate information.