A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI

Report by Hannah Chafetz, Sampriti Saxena, and Stefaan G. Verhulst: “Since late 2022, generative AI services and large language models (LLMs) have transformed how many individuals access and process information. However, how generative AI and LLMs can be augmented with open data from official sources and how open data can be made more accessible with generative AI – potentially enabling a Fourth Wave of Open Data – remains an underexplored area.

For these reasons, The Open Data Policy Lab (a collaboration between The GovLab and Microsoft) decided to explore the possible intersections between open data from official sources and generative AI. Throughout the last year, the team has conducted a range of research initiatives about the potential of open data and generative AI, including a panel discussion, interviews, and Open Data Action Labs – a series of design sprints with a diverse group of industry experts.

These initiatives were used to inform our latest report, “A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI,” (May 2024) which provides a new framework and recommendations to support open data providers and other interested parties in making open data “ready” for generative AI…

The report outlines five scenarios in which open data from official sources (e.g. open government and open research data) and generative AI can intersect. Each of these scenarios includes case studies from the field and a specific set of requirements that open data providers can focus on to become ready for a scenario. These include…(More)” (Arxiv).

The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents

Paper by Yun-Shiuan Chuang et al: “Human groups are able to converge to more accurate beliefs through deliberation, even in the presence of polarization and partisan bias – a phenomenon known as the “wisdom of partisan crowds.” Large Language Model (LLM) agents are increasingly being used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation, as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompting and lack of details in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence…(More)”

Copyright Policy Options for Generative Artificial Intelligence

Paper by Joshua S. Gans: “New generative artificial intelligence (AI) models, including large language models and image generators, have created new challenges for copyright policy as such models may be trained on data that includes copy-protected content. This paper examines this issue from an economics perspective and analyses how different copyright regimes for generative AI will impact the quality of content generated as well as the quality of AI training. A key factor is whether generative AI models are small (with content providers capable of negotiations with AI providers) or large (where negotiations are prohibitive). For small AI models, it is found that giving original content providers copyright protection leads to superior social welfare outcomes compared to having no copyright protection. For large AI models, this comparison is ambiguous and depends on the level of potential harm to original content providers and the importance of content for AI training quality. However, it is demonstrated that an ex-post ‘fair use’ type mechanism can lead to higher expected social welfare than traditional copyright regimes…(More)”.

Hosting an Online World Café to Develop an Understanding of Digital Health Promoting Settings from a Citizen’s Perspective—Methodological Potentials and Challenges

Paper by Joanna Albrecht: “Brown and Isaacs’ World Café is a participatory research method to make connections to the ideas of others. During the SARS-CoV-2 pandemic and the corresponding contact restrictions, only digital hostings of World Cafés were possible. This article aims to present and reflect on the potentials and challenges of hosting online World Cafés and to derive recommendations for other researchers. Via Zoom and Conceptboard, three online World Cafés were conducted in August 2021. In the World Cafés, the main focus was on the increasing digitization in settings in the context of health promotion and prevention from the perspective of setting members of educational institutions, leisure clubs, and communities. Between 9 and 13 participants took part in each of the three World Cafés. Hosting comprises the phases of design and preparation, realisation, and evaluation. Generally, hosting an online World Café is a suitable method for participatory engagement, but particular challenges have to be overcome. Overall, café hosts must create an equal participation environment by ensuring the availability of digital devices and stable internet access. The event schedule must react flexibly to technical disruptions and varying participation numbers. Further, compensatory measures such as support in the form of technical training must be implemented before the event. Finally, due to the higher complexity of digitalisation, roles of participants and staff need to be distributed and coordinated…(More)”.

Behavioural science is unlikely to change the world without a heterogeneity revolution

Article by Christopher J. Bryan, Elizabeth Tipton & David S. Yeager: “In the past decade, behavioural science has gained influence in policymaking but suffered a crisis of confidence in the replicability of its findings. Here, we describe a nascent heterogeneity revolution that we believe these twin historical trends have triggered. This revolution will be defined by the recognition that most treatment effects are heterogeneous, so the variation in effect estimates across studies that defines the replication crisis is to be expected as long as heterogeneous effects are studied without a systematic approach to sampling and moderation. When studied systematically, heterogeneity can be leveraged to build more complete theories of causal mechanism that could inform nuanced and dependable guidance to policymakers. We recommend investment in shared research infrastructure to make it feasible to study behavioural interventions in heterogeneous and generalizable samples, and suggest low-cost steps researchers can take immediately to avoid being misled by heterogeneity and begin to learn from it instead….(More)”.

The Business of City Hall

Paper by Kenneth R. Ahern: “Compared to the federal government, the average citizen in the U.S. has far greater interaction with city governments, including policing, health services, zoning laws, utilities, schooling, and transportation. At the regional level, it is city governments that provide the infrastructure and services that facilitate agglomeration economies in urban areas. However, there is relatively little empirical evidence on the operations of city governments as economic entities. To overcome deficiencies in traditional datasets, this paper amasses a novel, hand-collected dataset on city government finances to describe the functions, expenses, and revenues of the largest 39 cities in the United States from 2003 to 2018. First, city governments are large, with average revenues equivalent to the 78th percentile of U.S. publicly traded firms. Second, cities collect an increasingly large fraction of revenues through direct user fees, rather than taxes. By 2018, total charges for services equal tax revenue in the median city. Third, controlling for city fixed effects, population, and personal income, large city governments shrank by 15% between 2009 and 2018. Finally, the growth rate of city expenses is more sensitive to population growth, while the growth rate of city revenues is more sensitive to income. These sensitivities lead smaller, poorer cities’ expenses to grow faster than their revenues….(More)”.

Beyond the promise: implementing ethical AI

Ray Eitel-Porter at AI and Ethics: “Artificial Intelligence (AI) applications can and do have unintended negative consequences for businesses if not implemented with care. Specifically, faulty or biased AI applications risk compliance and governance breaches and damage to the corporate brand. These issues commonly arise from a number of pitfalls associated with AI development, which include rushed development, a lack of technical understanding, and improper quality assurance, among other factors. To mitigate these risks, a growing number of organisations are working on ethical AI principles and frameworks. However, ethical AI principles alone are not sufficient for ensuring responsible AI use in enterprises. Businesses also require strong, mandated governance controls including tools for managing processes and creating associated audit trails to enforce their principles. Businesses that implement strong governance frameworks, overseen by an ethics board and strengthened with appropriate training, will reduce the risks associated with AI. When applied to AI modelling, the governance will also make it easier for businesses to bring their AI deployments to scale….(More)”.

Open data governance: civic hacking movement, topics and opinions in digital space

Paper by Mara Maretti, Vanessa Russo & Emiliano del Gobbo: “The expression ‘open data’ relates to a system of informative and freely accessible databases that public administrations make generally available online in order to develop an informative network between institutions, enterprises and citizens. On this topic, using the semantic network analysis method, the research aims to investigate the communication structure and the governance of open data in the Twitter conversational environment. In particular, the research questions are: (1) Who are the main actors in the Italian open data infrastructure? (2) What are the main conversation topics online? (3) What are the pros and cons of the development and use (reuse) of open data in Italy? To answer these questions, we went through three research phases: (1) analysing the communication network, we found who are the main influencers; (2) once we found who were the main actors, we analysed the online content in the Twittersphere to detect the semantic areas; (3) then, through an online focus group with the main open data influencers, we explored the characteristics of Italian open data governance. Through the research, it has been shown that: (1) there is an Italian open data governance strategy; (2) the Italian civic hacker community plays an important role as an influencer; but (3) there are weaknesses in governance and in practical reuse….(More)”.

How Data Can Help in the Fight Against the Opioid Epidemic in the United States

Report by Joshua New: “The United States is in the midst of an opioid epidemic 20 years in the making….

One of the most pernicious obstacles in the fight against the opioid epidemic is that, until relatively recently, it was difficult to measure the epidemic in any comprehensive capacity beyond such high-level statistics. A lack of granular data and authorities’ inability to use data to inform response efforts allowed the epidemic to grow to devastating proportions. The maxim “you can’t manage what you can’t measure” has never been so relevant, and this failure to effectively leverage data has undoubtedly cost many lives and caused severe social and economic damage to communities ravaged by opioid addiction, with authorities limited in their ability to fight back.

Many factors contributed to the opioid epidemic, including healthcare providers not fully understanding the potential ramifications of prescribing opioids, socioeconomic conditions that make addiction more likely, and drug distributors turning a blind eye to likely criminal behavior, such as pharmacy workers illegally selling opioids on the black market. Data will not be able to solve these problems, but it can make public health officials and other stakeholders more effective at responding to them. Fortunately, recent efforts to better leverage data in the fight against the opioid epidemic have demonstrated the potential for data to be an invaluable and effective tool to inform decision-making and guide response efforts. Policymakers should aggressively pursue more data-driven strategies to combat the opioid epidemic while learning from past mistakes that helped contribute to the epidemic to prevent similar situations in the future.

The scope of this paper is limited to opportunities to better leverage data to help address problems primarily related to the abuse of prescription opioids, rather than the abuse of illicitly manufactured opioids such as heroin and fentanyl. While these issues may overlap, such as when a person develops an opioid use disorder from prescribed opioids and then seeks heroin when they are unable to obtain more from their doctor, the opportunities to address the abuse of prescription opioids are more clear-cut….(More)”.

Finland’s model in utilising forest data

Report by Matti Valonen et al: “The aim of this study is to depict the Finnish Forest Centre’s Metsään.fi website’s background, objectives and implementation and to assess its needs for development and future prospects. The Metsään.fi-service included in the Metsään.fi-website is a free e-service for forest owners and corporate actors (companies, associations and service providers) in the forest sector, whose aim is to support active decision-making among forest owners by offering forest resource data and maps on forest properties, to make contact with the authorities easier through online services, and to act as a platform for offering forest services, among other things.

In addition to the Metsään.fi-service, the website includes open forest data services that offer the users national forest resource data that is not linked with personal information.

Private forests are in a key position as raw material sources for traditional and new forest-based bioeconomy. In addition to wood material, the forests produce non-timber forest products (for example berries and mushrooms), opportunities for recreation and other ecosystem services.

Private forests cover roughly 60 percent of forest land, but supply about 80 percent of the domestic wood used by the forest industry. In 2017 the value of the forest industry production was 21 billion euros, which is a fifth of the entire industry production value in Finland. The forest industry export in 2017 was worth about 12 billion euros, which covers a fifth of the entire export of goods. Therefore, the forest sector is important for Finland’s national economy…(More)”.