AI and Data Science for Public Policy


Introduction to Special Issue by Kenneth Benoit: “Artificial intelligence (AI) and data science are reshaping public policy by enabling more data-driven, predictive, and responsive governance, while at the same time producing profound changes in knowledge production and education in the social and policy sciences. These advancements come with ethical and epistemological challenges surrounding issues of bias, transparency, privacy, and accountability. This special issue explores the opportunities and risks of integrating AI into public policy, offering theoretical frameworks and empirical analyses to help policymakers navigate these complexities. The contributions explore how AI can enhance decision-making in areas such as healthcare, justice, and public services, while emphasising the need for fairness, human judgment, and democratic accountability. The issue provides a roadmap for harnessing AI’s potential responsibly, ensuring it serves the public good and upholds democratic values…(More)”.

Conversational Swarms of Humans and AI Agents enable Hybrid Collaborative Decision-making


Paper by Louis Rosenberg et al: “Conversational Swarm Intelligence (CSI) is an AI-powered communication and collaboration technology that allows large, networked groups (of potentially unlimited size) to hold thoughtful conversational deliberations in real-time. Inspired by the efficient decision-making dynamics of fish schools, CSI divides a human population into a set of small subgroups connected by AI agents. This enables the full group to hold a unified conversation. In this study, groups of 25 participants were tasked with selecting a roster of players in a real Fantasy Baseball contest. A total of 10 trials were run using CSI. In half the trials, each subgroup was augmented with a fact-providing AI agent referred to herein as an Infobot. The Infobot was loaded with a wide range of MLB statistics. The human participants could query the Infobot the same way they would query other persons in their subgroup. Results show that when using CSI, the 25-person groups outperformed 72% of individually surveyed participants and showed significant intelligence amplification versus the mean score (p=0.016). The CSI-enabled groups also significantly outperformed the most popular picks across the collected surveys for each daily contest (p<0.001). The CSI sessions that used Infobots scored slightly higher than those that did not, but it was not statistically significant in this study. That said, 85% of participants agreed with the statement ‘Our decisions were stronger because of information provided by the Infobot’ and only 4% disagreed. In addition, deliberations that used Infobots showed significantly less variance (p=0.039) in conversational content across members. This suggests that Infobots promoted more balanced discussions in which fewer members dominated the dialog. This may be because the infobot enabled participants to confidently express opinions with the support of factual data…(More)”.

Effective Data Stewardship in Higher Education: Skills, Competences, and the Emerging Role of Open Data Stewards


Paper by Panos Fitsilis et al: “The significance of open data in higher education stems from the changing tendencies towards open science, and open research in higher education encourages new ways of making scientific inquiry more transparent, collaborative and accessible. This study focuses on the critical role of open data stewards in this transition, essential for managing and disseminating research data effectively in universities, while it also highlights the increasing demand for structured training and professional policies for data stewards in academic settings. Building upon this context, the paper investigates the essential skills and competences required for effective data stewardship in higher education institutions by elaborating on a critical literature review, coupled with practical engagement in open data stewardship at universities, provided insights into the roles and responsibilities of data stewards. In response to these identified needs, the paper proposes a structured training framework and comprehensive curriculum for data stewardship, a direct response to the gaps identified in the literature. It addresses five key competence categories for open data stewards, aligning them with current trends and essential skills and knowledge in the field. By advocating for a structured approach to data stewardship education, this work sets the foundation for improved data management in universities and serves as a critical step towards professionalizing the role of data stewards in higher education. The emphasis on the role of open data stewards is expected to advance data accessibility and sharing practices, fostering increased transparency, collaboration, and innovation in academic research. This approach contributes to the evolution of universities into open ecosystems, where there is free flow of data for global education and research advancement…(More)”.

Quality Assessment of Volunteered Geographic Information


Paper by Donia Nciri et al: “Traditionally, government and national mapping agencies have been a primary provider of authoritative geospatial information. Today, with the exponential proliferation of Information and Communication Technologies or ICTs (such as GPS, mobile mapping and geo-localized web applications, social media), any user becomes able to produce geospatial information. This participatory production of geographical data gives birth to the concept of Volunteered Geographic Information (VGI). This phenomenon has greatly contributed to the production of huge amounts of heterogeneous data (structured data, textual documents, images, videos, etc.). It has emerged as a potential source of geographic information in many application areas. Despite the various advantages associated with it, this information lacks often quality assurance, since it is provided by diverse user profiles. To address this issue, numerous research studies have been proposed to assess VGI quality in order to help extract relevant content. This work attempts to provide an overall review of VGI quality assessment methods over the last decade. It also investigates varied quality assessment attributes adopted in recent works. Moreover, it presents a classification that forms a basis for future research. Finally, it discusses in detail the relevance and the main limitations of existing approaches and outlines some guidelines for future developments…(More)”.

When combinations of humans and AI are useful: A systematic review and meta-analysis


Paper by Michelle Vaccaro, Abdullah Almaatouq & Thomas Malone: “Inspired by the increasing use of artificial intelligence (AI) to augment humans, researchers have studied human–AI systems involving different tasks, systems and populations. Despite such a large body of work, we lack a broad conceptual understanding of when combinations of humans and AI are better than either alone. Here we addressed this question by conducting a preregistered systematic review and meta-analysis of 106 experimental studies reporting 370 effect sizes. We searched an interdisciplinary set of databases (the Association for Computing Machinery Digital Library, the Web of Science and the Association for Information Systems eLibrary) for studies published between 1 January 2020 and 30 June 2023. Each study was required to include an original human-participants experiment that evaluated the performance of humans alone, AI alone and human–AI combinations. First, we found that, on average, human–AI combinations performed significantly worse than the best of humans or AI alone (Hedges’ g = −0.23; 95% confidence interval, −0.39 to −0.07). Second, we found performance losses in tasks that involved making decisions and significantly greater gains in tasks that involved creating content. Finally, when humans outperformed AI alone, we found performance gains in the combination, but when AI outperformed humans alone, we found losses. Limitations of the evidence assessed here include possible publication bias and variations in the study designs analysed. Overall, these findings highlight the heterogeneity of the effects of human–AI collaboration and point to promising avenues for improving human–AI systems…(More)”.

Open government data and self-efficacy: The empirical evidence of micro foundation via survey experiments


Paper by Kuang-Ting Tai, Pallavi Awasthi, and Ivan P. Lee: “Research on the potential impacts of government openness and open government data is not new. However, empirical evidence regarding the micro-level impact, which can validate macro-level theories, has been particularly limited. Grounded in social cognitive theory, this study contributes to the literature by empirically examining how the dissemination of government information in an open data format can influence individuals’ perceptions of self-efficacy, a key predictor of public participation. Based on two rounds of online survey experiments conducted in the U.S., the findings reveal that exposure to open government data is associated with decreased perceived self-efficacy, resulting in lower confidence in participating in public affairs. This result, while contrary to optimistic assumptions, aligns with some other empirical studies and highlights the need to reconsider the format for disseminating government information. The policy implications suggest further calibration of open data applications to target professional and skilled individuals. This study underscores the importance of experiment replication and theory development as key components of future research agendas…(More)”.

Long-term validation of inner-urban mobility metrics derived from Twitter/X


Paper by Steffen Knoblauch et al: “Urban mobility analysis using Twitter as a proxy has gained significant attention in various application fields; however, long-term validation studies are scarce. This paper addresses this gap by assessing the reliability of Twitter data for modeling inner-urban mobility dynamics over a 27-month period in the. metropolitan area of Rio de Janeiro, Brazil. The evaluation involves the validation of Twitter-derived mobility estimates at both temporal and spatial scales, employing over 1.6 × 1011 mobile phone records of around three million users during the non-stationary mobility period from April 2020 to. June 2022, which coincided with the COVID-19 pandemic. The results highlight the need for caution when using Twitter for short-term modeling of urban mobility flows. Short-term inference can be influenced by Twitter policy changes and the availability of publicly accessible tweets. On the other hand, this long-term study demonstrates that employing multiple mobility metrics simultaneously, analyzing dynamic and static mobility changes concurrently, and employing robust preprocessing techniques such as rolling window downsampling can enhance the inference capabilities of Twitter data. These novel insights gained from a long-term perspective are vital, as Twitter – rebranded to X in 2023 – is extensively used by researchers worldwide to infer human movement patterns. Since conclusions drawn from studies using Twitter could be used to inform public policy, emergency response, and urban planning, evaluating the reliability of this data is of utmost importance…(More)”.

Contractual Freedom and Fairness in EU Data Sharing Agreements


Paper by Thomas Margoni and Alain M. Strowel: “This chapter analyzes the evolving landscape of EU data-sharing agreements, particularly focusing on the balance between contractual freedom and fairness in the context of non-personal data. The discussion highlights the complexities introduced by recent EU legislation, such as the Data Act, Data Governance Act, and Open Data Directive, which collectively aim to regulate data markets and enhance data sharing. The chapter emphasizes how these laws impose obligations that limit contractual freedom to ensure fairness, particularly in business-to-business (B2B) and Internet of Things (IoT) data transactions. It also explores the tension between private ordering and public governance, suggesting that the EU’s approach marks a shift from property-based models to governance-based models in data regulation. This chapter underscores the significant impact these regulations will have on data contracts and the broader EU data economy…(More)”.

AI can help humans find common ground in democratic deliberation


Paper by Michael Henry Tessler et al: “We asked whether an AI system based on large language models (LLMs) could successfully capture the underlying shared perspectives of a group of human discussants by writing a “group statement” that the discussants would collectively endorse. Inspired by Jürgen Habermas’s theory of communicative action, we designed the “Habermas Machine” to iteratively generate group statements that were based on the personal opinions and critiques from individual users, with the goal of maximizing group approval ratings. Through successive rounds of human data collection, we used supervised fine-tuning and reward modeling to progressively enhance the Habermas Machine’s ability to capture shared perspectives. To evaluate the efficacy of AI-mediated deliberation, we conducted a series of experiments with over 5000 participants from the United Kingdom. These experiments investigated the impact of AI mediation on finding common ground, how the views of discussants changed across the process, the balance between minority and majority perspectives in group statements, and potential biases present in those statements. Lastly, we used the Habermas Machine for a virtual citizens’ assembly, assessing its ability to support deliberation on controversial issues within a demographically representative sample of UK residents…(More)”.

Lifecycles, pipelines, and value chains: toward a focus on events in responsible artificial intelligence for health


Paper by Joseph Donia et al: “Process-oriented approaches to the responsible development, implementation, and oversight of artificial intelligence (AI) systems have proliferated in recent years. Variously referred to as lifecycles, pipelines, or value chains, these approaches demonstrate a common focus on systematically mapping key activities and normative considerations throughout the development and use of AI systems. At the same time, these approaches risk focusing on proximal activities of development and use at the expense of a focus on the events and value conflicts that shape how key decisions are made in practice. In this article we report on the results of an ‘embedded’ ethics research study focused on SPOTT– a ‘Smart Physiotherapy Tracking Technology’ employing AI and undergoing development and commercialization at an academic health sciences centre. Through interviews and focus groups with the development and commercialization team, patients, and policy and ethics experts, we suggest that a more expansive design and development lifecycle shaped by key events offers a more robust approach to normative analysis of digital health technologies, especially where those technologies’ actual uses are underspecified or in flux. We introduce five of these key events, outlining their implications for responsible design and governance of AI for health, and present a set of critical questions intended for others doing applied ethics and policy work. We briefly conclude with a reflection on the value of this approach for engaging with health AI ecosystems more broadly…(More)”.