Precision public health in the era of genomics and big data


Paper by Megan C. Roberts et al: “Precision public health (PPH) considers the interplay between genetics, lifestyle and the environment to improve disease prevention, diagnosis and treatment on a population level—thereby delivering the right interventions to the right populations at the right time. In this Review, we explore the concept of PPH as the next generation of public health. We discuss the historical context of using individual-level data in public health interventions and examine recent advancements in how data from human and pathogen genomics and social, behavioral and environmental research, as well as artificial intelligence, have transformed public health. Real-world examples of PPH are discussed, emphasizing how these approaches are becoming a mainstay in public health, as well as outstanding challenges in their development, implementation and sustainability. Data sciences, ethical, legal and social implications research, capacity building, equity research and implementation science will have a crucial role in realizing the potential for ‘precision’ to enhance traditional public health approaches…(More)”.

Integrating Artificial Intelligence into Citizens’ Assemblies: Benefits, Concerns and Future Pathways


Paper by Sammy McKinney: “Interest in how Artificial Intelligence (AI) could be used within citizens’ assemblies (CAs) is emerging amongst scholars and practitioners alike. In this paper, I make four contributions at the intersection of these burgeoning fields. First, I propose an analytical framework to guide evaluations of the benefits and limitations of AI applications in CAs. Second, I map out eleven ways that AI, especially large language models (LLMs), could be used across a CA’s full lifecycle. This introduces novel ideas for AI integration into the literature and synthesises existing proposals to provide the most detailed analytical breakdown of AI applications in CAs to date. Third, drawing on relevant literature, four key informant interviews, and the Global Assembly on the Ecological and Climate crisis as a case study, I apply my analytical framework to assess the desirability of each application. This provides insight into how AI could be deployed to address existing challenges facing CAs today as well as the concerns that arise with AI integration. Fourth, bringing my analyses together, I argue that AI integration into CAs brings the potential to enhance their democratic quality and institutional capacity, but realising this requires the deliberative community to proceed cautiously, effectively navigate challenging trade-offs, and mitigate important concerns that arise with AI integration. Ultimately, this paper provides a foundation that can guide future research concerning AI integration into CAs and other forms of democratic innovation…(More)”.

The Great Scrape: The Clash Between Scraping and Privacy


Paper by Daniel J. Solove and Woodrow Hartzog: “Artificial intelligence (AI) systems depend on massive quantities of data, often gathered by “scraping” – the automated extraction of large amounts of data from the internet. A great deal of scraped data is about people. This personal data provides the grist for AI tools such as facial recognition, deep fakes, and generative AI. Although scraping enables web searching, archival, and meaningful scientific research, scraping for AI can also be objectionable or even harmful to individuals and society.

Organizations are scraping at an escalating pace and scale, even though many privacy laws are seemingly incongruous with the practice. In this Article, we contend that scraping must undergo a serious reckoning with privacy law. Scraping violates nearly all of the key principles in privacy laws, including fairness; individual rights and control; transparency; consent; purpose specification and secondary use restrictions; data minimization; onward transfer; and data security. With scraping, data protection laws built around these requirements are ignored.

Scraping has evaded a reckoning with privacy law largely because scrapers act as if all publicly available data were free for the taking. But the public availability of scraped data shouldn’t give scrapers a free pass. Privacy law regularly protects publicly available data, and privacy principles are implicated even when personal data is accessible to others.

This Article explores the fundamental tension between scraping and privacy law. With the zealous pursuit and astronomical growth of AI, we are in the midst of what we call the “great scrape.” There must now be a great reconciliation…(More)”.

(Almost) 200 Years of News-Based Economic Sentiment


Paper by Jules H. van Binsbergen, Svetlana Bryzgalova, Mayukh Mukhopadhyay & Varun Sharma: “Using text from 200 million pages of 13,000 US local newspapers and machine learning methods, we construct a 170-year-long measure of economic sentiment at the country and state levels that expands existing measures in both the time series (by more than a century) and the cross-section. Our measure predicts GDP (both nationally and locally), consumption, and employment growth, even after controlling for commonly used predictors, as well as monetary policy decisions. Our measure is distinct from the information in expert forecasts and leads its consensus value. Interestingly, news coverage has become increasingly negative across all states in the past half-century…(More)”.
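The authors build their index with machine learning over 200 million newspaper pages; the core idea of scoring individual articles and aggregating into a yearly time series can be conveyed with a much simpler toy version. The lexicon, corpus, and scores below are invented for illustration and are not the paper's method or data:

```python
# Toy news-based sentiment index: score each article by the balance of
# positive vs. negative lexicon hits, then average within each year.
# (Illustrative only; the paper uses ML at vastly larger scale.)
from collections import defaultdict

POSITIVE = {"growth", "boom", "recovery", "hiring"}
NEGATIVE = {"recession", "layoffs", "panic", "default"}

def article_score(text: str) -> float:
    """Return a sentiment score in [-1, 1] for one article."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

# (year, article text) pairs stand in for the newspaper corpus
corpus = [
    (1893, "bank panic spreads as default feared"),
    (1893, "railroad layoffs deepen the recession"),
    (1950, "postwar boom fuels hiring and growth"),
]

by_year = defaultdict(list)
for year, text in corpus:
    by_year[year].append(article_score(text))

sentiment = {year: sum(s) / len(s) for year, s in by_year.items()}
print(sentiment)  # 1893 negative, 1950 positive
```

The state-level cross-section in the paper would follow the same pattern, with aggregation keyed on (state, year) rather than year alone.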

The Collaboverse: A Collaborative Data-Sharing and Speech Analysis Platform


Paper by Justin D. Dvorak and Frank R. Boutsen: “Collaboration in the field of speech-language pathology occurs across a variety of digital devices and can entail the usage of multiple software tools, systems, file formats, and even programming languages. Unfortunately, gaps between the laboratory, clinic, and classroom can emerge in part because of siloing of data and workflows, as well as the digital divide between users. The purpose of this tutorial is to present the Collaboverse, a web-based collaborative system that unifies these domains, and describe the application of this tool to common tasks in speech-language pathology. In addition, we demonstrate its utility in machine learning (ML) applications…

This tutorial outlines key concepts in the digital divide, data management, distributed computing, and ML. It introduces the Collaboverse workspace for researchers, clinicians, and educators in speech-language pathology who wish to improve their collaborative network and leverage advanced computation abilities. It also details an ML approach to prosodic analysis…

The Collaboverse shows promise in narrowing the digital divide and is capable of generating clinically relevant data, specifically in the area of prosody, whose computational complexity has limited widespread analysis in research and clinic alike. In addition, it includes an augmentative and alternative communication app allowing visual, nontextual communication…(More)”.

Finding, distinguishing, and understanding overlooked policy entrepreneurs


Paper by Gwen Arnold, Meghan Klasic, Changtong Wu, Madeline Schomburg & Abigail York: “Scholars have spent decades arguing that policy entrepreneurs, change agents who work individually and in groups to influence the policy process, can be crucial in introducing policy innovation and spurring policy change. How to identify policy entrepreneurs empirically has received less attention. This oversight is consequential because scholars trying to understand when policy entrepreneurs emerge, and why, and what makes them more or less successful, need to be able to identify these change agents reliably and accurately. This paper explores the ways policy entrepreneurs are currently identified and highlights issues with current approaches. We introduce a new technique for eliciting and distinguishing policy entrepreneurs, coupling automated and manual analysis of local news media and a survey of policy entrepreneur candidates. We apply this technique to the empirical case of unconventional oil and gas drilling in Pennsylvania and derive some tentative results concerning factors which increase entrepreneurial efficacy…(More)”.

Protecting Policy Space for Indigenous Data Sovereignty Under International Digital Trade Law


Paper by Andrew D. Mitchell and Theo Samlidis: “The impact of economic agreements on Indigenous peoples’ broader rights and interests has been subject to ongoing scrutiny. Technological developments and an increasing emphasis on Indigenous sovereignty within the digital domain have given rise to a global Indigenous data sovereignty movement, surfacing concerns about how international economic law impacts Indigenous peoples’ sovereignty over their data. This Article examines the policy space certain governments have reserved under international economic agreements to introduce measures for protecting Indigenous data or digital sovereignty (IDS). We argue that treaty countries have secured, under recent international digital trade chapters and agreements, the benefits of a comprehensive economic treaty and sufficient regulatory autonomy to protect Indigenous data sovereignty…(More)”.

Scaling Synthetic Data Creation with 1,000,000,000 Personas


Paper by Xin Chan et al.: “We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub — a collection of 1 billion diverse personas automatically curated from web data. These 1 billion personas (~13% of the world’s total population), acting as distributed carriers of world knowledge, can tap into almost every perspective encapsulated within the LLM, thereby facilitating the creation of diverse synthetic data at scale for various scenarios. By showcasing Persona Hub’s use cases in synthesizing high-quality mathematical and logical reasoning problems, instructions (i.e., user prompts), knowledge-rich texts, game NPCs and tools (functions) at scale, we demonstrate persona-driven data synthesis is versatile, scalable, flexible, and easy to use, potentially driving a paradigm shift in synthetic data creation and applications in practice, which may have a profound impact on LLM research and development…(More)”.
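The mechanism behind persona-driven synthesis is simple to sketch: pair one seed task with many persona descriptions so the same request elicits diverse outputs from an LLM. The personas and prompt template below are invented for illustration, not drawn from Persona Hub; the population figure is simple arithmetic:

```python
# Minimal sketch of persona-driven prompt construction (illustrative;
# persona strings and the template are hypothetical, not Persona Hub's).
def persona_prompt(persona: str, task: str) -> str:
    return f"You are {persona}. {task}"

seed_task = "Create a challenging math word problem."
personas = [
    "a structural engineer who inspects bridges",
    "a pastry chef costing out a wedding cake",
    "a farmer planning crop rotation",
]

# One seed task fans out into as many distinct prompts as personas.
prompts = [persona_prompt(p, seed_task) for p in personas]
for p in prompts:
    print(p)

# The abstract's scale claim: 1 billion personas against a world
# population of roughly 8 billion is 12.5%, rounded to ~13%.
share = 1e9 / 8e9  # 0.125
```

Each prompt would then be sent to an LLM; diversity in the resulting synthetic data comes from the personas rather than from prompt engineering on the task itself.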

Collaborating with Journalists and AI: Leveraging Social Media Images for Enhanced Disaster Resilience and Recovery


Paper by Dhiraj Murthy et al.: “Methods to meaningfully integrate journalists into crisis informatics remain lacking. We explored the feasibility of generating a real-time, priority-driven map of infrastructure damage during a natural disaster by strategically selecting journalist networks to identify sources of image-based infrastructure-damage data. Using the Twitter REST API, 1,000,522 tweets were collected from September 13-18, 2018, during and after Hurricane Florence made landfall in the United States. Tweets were classified by source (e.g., news organizations or citizen journalists), and 11,638 images were extracted. We utilized Google’s AutoML Vision software to develop a machine learning image classification model to interpret this sample of images. Of the labeled data, 80% was used for training, 10% for validation, and 10% for testing. The model achieved an average precision of 90.6%, an average recall of 77.2%, and an F1 score of 0.834. In the future, establishing strategic networks of journalists ahead of disasters will reduce the time needed to identify disaster-response targets, thereby focusing relief and recovery efforts in real time. This approach ultimately aims to save lives and mitigate harm…(More)”.
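The reported metrics are internally consistent: the F1 score is the harmonic mean of precision and recall, and plugging in the paper's figures reproduces 0.834. A quick check (illustrative, not the authors' code):

```python
# Verify the reported F1 score follows from the reported precision and
# recall via the harmonic mean: F1 = 2PR / (P + R).
precision = 0.906
recall = 0.772

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # → 0.834
```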

Exploring Digital Biomarkers for Depression Using Mobile Technology


Paper by Yuezhou Zhang et al: “With the advent of ubiquitous sensors and mobile technologies, wearables and smartphones offer a cost-effective means for monitoring mental health conditions, particularly depression. These devices enable the continuous collection of behavioral data, providing novel insights into the daily manifestations of depressive symptoms.

We found several significant links between depression severity and various behavioral biomarkers: elevated depression levels were associated with diminished sleep quality (assessed through Fitbit metrics), reduced sociability (approximated by Bluetooth), decreased levels of physical activity (quantified by step counts and GPS data), a slower cadence of daily walking (captured by smartphone accelerometers), and disturbances in circadian rhythms (analyzed across various data streams).

Leveraging digital biomarkers for assessing and continuously monitoring depression introduces a new paradigm in early detection and development of customized intervention strategies. Findings from these studies not only enhance our comprehension of depression in real-world settings but also underscore the potential of mobile technologies in the prevention and management of mental health issues…(More)”.
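The analytic pattern behind such findings can be sketched in miniature: derive per-participant behavioral features (e.g., mean daily steps, mean sleep duration) and test their association with a depression-severity score. The participant data below is fabricated for illustration, and this is not the study's pipeline:

```python
# Illustrative association check between behavioral features and a
# depression score using Pearson correlation. All numbers are made up.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One row per participant: (mean daily steps, mean sleep hours,
# PHQ-style depression severity score) -- fabricated values.
participants = [
    (9500, 7.4, 3),
    (7200, 6.8, 6),
    (5400, 6.1, 10),
    (3100, 5.2, 15),
]

steps = [p[0] for p in participants]
sleep = [p[1] for p in participants]
scores = [p[2] for p in participants]

print(pearson(steps, scores))  # strongly negative: less activity, higher severity
print(pearson(sleep, scores))  # also negative: shorter sleep, higher severity
```

Real studies of this kind would add many more participants, repeated measures over time, and models that adjust for confounders, but the feature-to-score association is the basic building block.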