Paper by Kathryne Metcalf and Jathan Sadowski: “Recent reporting has revealed that the UK Biobank (UKB)—a large, publicly-funded research database containing highly-sensitive health records of over half a million participants—has shared its data with private insurance companies seeking to develop actuarial AI systems for analyzing risk and predicting health. While news reports have characterized this as a significant breach of public trust, the UKB contends that insurance research is “in the public interest,” and that all research participants are adequately protected from the possibility of insurance discrimination via data de-identification. Here, we contest both of these claims. Insurers use population data to identify novel categories of risk, which become fodder in the production of black-boxed actuarial algorithms. The deployment of these algorithms, as we argue, has the potential to increase inequality in health and decrease access to insurance. Importantly, these types of harms are not limited just to UKB participants: instead, they are likely to proliferate unevenly across various populations within global insurance markets via practices of profiling and sorting based on the synthesis of multiple data sources, alongside advances in data analysis capabilities, over space/time. This necessitates a significantly expanded understanding of the publics who must be involved in biobank governance and data-sharing decisions involving insurers…(More)”.
Data’s Role in Unlocking Scientific Potential
Report by the Special Competitive Studies Project: “…we outline two actionable steps the U.S. government can take immediately to address the data sharing challenges hindering scientific research.
1. Create Comprehensive Data Inventories Across Scientific Domains
We recommend the Secretary of Commerce, acting through the Department of Commerce’s Chief Data Officer and the Director of the National Institute of Standards and Technology (NIST), and with the Federal Chief Data Officer Council (CDO Council), create a government-led inventory where organizations – universities, industries, and research institutes – can catalog their datasets with key details like purpose, description, and accreditation. Similar to platforms like data.gov, this centralized repository would make high-quality data more visible and accessible, promoting scientific collaboration. To boost participation, the government could offer incentives, such as grants or citation credits for researchers whose data is used. Contributing organizations would also be responsible for regularly updating their entries, ensuring the data stays relevant and searchable.
2. Create Scientific Data Sharing Public-Private Partnerships
A critical recommendation of the National Data Action Plan was for the United States to facilitate the creation of data sharing public-private partnerships for specific sectors. The U.S. Government should coordinate data sharing partnerships with its departments and agencies, industry, academia, and civil society. Data collected by one entity can be tremendously valuable to others. But incentivizing data sharing is challenging as privacy, security, legal (e.g., liability), and intellectual property (IP) concerns can limit willingness to share. However, narrowly-scoped PPPs can help overcome these barriers, allowing for greater data sharing and mutually beneficial data use…(More)”
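To make the first recommendation concrete, here is a minimal sketch of what one inventory entry and two basic operations on the catalog might look like. Only the `purpose`, `description`, and `accreditation` fields come from the report; the class name, the remaining fields, the `search` and `stale_entries` helpers, and the one-year freshness threshold are all hypothetical choices made for illustration, not part of any proposed system.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DatasetEntry:
    """One catalog record. Fields beyond purpose/description/accreditation are assumed."""
    title: str
    organization: str
    purpose: str
    description: str
    accreditation: str
    last_updated: date
    keywords: list = field(default_factory=list)


def search(catalog, term):
    """Return entries whose title, description, or keywords contain the term."""
    term = term.lower()
    return [
        e for e in catalog
        if term in e.title.lower()
        or term in e.description.lower()
        or any(term in k.lower() for k in e.keywords)
    ]


def stale_entries(catalog, today, max_age_days=365):
    """Return entries not updated within max_age_days (hypothetical threshold)."""
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in catalog if e.last_updated < cutoff]


# Illustrative records only; the organizations and datasets are made up.
catalog = [
    DatasetEntry(
        title="Coastal Sensor Readings",
        organization="Example University",
        purpose="Climate modeling",
        description="Hourly temperature and salinity measurements",
        accreditation="Internal review",
        last_updated=date(2024, 6, 1),
        keywords=["ocean", "temperature"],
    ),
    DatasetEntry(
        title="Alloy Fatigue Tests",
        organization="Example Research Institute",
        purpose="Materials science",
        description="Cycle-to-failure results for aluminum alloys",
        accreditation="None listed",
        last_updated=date(2021, 1, 15),
        keywords=["metals"],
    ),
]
```

A stale-entry check of this kind is one way the "regularly updating their entries" responsibility could be monitored; a real inventory would more likely build on an existing metadata standard than on an ad hoc schema.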
Can LLMs advance democratic values?
Paper by Seth Lazar and Lorenzo Manuali: “LLMs are among the most advanced tools ever devised for analysing and generating linguistic content. Democratic deliberation and decision-making involve, at several distinct stages, the production and analysis of language. So it is natural to ask whether our best tools for manipulating language might prove instrumental to one of our most important linguistic tasks. Researchers and practitioners have recently asked whether LLMs can support democratic deliberation by leveraging abilities to summarise content, as well as to aggregate opinion over summarised content, and indeed to represent voters by predicting their preferences over unseen choices. In this paper, we assess whether using LLMs to perform these and related functions really advances the democratic values that inspire these experiments. We suggest that the record is decidedly mixed. In the presence of background inequality of power and resources, as well as deep moral and political disagreement, we should be careful not to use LLMs in ways that automate non-instrumentally valuable components of the democratic process, or else threaten to supplant fair and transparent decision-making procedures that are necessary to reconcile competing interests and values. However, while we argue that LLMs should be kept well clear of formal democratic decision-making processes, we think that they can be put to good use in strengthening the informal public sphere: the arena that mediates between democratic governments and the polities that they serve, in which political communities seek information, form civic publics, and hold their leaders to account…(More)”.
AI-accelerated Nazca survey nearly doubles the number of known figurative geoglyphs and sheds light on their purpose
Paper by Masato Sakai, Akihisa Sakurai, Siyuan Lu, and Marcus Freitag: “It took nearly a century to discover a total of 430 figurative Nazca geoglyphs, which offer significant insights into the ancient cultures at the Nazca Pampa. Here, we report the deployment of an AI system to the entire Nazca region, a UNESCO World Heritage site, leading to the discovery of 303 new figurative geoglyphs within only 6 mo of field survey, nearly doubling the number of known figurative geoglyphs. Even with limited training examples, the developed AI approach is demonstrated to be effective in detecting the smaller relief-type geoglyphs, which unlike the giant line-type geoglyphs are very difficult to discern. The improved account of figurative geoglyphs enables us to analyze their motifs and distribution across the Nazca Pampa. We find that relief-type geoglyphs depict mainly human motifs or motifs of things modified by humans, such as domesticated animals and decapitated heads (81.6%). They are typically located within viewing distance (on average 43 m) of ancient trails that crisscross the Nazca Pampa and were most likely built and viewed at the individual or small-group level. On the other hand, the giant line-type figurative geoglyphs mainly depict wild animals (64%). They are found an average of 34 m from the elaborate linear/trapezoidal network of geoglyphs, which suggests that they were probably built and used on a community level for ritual activities…(More)”
The Age of AI Nationalism and Its Effects
Paper by Susan Ariel Aaronson: “Policy makers in many countries are determined to develop artificial intelligence (AI) within their borders because they view AI as essential to both national security and economic growth. Some countries have proposed adopting AI sovereignty, where the nation develops AI for its people, by its people and within its borders. In this paper, the author makes a distinction between policies designed to advance domestic AI and policies that, with or without direct intent, hamper the production or trade of foreign-produced AI (known as “AI nationalism”). AI nationalist policies in one country can make it harder for firms in another country to develop AI. If officials can limit access to key components of the AI supply chain, such as data, capital, expertise or computing power, they may be able to limit the AI prowess of competitors in country Y and/or Z. Moreover, if policy makers can shape regulations in ways that benefit local AI competitors, they may also impede the competitiveness of other nations’ AI developers. AI nationalism may seem appropriate given the import of AI, but this paper aims to illuminate how AI nationalistic policies may backfire and could divide the world into AI haves and have nots…(More)”.
We are Developing AI at the Detriment of the Global South — How a Focus on Responsible Data Re-use Can Make a Difference
Article by Stefaan Verhulst and Peter Addo: “…At the root of this debate runs a frequent concern with how data is collected, stored, used — and responsibly reused for other purposes than initially collected for…
In this article, we propose that promoting responsible reuse of data requires addressing the power imbalances inherent in the data ecology. These imbalances disempower key stakeholders, thereby undermining trust in data management practices. As we recently argued in a report on “responsible data reuse in developing countries,” prepared for Agence Française de Développement (AFD), power imbalances may be particularly pernicious when considering the use of data in the Global South. Addressing these requires broadening notions of consent, beyond current highly individualized approaches, in favor of what we instead term a social license for reuse.
In what follows, we explain what a social license means, and propose three steps to help achieve that goal. We conclude by calling for a new research agenda — one that would stretch existing disciplinary and conceptual boundaries — to reimagine what social licenses might mean, and how they could be operationalized…(More)”.
The ABC’s of Who Benefits from Working with AI: Ability, Beliefs, and Calibration
Paper by Andrew Caplin: “We use a controlled experiment to show that ability and belief calibration jointly determine the benefits of working with Artificial Intelligence (AI). AI improves performance more for people with low baseline ability. However, holding ability constant, AI assistance is more valuable for people who are calibrated, meaning they have accurate beliefs about their own ability. People who know they have low ability gain the most from working with AI. In a counterfactual analysis, we show that eliminating miscalibration would cause AI to reduce performance inequality nearly twice as much as it already does…(More)”.
First-of-its-kind dataset connects greenhouse gases and air quality
NOAA Research: “The GReenhouse gas And Air Pollutants Emissions System (GRA²PES), from NOAA and the National Institute of Standards and Technology (NIST), combines information on greenhouse gas and air quality pollutant sources into a single national database, offering innovative interactive map displays and new benefits for both climate and public health solutions.
A new U.S.-based system to combine air quality and greenhouse gas pollution sources into a single national research database is now available in the U.S. Greenhouse Gas Center portal. This geospatial data allows leaders at city, state, and regional scales to more easily identify and take steps to address air quality issues while reducing climate-related hazards for populations.
The dataset is the GReenhouse gas And Air Pollutants Emissions System (GRA²PES). A research project developed by NOAA and NIST, GRA²PES captures monthly greenhouse gas (GHG) emissions activity for multiple economic sectors to improve measurement and modeling for both GHG and air pollutants across the contiguous U.S.
“Having the GHG and air quality constituents in the same dataset will be exceedingly helpful,” said Columbia University atmospheric scientist Roisin Commane, the lead on a New York City project to improve emissions estimates…(More)”.
As AI-powered health care expands, experts warn of biases
Article by Marta Biino: “Google’s DeepMind artificial intelligence research laboratory and German pharma company BioNTech are both building AI-powered lab assistants to help scientists conduct experiments and perform tasks, the Financial Times reported.
It’s the latest example of how developments in artificial intelligence are revolutionizing a number of fields, including medicine. While AI has long been used in radiology for image analysis and in oncology to classify skin lesions, for example, its applications are growing as the technology continues to advance.
OpenAI’s GPT models have outperformed humans in making cancer diagnoses based on MRI reports and beat PhD-holders in standardized science tests, to name a few.
However, as AI’s use in health care expands, some fear the notoriously biased technology could carry negative repercussions for patients…(More)”.
How The New York Times incorporates editorial judgment in algorithms to curate its home page
Article by Zhen Yang: “Whether on the web or the app, the home page of The New York Times is a crucial gateway, setting the stage for readers’ experiences and guiding them to the most important news of the day. The Times publishes over 250 stories daily, far more than the 50 to 60 stories that can be featured on the home page at a given time. Traditionally, editors have manually selected and programmed which stories appear, when and where, multiple times daily. This manual process presents challenges:
- How can we provide readers a relevant, useful, and fresh experience each time they visit the home page?
- How can we make our editorial curation process more efficient and scalable?
- How do we maximize the reach of each story and expose more stories to our readers?
To address these challenges, the Times has been actively developing and testing editorially driven algorithms to assist in curating home page content. These algorithms are editorially driven in that a human editor’s judgment or input is incorporated into every aspect of the algorithm — including deciding where on the home page the stories are placed, informing the rankings, and potentially influencing and overriding algorithmic outputs when necessary. From the get-go, we’ve designed algorithmic programming to elevate human curation, not to replace it…
The Times began using algorithms for content recommendations in 2011 but only recently started applying them to home page modules. For years, we only had one algorithmically-powered module, “Smarter Living,” on the home page, and later, “Popular in The Times.” Both were positioned relatively low on the page.
Three years ago, the formation of a cross-functional team — including newsroom editors, product managers, data scientists, data analysts, and engineers — brought the momentum needed to advance our responsible use of algorithms. Today, nearly half of the home page is programmed with assistance from algorithms that help promote news, features, and sub-brand content, such as The Athletic and Wirecutter. Some of these modules, such as the features module located at the top right of the home page on the web version, are in highly visible locations. During major news moments, editors can also deploy algorithmic modules to display additional coverage to complement a main module of stories near the top of the page. (The topmost news package of Figure 1 is an example of this in action.)…(More)”
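The idea of an "editorially driven" algorithm described in the excerpt, where editors can place stories and override algorithmic output, can be illustrated with a small sketch. This is not the Times's implementation: the `Story` fields, the `curate` function, the scoring values, and the pin/block mechanism are all hypothetical, chosen only to show one way human decisions could take precedence over a ranking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    slug: str
    score: float                       # hypothetical algorithmic relevance score
    pinned_slot: Optional[int] = None  # slot fixed by an editor, if any
    blocked: bool = False              # editor has excluded the story


def curate(stories, slots):
    """Fill a module: editor pins are honored first, then open slots go by score."""
    eligible = [s for s in stories if not s.blocked]
    layout = [None] * slots

    # Editor placements override the algorithm; the first pin for a slot wins.
    for s in eligible:
        p = s.pinned_slot
        if p is not None and 0 <= p < slots and layout[p] is None:
            layout[p] = s.slug

    placed = {slug for slug in layout if slug is not None}
    ranked = sorted(
        (s for s in eligible if s.slug not in placed),
        key=lambda s: s.score,
        reverse=True,
    )

    # Fill the remaining slots top to bottom with the highest-scoring stories.
    remaining = iter(ranked)
    for i in range(slots):
        if layout[i] is None:
            nxt = next(remaining, None)
            if nxt is None:
                break
            layout[i] = nxt.slug

    return [slug for slug in layout if slug is not None]


# Illustrative input: one editor pin and one editor block.
stories = [
    Story("a", 0.90),
    Story("b", 0.50, pinned_slot=0),
    Story("c", 0.70),
    Story("d", 0.99, blocked=True),
]
module = curate(stories, slots=3)
```

In this sketch, story `b` leads the module despite its lower score because an editor pinned it, and story `d` never appears despite having the highest score because an editor blocked it.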