New AI Collaboratives to take action on wildfires and food insecurity


Google: “…last September we introduced AI Collaboratives, a new funding approach designed to unite public, private and nonprofit organizations, and researchers, to create AI-powered solutions to help people around the world.

Today, we’re sharing more about our first two focus areas for AI Collaboratives: Wildfires and Food Security.

Wildfires are a global crisis, claiming more than 300,000 lives due to smoke exposure annually and causing billions of dollars in economic damage. …Google.org has convened more than 15 organizations, including Earth Fire Alliance and Moore Foundation, to help in this important effort. By coordinating funding and integrating cutting-edge science, emerging technology and on-the-ground applications, we can provide collaborators with the tools they need to identify and track wildfires in near real time; quantify wildfire risk; shift more acreage to beneficial fires; and ultimately reduce the damage caused by catastrophic wildfires.

Nearly one-third of the world’s population faces moderate or severe food insecurity due to extreme weather, conflict and economic shocks. The AI Collaborative: Food Security will strengthen the resilience of global food systems and improve food security for the world’s most vulnerable populations through AI technologies, collaborative research, data-sharing and coordinated action. To date, 10 organizations have joined us in this effort, and we’ll share more updates soon…(More)”.

Large AI models are cultural and social technologies


Essay by Henry Farrell, Alison Gopnik, Cosma Shalizi, and James Evans: “Debates about artificial intelligence (AI) tend to revolve around whether large models are intelligent, autonomous agents. Some AI researchers and commentators speculate that we are on the cusp of creating agents with artificial general intelligence (AGI), a prospect anticipated with both elation and anxiety. There have also been extensive conversations about cultural and social consequences of large models, orbiting around two foci: immediate effects of these systems as they are currently used, and hypothetical futures when these systems turn into AGI agents perhaps even superintelligent AGI agents.

But this discourse about large models as intelligent agents is fundamentally misconceived. Combining ideas from social and behavioral sciences with computer science can help us understand AI systems more accurately. Large Models should not be viewed primarily as intelligent agents, but as a new kind of cultural and social technology, allowing humans to take advantage of information other humans have accumulated.

The new technology of large models combines important features of earlier technologies. Like pictures, writing, print, video, Internet search, and other such technologies, large models allow people to access information that other people have created. Large Models – currently language, vision, and multi-modal depend on the fact that the Internet has made the products of these earlier technologies readily available in machine-readable form. But like economic markets, state bureaucracies, and other social technologies, these systems not only make information widely available, they allow it to be reorganized, transformed, and restructured in distinctive ways. Adopting Herbert Simon’s terminology, large models are a new variant of the “artificial systems of human society” that process information to enable large-scale coordination…(More)”

Can small language models revitalize Indigenous languages?


Article by Brooke Tanner and Cameron F. Kerry: “Indigenous languages play a critical role in preserving cultural identity and transmitting unique worldviews, traditions, and knowledge, but at least 40% of the world’s 6,700 languages are currently endangered. The United Nations declared 2022-2032 as the International Decade of Indigenous Languages to draw attention to this threat, in hopes of supporting the revitalization of these languages and preservation of access to linguistic resources.  

Building on the advantages of SLMs, several initiatives have successfully adapted these models specifically for Indigenous languages. Such Indigenous language models (ILMs) represent a subset of SLMs that are designed, trained, and fine-tuned with input from the communities they serve. 

Case studies and applications 

  • Meta released No Language Left Behind (NLLB-200), a 54 billion–parameter open-source machine translation model that supports 200 languages as part of Meta’s universal speech translator project. The model includes support for languages with limited translation resources. While the model’s breadth of languages included is novel, NLLB-200 can struggle to capture the intricacies of local context for low-resource languages and often relies on machine-translated sentence pairs across the internet due to the scarcity of digitized monolingual data. 
  • Lelapa AI’s InkubaLM-0.4B is an SLM with applications for low-resource African languages. Trained on 1.9 billion tokens across languages including isiZulu, Yoruba, Swahili, and isiXhosa, InkubaLM-0.4B (with 400 million parameters) builds on Meta’s LLaMA 2 architecture, providing a smaller model than the original LLaMA 2 pretrained model with 7 billion parameters. 
  • IBM Research Brazil and the University of São Paulo have collaborated on projects aimed at preserving Brazilian Indigenous languages such as Guarani Mbya and Nheengatu. These initiatives emphasize co-creation with Indigenous communities and address concerns about cultural exposure and language ownership. Initial efforts included electronic dictionaries, word prediction, and basic translation tools. Notably, when a prototype writing assistant for Guarani Mbya raised concerns about exposing their language and culture online, project leaders paused further development pending community consensus.  
  • Researchers have fine-tuned pre-trained models for Nheengatu using linguistic educational sources and translations of the Bible, with plans to incorporate community-guided spellcheck tools. Since the translations relying on data from the Bible, primarily translated by colonial priests, often sounded archaic and could reflect cultural abuse and violence, they were classified as potentially “toxic” data that would not be used in any deployed system without explicit Indigenous community agreement…(More)”.

A Quest for AI Knowledge


Paper by Joshua S. Gans: “This paper examines how the introduction of artificial intelligence (AI), particularly generative and large language models capable of interpolating precisely between known data points, reshapes scientists’ incentives for pursuing novel versus incremental research. Extending the theoretical framework of Carnehl and Schneider (2025), we analyse how decision-makers leverage AI to improve precision within well-defined knowledge domains. We identify conditions under which the availability of AI tools encourages scientists to choose more socially valuable, highly novel research projects, contrasting sharply with traditional patterns of incremental knowledge growth. Our model demonstrates a critical complementarity: scientists strategically align their research novelty choices to maximise the domain where AI can reliably inform decision-making. This dynamic fundamentally transforms the evolution of scientific knowledge, leading either to systematic “stepping stone” expansions or endogenous research cycles of strategic knowledge deepening. We discuss the broader implications for science policy, highlighting how sufficiently capable AI tools could mitigate traditional inefficiencies in scientific innovation, aligning private research incentives closely with the social optimum…(More)”.

The Age of AI in the Life Sciences: Benefits and Biosecurity Considerations


Report by the National Academies of Sciences, Engineering, and Medicine: “Artificial intelligence (AI) applications in the life sciences have the potential to enable advances in biological discovery and design at a faster pace and efficiency than is possible with classical experimental approaches alone. At the same time, AI-enabled biological tools developed for beneficial applications could potentially be misused for harmful purposes. Although the creation of biological weapons is not a new concept or risk, the potential for AI-enabled biological tools to affect this risk has raised concerns during the past decade.

This report, as requested by the Department of Defense, assesses how AI-enabled biological tools could uniquely impact biosecurity risk, and how advancements in such tools could also be used to mitigate these risks. The Age of AI in the Life Sciences reviews the capabilities of AI-enabled biological tools and can be used in conjunction with the 2018 National Academies report, Biodefense in the Age of Synthetic Biology, which sets out a framework for identifying the different risk factors associated with synthetic biology capabilities…(More)”

Generative AI in Transportation Planning: A Survey


Paper by Longchao Da: “The integration of generative artificial intelligence (GenAI) into transportation planning has the potential to revolutionize tasks such as demand forecasting, infrastructure design, policy evaluation, and traffic simulation. However, there is a critical need for a systematic framework to guide the adoption of GenAI in this interdisciplinary domain. In this survey, we, a multidisciplinary team of researchers spanning computer science and transportation engineering, present the first comprehensive framework for leveraging GenAI in transportation planning. Specifically, we introduce a new taxonomy that categorizes existing applications and methodologies into two perspectives: transportation planning tasks and computational techniques. From the transportation planning perspective, we examine the role of GenAI in automating descriptive, predictive, generative simulation, and explainable tasks to enhance mobility systems. From the computational perspective, we detail advancements in data preparation, domain-specific fine-tuning, and inference strategies such as retrieval-augmented generation and zero-shot learning tailored to transportation applications. Additionally, we address critical challenges, including data scarcity, explainability, bias mitigation, and the development of domain-specific evaluation frameworks that align with transportation goals like sustainability, equity, and system efficiency. This survey aims to bridge the gap between traditional transportation planning methodologies and modern AI techniques, fostering collaboration and innovation. By addressing these challenges and opportunities, we seek to inspire future research that ensures ethical, equitable, and impactful use of generative AI in transportation planning…(More)”.

Launch: A Blueprint to Unlock New Data Commons for Artificial Intelligence (AI)


Blueprint by Hannah Chafetz, Andrew J. Zahuranec, and Stefaan Verhulst: “In today’s rapidly evolving AI landscape, it is critical to broaden access to diverse and high-quality data to ensure that AI applications can serve all communities equitably. Yet, we are on the brink of a potential “data winter,” where valuable data assets that could drive public good are increasingly locked away or inaccessible.

Data commons — collaboratively governed ecosystems that enable responsible sharing of diverse datasets across sectors — offer a promising solution. By pooling data under clear standards and shared governance, data commons can unlock the potential of AI for public benefit while ensuring that its development reflects the diversity of experiences and needs across society.

To accelerate the creation of data commons, The Open Data Policy, today, releases “A Blueprint to Unlock New Data Commons for AI” — a guide on how to steward data to create data commons that enable public-interest AI use cases…the document is aimed at supporting libraries, universities, research centers, and other data holders (e.g. governments and nonprofits) through four modules:

  • Mapping the Demand and Supply: Understanding why AI systems need data, what data can be made available to train, adapt, or augment AI, and what a viable data commons prototype might look like that incorporates stakeholder needs and values;
  • Unlocking Participatory Governance: Co-designing key aspects of the data commons with key stakeholders and documenting these aspects within a formal agreement;
  • Building the Commons: Establishing the data commons from a practical perspective and ensure all stakeholders are incentivized to implement it; and
  • Assessing and Iterating: Evaluating how the commons is working and iterating as needed.

These modules are further supported by two supplementary taxonomies. “The Taxonomy of Data Types” provides a list of data types that can be valuable for public-interest generative AI use cases. The “Taxonomy of Use Cases” outlines public-interest generative AI applications that can be developed using a data commons approach, along with possible outcomes and stakeholders involved.

A separate set of worksheets can be used to further guide organizations in deploying these tools…(More)”.

Funding the Future: Grantmakers Strategies in AI Investment


Report by Project Evident: “…looks at how philanthropic funders are approaching requests to fund the use of AI… there was common recognition of AI’s importance and the tension between the need to learn more and to act quickly to meet the pace of innovation, adoption, and use of AI tools.

This research builds on the work of a February 2024 Project Evident and Stanford Institute for Human-Centered Artificial Intelligence working paper, Inspiring Action: Identifying the Social Sector AI Opportunity Gap. That paper reported that more practitioners than funders (by over a third) claimed their organization utilized AI. 

“From our earlier research, as well as in conversations with funders and nonprofits, it’s clear there’s a mismatch in the understanding and desire for AI tools and the funding of AI tools,” said Sarah Di Troia, Managing Director of Project Evident’s OutcomesAI practice and author of the report. “Grantmakers have an opportunity to quickly upskill their understanding – to help nonprofits improve their efficiency and impact, of course, but especially to shape the role of AI in civil society.”

The report offers a number of recommendations to the philanthropic sector. For example, funders and practitioners should ensure that community voice is included in the implementation of new AI initiatives to build trust and help reduce bias. Grantmakers should consider funding that allows for flexibility and innovation so that the social and education sectors can experiment with approaches. Most importantly, funders should increase their capacity and confidence in assessing AI implementation requests along both technical and ethical criteria…(More)”.

Bridging the Data Provenance Gap Across Text, Speech and Video


Paper by Shayne Longpre et al: “Progress in AI is driven largely by the scale and quality of training data. Despite this, there is a deficit of empirical analysis examining the attributes of well-established datasets beyond text. In this work we conduct the largest and first-of-its-kind longitudinal audit across modalities–popular text, speech, and video datasets–from their detailed sourcing trends and use restrictions to their geographical and linguistic representation. Our manual analysis covers nearly 4000 public datasets between 1990-2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries. We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets, eclipsing all other sources since 2019. Secondly, tracing the chain of dataset derivations we find that while less than 33% of datasets are restrictively licensed, over 80% of the source content in widely-used text, speech, and video datasets, carry non-commercial restrictions. Finally, counter to the rising number of languages and geographies represented in public AI training datasets, our audit demonstrates measures of relative geographical and multilingual representation have failed to significantly improve their coverage since 2013. We believe the breadth of our audit enables us to empirically examine trends in data sourcing, restrictions, and Western-centricity at an ecosystem-level, and that visibility into these questions are essential to progress in responsible AI. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire multimodal audit, allowing practitioners to trace data provenance across text, speech, and video…(More)”.

Artificial intelligence for modelling infectious disease epidemics


Paper by Moritz U. G. Kraemer et al: “Infectious disease threats to individual and public health are numerous, varied and frequently unexpected. Artificial intelligence (AI) and related technologies, which are already supporting human decision making in economics, medicine and social science, have the potential to transform the scope and power of infectious disease epidemiology. Here we consider the application to infectious disease modelling of AI systems that combine machine learning, computational statistics, information retrieval and data science. We first outline how recent advances in AI can accelerate breakthroughs in answering key epidemiological questions and we discuss specific AI methods that can be applied to routinely collected infectious disease surveillance data. Second, we elaborate on the social context of AI for infectious disease epidemiology, including issues such as explainability, safety, accountability and ethics. Finally, we summarize some limitations of AI applications in this field and provide recommendations for how infectious disease epidemiology can harness most effectively current and future developments in AI…(More)”.