Researchers warn we could run out of data to train AI by 2026. What then?


Article by Rita Matulionyte: “As artificial intelligence (AI) reaches the peak of its popularity, researchers have warned the industry might be running out of training data – the fuel that runs powerful AI systems. This could slow down the growth of AI models, especially large language models, and may even alter the trajectory of the AI revolution.

But why is a potential lack of data an issue, considering how much there is on the web? And is there a way to address the risk?…

We need a lot of data to train powerful, accurate and high-quality AI algorithms. For instance, ChatGPT was trained on 570 gigabytes of text data, or about 300 billion words.

Similarly, the Stable Diffusion algorithm (which is behind many AI image-generating apps such as DALL-E, Lensa and Midjourney) was trained on the LAION-5B dataset, comprising 5.8 billion image-text pairs. If an algorithm is trained on an insufficient amount of data, it will produce inaccurate or low-quality outputs.

The quality of the training data is also important…This is why AI developers seek out high-quality content such as text from books, online articles, scientific papers, Wikipedia, and certain filtered web content. The Google Assistant was trained on 11,000 romance novels taken from self-publishing site Smashwords to make it more conversational.

The AI industry has been training AI systems on ever-larger datasets, which is why we now have high-performing models such as ChatGPT or DALL-E 3. At the same time, research shows online data stocks are growing much slower than datasets used to train AI.

In a paper published last year, a group of researchers predicted we will run out of high-quality text data before 2026 if the current AI training trends continue. They also estimated low-quality language data will be exhausted sometime between 2030 and 2050, and low-quality image data between 2030 and 2060.
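The logic behind such projections can be illustrated with a toy calculation: project the stock of usable data and the size of training datasets forward under assumed growth rates and find the year the curves cross. The sketch below uses invented parameters purely for illustration; none of the numbers come from the paper.

```python
# Toy projection with invented parameters -- not the paper's estimates.
def exhaustion_year(stock_tokens, stock_growth, dataset_tokens, dataset_growth, start_year=2023):
    """First year in which the projected training dataset outgrows the projected data stock."""
    year = start_year
    while dataset_tokens <= stock_tokens and year < start_year + 100:
        stock_tokens *= 1 + stock_growth      # slow accumulation of new text online
        dataset_tokens *= 1 + dataset_growth  # rapid scaling of training runs
        year += 1
    return year

# Hypothetical figures: 5e14 tokens of high-quality text growing ~7% a year,
# versus a 5e11-token training corpus growing ~50% a year.
print(exhaustion_year(5e14, 0.07, 5e11, 0.5))
```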

AI could contribute up to US$15.7 trillion (A$24.1 trillion) to the world economy by 2030, according to accounting and consulting group PwC. But running out of usable data could slow down its development…(More)”.

What Is Public Trust in the Health System? Insights into Health Data Use


Open Access Book by Felix Gille: “This book explores the concept of public trust in health systems.

In the context of recent events, including public response to interventions to tackle the COVID-19 pandemic, vaccination uptake and the use of health data and digital health, this important book uses empirical evidence to address why public trust is vital to a well-functioning health system.

In doing so, it provides a comprehensive contemporary explanation of public trust, how it affects health systems and how it can be nurtured and maintained as an integral component of health system governance…(More)”.

Climate data can save lives. Most countries can’t access it.


Article by Zoya Teirstein: “Earth just experienced one of its hottest, and most damaging, periods on record. Heat waves in the United States, Europe, and China; catastrophic flooding in India, Brazil, Hong Kong, and Libya; and outbreaks of malaria, dengue, and other mosquito-borne illnesses across southern Asia claimed tens of thousands of lives. The vast majority of these deaths could have been averted with the right safeguards in place.

The World Meteorological Organization, or WMO, published a report last week that shows just 11 percent of countries have the full arsenal of tools required to save lives as the impacts of climate change — including deadly weather events, infectious diseases, and respiratory illnesses like asthma — become more extreme. The United Nations climate agency predicts that significant natural disasters will hit the planet 560 times per year by the end of this decade. What’s more, countries that lack early warning systems, such as extreme heat alerts, will see eight times more climate-related deaths than countries that are better prepared. By midcentury, some 50 percent of these deaths will take place in Africa, a continent that is responsible for around 4 percent of the world’s greenhouse gas emissions each year…(More)”.

AI and Democracy’s Digital Identity Crisis


Essay by Shrey Jain, Connor Spelliscy, Samuel Vance-Law and Scott Moore: “AI-enabled tools have become sophisticated enough to allow a small number of individuals to run disinformation campaigns of an unprecedented scale. Privacy-preserving identity attestations can drastically reduce instances of impersonation and make disinformation easy to identify and potentially hinder. By understanding how identity attestations are positioned across the spectrum of decentralization, we can gain a better understanding of the costs and benefits of various attestations. In this paper, we discuss attestation types, including governmental, biometric, federated, and web of trust-based, and include examples such as e-Estonia, China’s social credit system, Worldcoin, OAuth, X (formerly Twitter), Gitcoin Passport, and EAS. We believe that the most resilient systems create an identity that evolves and is connected to a network of similarly evolving identities that verify one another. In this type of system, each entity contributes its respective credibility to the attestation process, creating a larger, more comprehensive set of attestations. We believe these systems could be the best approach to authenticating identity and protecting against some of the threats to democracy that AI can pose in the hands of malicious actors. However, governments will likely attempt to mitigate these risks by implementing centralized identity authentication systems; these centralized systems could themselves pose risks to the democratic processes they are built to defend. We therefore recommend that policymakers support the development of standards-setting organizations for identity, provide legal clarity for builders of decentralized tooling, and fund research critical to effective identity authentication systems…(More)”
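The paper's picture of a resilient identity system, where identities verify one another and each contributes its own credibility to the attestations it makes, can be illustrated with a toy web-of-trust graph. The sketch below is purely illustrative and does not correspond to any of the systems (EAS, Gitcoin Passport, Worldcoin, etc.) discussed in the paper.

```python
from collections import defaultdict

# Toy web-of-trust: each attestation passes on a share of the attester's current
# credibility, and scores are propagated for a few rounds (PageRank-style).
attestations = {              # attester -> identities they vouch for (hypothetical data)
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "mallory": ["mallory2"],  # an isolated cluster that only vouches for itself
    "mallory2": ["mallory"],
}

scores = defaultdict(lambda: 1.0)
scores["alice"] = 5.0         # assume one well-established anchor identity

for _ in range(20):
    new_scores = defaultdict(float)
    for attester, subjects in attestations.items():
        share = scores[attester] / len(subjects)
        for subject in subjects:
            new_scores[subject] += share
    scores = new_scores

print(dict(scores))           # identities vouched for by credible attesters score highest
```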

Networked Press Freedom


Book by Mike Ananny: “…offers a new way to think about freedom of the press in a time when media systems are in fundamental flux. Ananny challenges the idea that press freedom comes only from heroic, lone journalists who speak truth to power. Instead, drawing on journalism studies, institutional sociology, political theory, science and technology studies, and an analysis of ten years of journalism discourse about news and technology, he argues that press freedom emerges from social, technological, institutional, and normative forces that vie for power and fight for visions of democratic life. He shows how dominant, historical ideals of professionalized press freedom often mistook journalistic freedom from constraints for the public’s freedom to encounter the rich mix of people and ideas that self-governance requires. Ananny’s notion of press freedom ensures not only an individual right to speak, but also a public right to hear.

Seeing press freedom as essential for democratic self-governance, Ananny explores what publics need, what kind of free press they should demand, and how today’s press freedom emerges from intertwined collections of humans and machines. If someone says, “The public needs a free press,” Ananny urges us to ask in response, “What kind of public, what kind of freedom, and what kind of press?” Answering these questions shows what robust, self-governing publics need to demand of technologists and journalists alike…(More)”.

The Bletchley Declaration


Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023: “In the context of our cooperation, and to inform action at the national and international levels, our agenda for addressing frontier AI risk will focus on:

  • identifying AI safety risks of shared concern, building a shared scientific and evidence-based understanding of these risks, and sustaining that understanding as capabilities continue to increase, in the context of a wider global approach to understanding the impact of AI in our societies.
  • building respective risk-based policies across our countries to ensure safety in light of such risks, collaborating as appropriate while recognising our approaches may differ based on national circumstances and applicable legal frameworks. This includes, alongside increased transparency by private actors developing frontier AI capabilities, appropriate evaluation metrics, tools for safety testing, and developing relevant public sector capability and scientific research.

In furtherance of this agenda, we resolve to support an internationally inclusive network of scientific research on frontier AI safety that encompasses and complements existing and new multilateral, plurilateral and bilateral collaboration, including through existing international fora and other relevant initiatives, to facilitate the provision of the best science available for policy making and the public good.

In recognition of the transformative positive potential of AI, and as part of ensuring wider international cooperation on AI, we resolve to sustain an inclusive global dialogue that engages existing international fora and other relevant initiatives and contributes in an open manner to broader international discussions, and to continue research on frontier AI safety to ensure that the benefits of the technology can be harnessed responsibly for good and for all. We look forward to meeting again in 2024…(More)”.

Does the sun rise for ChatGPT? Scientific discovery in the age of generative AI


Paper by David Leslie: “In the current hype-laden climate surrounding the rapid proliferation of foundation models and generative AI systems like ChatGPT, it is becoming increasingly important for societal stakeholders to reach sound understandings of their limitations and potential transformative effects. This is especially true in the natural and applied sciences, where magical thinking among some scientists about the take-off of “artificial general intelligence” has arisen simultaneously as the growing use of these technologies is putting longstanding norms, policies, and standards of good research practice under pressure. In this analysis, I argue that a deflationary understanding of foundation models and generative AI systems can help us sense check our expectations of what role they can play in processes of scientific exploration, sense-making, and discovery. I claim that a more sober, tool-based understanding of generative AI systems as computational instruments embedded in warm-blooded research processes can serve several salutary functions. It can play a crucial bubble-bursting role that mitigates some of the most serious threats to the ethos of modern science posed by an unreflective overreliance on these technologies. It can also strengthen the epistemic and normative footing of contemporary science by helping researchers circumscribe the part to be played by machine-led prediction in communicative contexts of scientific discovery while concurrently prodding them to recognise that such contexts are principal sites for human empowerment, democratic agency, and creativity. Finally, it can help spur ever richer approaches to collaborative experimental design, theory-construction, and scientific world-making by encouraging researchers to deploy these kinds of computational tools to heuristically probe unbounded search spaces and patterns in high-dimensional biophysical data that would otherwise be inaccessible to human-scale examination and inference…(More)”.
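One concrete way to read the "computational instrument" framing: the model's job is to surface candidate structure in data too high-dimensional to inspect by eye, while interpretation and theory-building remain with the researcher. The sketch below is a generic illustration of that division of labour on synthetic data, not anything proposed in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in for high-dimensional biophysical measurements, with one
# embedded pattern that would be invisible to row-by-row inspection.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 500))
data[:300, :50] += 2.0

embedding = PCA(n_components=10).fit_transform(data)           # compress the search space
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embedding)

# The output is a candidate grouping handed back to human judgement, not a "discovery".
print(np.bincount(labels))
```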

The UN Hired an AI Company to Untangle the Israeli-Palestinian Crisis


Article by David Gilbert: “…The application of artificial intelligence technologies to conflict situations has been around since at least 1996, with machine learning being used to predict where conflicts may occur. The use of AI in this area has expanded in the intervening years, being used to improve logistics, training, and other aspects of peacekeeping missions. Lane and Shults believe they could use artificial intelligence to dig deeper and find the root causes of conflicts.

Their idea for an AI program that models the belief systems that drive human behavior first began when Lane moved to Northern Ireland a decade ago to study whether computational modeling and cognition could be used to understand issues around religious violence.

In Belfast, Lane figured out that by modeling aspects of identity and social cohesion, and identifying the factors that make people motivated to fight and die for a particular cause, he could accurately predict what was going to happen next.

“We set out to try and come up with something that could help us better understand what it is about human nature that sometimes results in conflict, and then how can we use that tool to try and get a better handle or understanding on these deeper, more psychological issues at really large scales,” Lane says.

The result of their work was a study published in 2018 in the Journal of Artificial Societies and Social Simulation, which found that people are typically peaceful but will engage in violence when an outside group threatens the core principles of their religious identity.
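The mechanism the study describes can be pictured with a minimal agent-based sketch: agents stay peaceful until the perceived threat to their group's core identity crosses a personal threshold. This is a generic illustration of the modelling style, not the authors' published model, and all parameters are invented.

```python
import random

class Agent:
    """Illustrative agent: escalates only when perceived out-group threat exceeds its threshold."""
    def __init__(self, group):
        self.group = group
        self.threshold = random.uniform(0.6, 0.9)  # invented tolerance range
        self.violent = False

    def update(self, threat_to_group):
        self.violent = threat_to_group > self.threshold

def share_violent(agents, threat_levels):
    for a in agents:
        a.update(threat_levels[a.group])
    return sum(a.violent for a in agents) / len(agents)

random.seed(0)
agents = [Agent("A") for _ in range(500)] + [Agent("B") for _ in range(500)]

# Low background threat: almost everyone stays peaceful.
print(share_violent(agents, {"A": 0.2, "B": 0.2}))
# A perceived attack on group B's core identity: violence spikes within B only.
print(share_violent(agents, {"A": 0.2, "B": 0.95}))
```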

A year later, Lane wrote that the model he had developed predicted that measures introduced by Brexit—the UK’s departure from the European Union that included the introduction of a hard border in the Irish Sea between Northern Ireland and the rest of the UK—would result in a rise in paramilitary activity. Months later, the model was proved right.

The multi-agent model developed by Lane and Shults relied on distilling more than 50 million articles from GDELT, a project that monitors “the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages.” But feeding the AI millions of articles and documents was not enough, the researchers realized. In order to fully understand what was driving the people of Northern Ireland to engage in violence against their neighbors, they would need to conduct their own research…(More)”.
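For a sense of the raw material involved, GDELT's article stream is publicly queryable. The sketch below hits the GDELT 2.0 DOC API with an illustrative search; the endpoint and parameters follow GDELT's documented interface, but the specific query is an assumption and this is not the researchers' pipeline.

```python
import requests

# Illustrative query against the public GDELT 2.0 DOC API (not the authors' pipeline).
resp = requests.get(
    "https://api.gdeltproject.org/api/v2/doc/doc",
    params={
        "query": '"Northern Ireland" protest',  # hypothetical search
        "mode": "artlist",                      # return a list of matching articles
        "format": "json",
        "maxrecords": 50,
        "timespan": "1week",
    },
    timeout=30,
)
for article in resp.json().get("articles", []):
    print(article["seendate"], article["title"], article["url"])
```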

The Open Sky


Essay by Lars Erik Schönander: “Any time you walk outside, satellites may be watching you from space. There are currently more than 8,000 active satellites in orbit, including over a thousand designed to observe the Earth.

Satellite technology has come a long way since its secretive inception during the Cold War, when a country’s ability to successfully operate satellites meant not only that it was capable of launching rockets into Earth orbit but that it had eyes in the sky. Today not only governments across the world but private enterprises too launch satellites, collect and analyze satellite imagery, and sell it to a range of customers, from government agencies to the person on the street. SpaceX’s Starlink satellites bring the Internet to places where conventional coverage is spotty or compromised. Satellite data allows the United States to track rogue ships and North Korean missile launches, while scientists track wildfires, floods, and changes in forest cover.

The industry’s biggest technical challenge, aside from acquiring the satellite imagery itself, has always been to analyze and interpret it. This is why new AI tools are set to drastically change how satellite imagery is used — and who uses it. For instance, Meta’s Segment Anything Model, a machine-learning tool designed to “cut out” discrete objects from images, is proving highly effective at identifying objects in satellite images.
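For readers who want to try this, the open-source segment-anything package exposes the model directly. The sketch below runs its automatic mask generator over a single satellite tile; the checkpoint and image paths are placeholders, and this is a minimal usage sketch rather than a production geospatial pipeline.

```python
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Minimal sketch: run Segment Anything over one satellite tile.
# "sam_vit_h.pth" and "tile.png" are placeholder paths.
image = cv2.cvtColor(cv2.imread("tile.png"), cv2.COLOR_BGR2RGB)

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")   # load pretrained weights
mask_generator = SamAutomaticMaskGenerator(sam)

masks = mask_generator.generate(image)  # one mask dict per detected object (buildings, ships, fields...)
print(f"found {len(masks)} segments; largest covers {max(m['area'] for m in masks)} pixels")
```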

But the biggest breakthrough will likely come from large language models — tools like OpenAI’s ChatGPT — that may soon allow ordinary people to query the Earth’s surface the way data scientists query databases. Achieving this goal is the ambition of companies like Planet Labs, which has launched hundreds of satellites into space and is working with Microsoft to build what it calls a “queryable Earth.” At this point, it is still easy to dismiss their early attempt as a mere toy. But as the computer scientist Paul Graham once noted, if people like a new invention that others dismiss as a toy, this is probably a good sign of its future success.
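Stripped to its core, a "queryable Earth" interface is a translation step: a language model turns a plain-language question into a structured filter that an imagery catalogue can execute. The sketch below is a toy of that pattern using OpenAI's chat API; the model name, prompt, and output schema are all assumptions, not how Planet Labs or Microsoft have built anything.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Toy "queryable Earth" pattern: translate a question into a structured
# imagery-catalogue filter. Model name and schema are assumptions.
question = "Show me ports in the Baltic with new ship activity last week"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Convert the user's question into JSON with keys: "
                    "area_of_interest, date_range, feature_of_interest, max_cloud_cover."},
        {"role": "user", "content": question},
    ],
    response_format={"type": "json_object"},
)

query = json.loads(response.choices[0].message.content)
print(query)  # a downstream catalogue search would consume this filter
```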

This means that satellite intelligence capabilities that were once restricted to classified government agencies, and even now belong only to those with bountiful money or expertise, are about to be open to anyone with an Internet connection…(More)”.

The Tragedy of AI Governance


Paper by Simon Chesterman: “Despite hundreds of guides, frameworks, and principles intended to make AI “ethical” or “responsible”, ever more powerful applications continue to be released ever more quickly. Safety and security teams are being downsized or sidelined to bring AI products to market. And a significant portion of AI developers apparently believe there is a real risk that their work poses an existential threat to humanity.

This contradiction between statements and action can be attributed to three factors that undermine the prospects for meaningful governance of AI. The first is the shift of power from public to private hands, not only in deployment of AI products but in fundamental research. The second is the wariness of most states about regulating the sector too aggressively, for fear that it might drive innovation elsewhere. The third is the dysfunction of global processes to manage collective action problems, epitomized by the climate crisis and now frustrating efforts to govern a technology that does not respect borders. The tragedy of AI governance is that those with the greatest leverage to regulate AI have the least interest in doing so, while those with the greatest interest have the least leverage.

Resolving these challenges requires either rethinking the incentive structures or waiting for a crisis that brings the need for regulation and coordination into sharper focus…(More)”