A.I. May Save Us, or May Construct Viruses to Kill Us


Article by Nicholas Kristof: “Here’s a bargain of the most horrifying kind: For less than $100,000, it may now be possible to use artificial intelligence to develop a virus that could kill millions of people.

That’s the conclusion of Jason Matheny, the president of the RAND Corporation, a think tank that studies security matters and other issues.

“It wouldn’t cost more to create a pathogen that’s capable of killing hundreds of millions of people versus a pathogen that’s only capable of killing hundreds of thousands of people,” Matheny told me.

In contrast, he noted, it could cost billions of dollars to produce a new vaccine or antiviral in response…

In the early 2000s, some of us worried about smallpox being reintroduced as a bioweapon if the virus were stolen from the labs in Atlanta and in Russia’s Novosibirsk region that retain the virus since the disease was eradicated. But with synthetic biology, now it wouldn’t have to be stolen.

Some years ago, a research team created a cousin of the smallpox virus, horse pox, in six months for $100,000, and with A.I. it could be easier and cheaper to refine the virus.

One reason biological weapons haven’t been much used is that they can boomerang. If Russia released a virus in Ukraine, it could spread to Russia. But a retired Chinese general has raised the possibility of biological warfare that targets particular races or ethnicities (probably imperfectly), which would make bioweapons much more useful. Alternatively, it might be possible to develop a virus that would kill or incapacitate a particular person, such as a troublesome president or ambassador, if one had obtained that person’s DNA at a dinner or reception.

Assessments of ethnic-targeting research by China are classified, but they may be why the U.S. Defense Department has said that the most important long-term threat of biowarfare comes from China.

A.I. has a more hopeful side as well, of course. It holds the promise of improving education, reducing auto accidents, curing cancers and developing miraculous new pharmaceuticals.

One of the best-known benefits is in protein folding, which can lead to revolutionary advances in medical care. Scientists used to spend years or decades figuring out the shapes of individual proteins, and then a Google initiative called AlphaFold was introduced that could predict the shapes within minutes. “It’s Google Maps for biology,” Kent Walker, president of global affairs at Google, told me.

Scientists have since used updated versions of AlphaFold to work on pharmaceuticals including a vaccine against malaria, one of the greatest killers of humans throughout history.

So it’s unclear whether A.I. will save us or kill us first…(More)”.

Supporting Scientific Citizens


Article by Lisa Margonelli: “What do nuclear fusion power plants, artificial intelligence, hydrogen infrastructure, and drinking water recycled from human waste have in common? Aside from being featured in this edition of Issues, they all require intense public engagement to choose among technological tradeoffs, safety profiles, and economic configurations. Reaching these understandings requires researchers, engineers, and decisionmakers who are adept at working with the public. It also requires citizens who want to engage with such questions and can articulate what they want from science and technology.

This issue offers a glimpse into what these future collaborations might look like. To train engineers with the “deep appreciation of the social, cultural, and ethical priorities and implications of the technological solutions engineers are tasked with designing and deploying,” University of Michigan nuclear engineer Aditi Verma and coauthors Katie Snyder and Shanna Daly asked their first-year engineering students to codesign nuclear power plants in collaboration with local community members. Although traditional nuclear engineering classes avoid “getting messy,” Verma and colleagues wanted students to engage honestly with the uncertainties of the profession. In the process of working with communities, the students’ vocabulary changed; they spoke of trust, respect, and “love” for community—even when considering deep geological waste repositories…(More)”.

Governments Empower Citizens by Promoting Digital Rights


Article by Julia Edinger: “The rapid rise of digital services and smart city technology has elevated concerns about privacy in the digital age and government’s role, even as cities from California to Texas take steps to make constituents aware of their digital rights.

Earlier this month, Long Beach, Calif., launched an improved version of its Digital Rights Platform, which shows constituents their data privacy and digital rights and information about how the city uses technologies while protecting digital rights.

“People’s digital rights are no different from their human or civil rights, except that they’re applied to how they interact with digital technologies — when you’re online, you’re still entitled to every right you enjoy offline,” said Will Greenberg, staff technologist at the Electronic Frontier Foundation (EFF), in a written statement. The nonprofit organization defends civil liberties in the digital world.


Long Beach’s platform initially launched several years ago, to mitigate privacy concerns that came out of the 2020 launch of a smart city initiative, according to Long Beach CIO Lea Eriksen. When that initiative debuted, the Department of Innovation and Technology requested the City Council approve a set of data privacy guidelines to ensure digital rights would be protected, setting the stage for the initial platform launch. Its 2021 beta version has now been enhanced to offer information on 22 city technology uses, up from two, and an enhanced feedback module enabling continued engagement and platform improvements…(More)”.

Is peer review failing its peer review?


Article by First Principles: “Ivan Oransky doesn’t sugar-coat his answer when asked about the state of academic peer review: “Things are pretty bad.”

As a distinguished journalist in residence at New York University and co-founder of Retraction Watch – a site that chronicles the growing number of papers being retracted from academic journals – Oransky is better positioned than just about anyone to make such a blunt assessment. 

He elaborates further, citing a range of factors contributing to the current state of affairs. These include the publish-or-perish mentality, chatbot ghostwriting, predatory journals, plagiarism, an overload of papers, a shortage of reviewers, and weak incentives to attract and retain reviewers.

“Things are pretty bad and they have been bad for some time because the incentives are completely misaligned,” Oransky told FirstPrinciples in a call from his NYU office.

Things are so bad that a new world record was set in 2023: more than 10,000 research papers were retracted from academic journals. In a troubling development, 19 journals closed after being inundated by a barrage of fake research from so-called “paper mills” that churn out the scientific equivalent of clickbait, and one scientist holds the current record of 213 retractions to his name. 

“The numbers don’t lie: Scientific publishing has a problem, and it’s getting worse,” Oransky and Retraction Watch co-founder Adam Marcus wrote in a recent opinion piece for The Washington Post. “Vigilance against fraudulent or defective research has always been necessary, but in recent years the sheer amount of suspect material has threatened to overwhelm publishers.”…(More)”.

The problem of ‘model collapse’: how a lack of human data limits AI progress


Article by Michael Peel: “The use of computer-generated data to train artificial intelligence models risks causing them to produce nonsensical results, according to new research that highlights looming challenges to the emerging technology. 

Leading AI companies, including OpenAI and Microsoft, have tested the use of “synthetic” data — information created by AI systems to then also train large language models (LLMs) — as they reach the limits of human-made material that can improve the cutting-edge technology.

Research published in Nature on Wednesday suggests the use of such data could lead to the rapid degradation of AI models. One trial using synthetic input text about medieval architecture descended into a discussion of jackrabbits after fewer than 10 generations of output. 

The work underlines why AI developers have hurried to buy troves of human-generated data for training — and raises questions of what will happen once those finite sources are exhausted. 

“Synthetic data is amazing if we manage to make it work,” said Ilia Shumailov, lead author of the research. “But what we are saying is that our current synthetic data is probably erroneous in some ways. The most surprising thing is how quickly this stuff happens.”

The paper explores the tendency of AI models to collapse over time because of the inevitable accumulation and amplification of mistakes from successive generations of training.

The speed of the deterioration is related to the severity of shortcomings in the design of the model, the learning process and the quality of data used. 

The early stages of collapse typically involve a “loss of variance”, which means majority subpopulations in the data become progressively over-represented at the expense of minority groups. In late-stage collapse, all parts of the data may descend into gibberish…(More)”.
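
As a rough illustration of the "loss of variance" described above, the toy sketch below repeatedly resamples a categorical distribution from its own finite samples. It is not the method used in the Nature research; the function name `resample_generations`, the group labels, and the sample sizes are all hypothetical choices made for this example. The one property it is meant to show is that once a rare category draws zero samples it can never reappear, so the number of surviving categories can only stay flat or shrink from one generation to the next.

```python
import random
from collections import Counter


def resample_generations(initial_counts, generations, sample_size, seed=0):
    """Repeatedly refit a categorical distribution to its own samples.

    Hypothetical toy model: each "generation" is trained only on a finite
    sample drawn from the previous generation's empirical distribution.
    Returns the number of surviving categories after each generation,
    starting with the original data.
    """
    rng = random.Random(seed)
    counts = dict(initial_counts)
    surviving = [sum(1 for c in counts.values() if c > 0)]
    for _ in range(generations):
        labels = [label for label, c in counts.items() if c > 0]
        weights = [counts[label] for label in labels]
        # Draw the next generation's training set from the current one.
        draws = rng.choices(labels, weights=weights, k=sample_size)
        # Counter only keeps labels that were actually drawn.
        counts = Counter(draws)
        surviving.append(len(counts))
    return surviving


if __name__ == "__main__":
    # Assumed starting mix: one majority group and three small minorities.
    initial = {"majority": 900, "minority_a": 60, "minority_b": 30, "minority_c": 10}
    history = resample_generations(initial, generations=50, sample_size=100)
    print("surviving categories per generation:", history)
```

Smaller values of `sample_size` make rare categories vanish sooner, which loosely mirrors the article's point that the speed of deterioration depends on the quality of the data and the learning process.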

Citizens should be asked to do more


Article by Martin Wolf: “In an excellent “Citizens’ White Paper”, in partnership with participation charity Involve, Demos describes the needed revolution as follows, “We don’t just need new policies for these challenging times. We need new ways to tackle the policy challenges we face — from national missions to everyday policymaking. We need new ways to understand and negotiate what the public will tolerate. We need new ways to build back trust in politicians”. In sum, it states, “if government wants to be trusted by the people, it must itself start to trust the people.”

[Figure: bar chart of agreement that the public should be involved in decision making on these issues (%), showing the public has clear ideas on where it should be most involved]

The fundamental aim is to change the perception of government from something that politicians and bureaucrats do to us into an activity that involves not everyone, which is impossible, but ordinary people selected by lot. This, as I have noted, would be the principle of the jury imported into public life.

How might this work? The idea is to select representative groups of ordinary people affected by policies into official discussion on problems and solutions. This could be at the level of central, devolved or local government. The participants would not just be asked for opinions, but be actively engaged in considering issues and shaping (though not making) decisions upon them. The paper details a number of different approaches — panels, assemblies, juries, workshops and wider community conversations. Which would be appropriate would depend on the task…(More)”.

Illuminating ‘the ugly side of science’: fresh incentives for reporting negative results


Article by Rachel Brazil: “Editor-in-chief Sarahanne Field describes herself and her team at the Journal of Trial & Error as wanting to highlight the “ugly side of science — the parts of the process that have gone wrong”.

She clarifies that the editorial board of the journal, which launched in 2020, isn’t interested in papers in which “you did a shitty study and you found nothing. We’re interested in stuff that was done methodologically soundly, but still yielded a result that was unexpected.” These types of result — which do not prove a hypothesis or could yield unexplained outcomes — often simply go unpublished, explains Field, who is also an open-science researcher at the University of Groningen in the Netherlands. Along with Stefan Gaillard, one of the journal’s founders, she hopes to change that.

Calls for researchers to publish failed studies are not new. The ‘file-drawer problem’ — the stacks of unpublished, negative results that most researchers accumulate — was first described in 1979 by psychologist Robert Rosenthal. He argued that this leads to publication bias in the scientific record: the gap of missing unsuccessful results leads to overemphasis on the positive results that do get published…(More)”.

When A.I. Fails the Language Test, Who Is Left Out of the Conversation?


Article by Sara Ruberg: “While the use of A.I. has exploded in the West, much of the rest of the world has been left out of the conversation since most of the technology is trained in English. A.I. experts worry that the language gap could exacerbate technological inequities, and that it could leave many regions and cultures behind.

A delay of access to good technology of even a few years, “can potentially lead to a few decades of economic delay,” said Sang Truong, a Ph.D. candidate at the Stanford Artificial Intelligence Laboratory at Stanford University on the team that built and tested a Vietnamese language model against others.

The tests his team ran found that A.I. tools across the board could get facts and diction wrong when working with Vietnamese, likely because it is a “low-resource” language by industry standards, which means that there aren’t sufficient data sets and content available online for the A.I. model to learn from.

Low-resource languages are spoken by tens and sometimes hundreds of millions of people around the world, but they yield less digital data because A.I. tech development and online engagement is centered in the United States and China. Other low-resource languages include Hindi, Bengali and Swahili, as well as lesser-known dialects spoken by smaller populations around the world.

An analysis of top websites by W3Techs, a tech survey company, found that English makes up over 60 percent of the internet’s language data. While English is widely spoken globally, native English speakers make up about 5 percent of the population, according to Ethnologue, a research organization that collects language data. Mandarin and Spanish are other examples of languages with a significant online presence and reliable digital data sets.

Academic institutions, grass-roots organizations and volunteer efforts are playing catch-up to build resources for speakers of languages who aren’t as well represented in the digital landscape.

Lelapa AI, based in Johannesburg, is one such company leading efforts on the African continent. The South African-based start-up is developing multilingual A.I. products for people and businesses in Africa…(More)”.

AI firms will soon exhaust most of the internet’s data


Article by The Economist: “One approach is to focus on data quality rather than quantity. AI labs do not simply train their models on the entire internet. They filter and sequence data to maximise how much their models learn. Naveen Rao of Databricks, an AI firm, says that this is the “main differentiator” between AI models on the market. “True information” about the world obviously matters; so does lots of “reasoning”. That makes academic textbooks, for example, especially valuable. But setting the balance between data sources remains something of a dark art. What is more, the ordering in which the system encounters different types of data matters too. Lump all the data on one topic, like maths, at the end of the training process, and your model may become specialised at maths but forget some other concepts.

These considerations can get even more complex when the data are not just on different subjects but in different forms. In part because of the lack of new textual data, leading models like OpenAI’s GPT-4o and Google’s Gemini are now let loose on image, video and audio files as well as text during their self-supervised learning. Training on video is hardest given how dense with data points video files are. Current models typically look at a subset of frames to simplify things.

Whatever models are used, ownership is increasingly recognised as an issue. The material used in training LLMs is often copyrighted and used without consent from, or payment to, the rights holders. Some AI models peep behind paywalls. Model creators claim this sort of thing falls under the “fair use” exemption in American copyright law. AI models should be allowed to read copyrighted material when they learn, just as humans can, they say. But as Benedict Evans, a technology analyst, has put it, “a difference in scale” can lead to “a difference in principle”…

It is clear that access to more data—whether culled from specialist sources, generated synthetically or provided by human experts—is key to maintaining rapid progress in AI. Like oilfields, the most accessible data reserves have been depleted. The challenge now is to find new ones—or sustainable alternatives…(More)”.

The Risks of Empowering “Citizen Data Scientists”


Article by Reid Blackman and Tamara Sipes: “Until recently, the prevailing understanding of artificial intelligence (AI) and its subset machine learning (ML) was that expert data scientists and AI engineers were the only people that could push AI strategy and implementation forward. That was a reasonable view. After all, data science generally, and AI in particular, is a technical field requiring, among other things, expertise that requires many years of education and training to obtain.

Fast forward to today, however, and the conventional wisdom is rapidly changing. The advent of “auto-ML” — software that provides methods and processes for creating machine learning code — has led to calls to “democratize” data science and AI. The idea is that these tools enable organizations to invite and leverage non-data scientists — say, domain data experts, team members very familiar with the business processes, or heads of various business units — to propel their AI efforts.

In theory, making data science and AI more accessible to non-data scientists (including technologists who are not data scientists) can make a lot of business sense. Centralized and siloed data science units can fail to appreciate the vast array of data the organization has and the business problems that it can solve, particularly with multinational organizations with hundreds or thousands of business units distributed across several continents. Moreover, those in the weeds of business units know the data they have, the problems they’re trying to solve, and can, with training, see how that data can be leveraged to solve those problems. The opportunities are significant.

In short, with great business insight, augmented with auto-ML, can come great analytic responsibility. At the same time, we cannot forget that data science and AI are, in fact, very difficult, and there’s a very long journey from having data to solving a problem. In this article, we’ll lay out the pros and cons of integrating citizen data scientists into your AI strategy and suggest methods for optimizing success and minimizing risks…(More)”.