Essay by Joshua Rothman: “Last spring, Daniel Kokotajlo, an A.I.-safety researcher working at OpenAI, quit his job in protest. He’d become convinced that the company wasn’t prepared for the future of its own technology, and wanted to sound the alarm. After a mutual friend connected us, we spoke on the phone. I found Kokotajlo affable, informed, and anxious. Advances in “alignment,” he told me—the suite of techniques used to insure that A.I. acts in accordance with human commands and values—were lagging behind gains in intelligence. Researchers, he said, were hurtling toward the creation of powerful systems they couldn’t control.
Kokotajlo, who had transitioned from a graduate program in philosophy to a career in A.I., explained how he’d educated himself so that he could understand the field. While at OpenAI, part of his job had been to track progress in A.I. so that he could construct timelines predicting when various thresholds of intelligence might be crossed. At one point, after the technology advanced unexpectedly, he’d had to shift his timelines up by decades. In 2021, he’d written a scenario about A.I. titled “What 2026 Looks Like.” Much of what he’d predicted had come to pass before the titular year. He’d concluded that a point of no return, when A.I. might become better than people at almost all important tasks, and be trusted with great power and authority, could arrive in 2027 or sooner. He sounded scared.
Around the same time that Kokotajlo left OpenAI, two computer scientists at Princeton, Sayash Kapoor and Arvind Narayanan, were preparing for the publication of their book, “AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference.” In it, Kapoor and Narayanan, who study technology’s integration with society, advanced views that were diametrically opposed to Kokotajlo’s. They argued that many timelines of A.I.’s future were wildly optimistic; that claims about its usefulness were often exaggerated or outright fraudulent; and that, because of the world’s inherent complexity, even powerful A.I. would change it only slowly. They cited many cases in which A.I. systems had been called upon to deliver important judgments—about medical diagnoses, or hiring—and had made rookie mistakes that indicated a fundamental disconnect from reality. The newest systems, they maintained, suffered from the same flaw.
Recently, all three researchers have sharpened their views, releasing reports that take their analyses further. The nonprofit AI Futures Project, of which Kokotajlo is the executive director, has published “AI 2027,” a heavily footnoted document, written by Kokotajlo and four other researchers, which works out a chilling scenario in which “superintelligent” A.I. systems either dominate or exterminate the human race by 2030. It’s meant to be taken seriously, as a warning about what might really happen. Meanwhile, Kapoor and Narayanan, in a new paper titled “AI as Normal Technology,” insist that practical obstacles of all kinds—from regulations and professional standards to the simple difficulty of doing physical things in the real world—will slow A.I.’s deployment and limit its transformational potential. While conceding that A.I. may eventually turn out to be a revolutionary technology, on the scale of electricity or the internet, they maintain that it will remain “normal”—that is, controllable through familiar safety measures, such as fail-safes, kill switches, and human supervision—for the foreseeable future. “AI is often analogized to nuclear weapons,” they argue. But “the right analogy is nuclear power,” which has remained mostly manageable and, if anything, may be underutilized for safety reasons.