Article by Jeff Jarvis: “News organizations will face an AI reckoning in 2026 and a choice: They can keep blocking AI crawlers, suing AI companies, and lobbying for protectionist AI legislation — all the while making themselves invisible to the publics they serve on the next medium that matters — or they can figure out how to play along.
Unsurprisingly, I hold a number of likely unpopular opinions on the matter:
- Journalists must address their civic, professional, even moral obligation to provide news, reporting, and information via AI. For — like it or not — AI is where more and more people will go for information. It is clear that competitors for attention — marketing and misinformation — are rushing to be included in the training and output of large language models. This study finds that “reputable sites forbid an average of 15.5 AI user agents, while misinformation sites prohibit fewer than one.” (A sketch of how such user-agent blocking can be checked follows this excerpt.) By blocking AI, journalism is abdicating control of society’s information ecosystem to pitchmen and propagandists.
- AI no longer needs news. Major models have already been trained, and future models will be trained on synthetic data. The next frontiers in AI development — see, for example, the work of Yann LeCun — will be built on world models and experience, not text and content.
- Anyway, training models is fair use and transformative. This debate will not be fully adjudicated for some time, but the Anthropic decision makes it clear that media’s copyright fight against training is a tenuous strategy. Note well that the used books Anthropic legitimately acquired yielded no payment to authors or publishers, and that if Anthropic had bought just one copy of each title in the purloined databases, it would not have been found liable and authors would have netted the royalties on a single copy each.
- AI is the new means of discovery online. I had a conversation with a news executive recently who, in one breath, boasted of cutting off all the AI bots save one (Google’s), and in the next asked how his sites would be discovered online. The answer: AI. Rich Skrenta, executive director of the Common Crawl Foundation, writes that if media brands block crawlers, AI models will not know to search for them, quote them, or link to them when users ask relevant questions. He advises publishers to replace SEO with AIO: optimization for AI. Ah, but you say, AI doesn’t link. No. This study compared the links in AI answers against search results and found that ChatGPT displayed “a systemic and overwhelming bias towards Earned media (third-party, authoritative sources) over Brand owned and Social content, a stark contrast to Google’s more balanced mix.” The links are there. Whether users click on them is, as ever, another question. But if your links aren’t there, no one will click anyway…(More)”.
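The user-agent blocking that the cited study counts happens in each site’s robots.txt. Below is a minimal sketch of how such blocking can be checked, using Python’s standard-library robots.txt parser; the list of AI crawler tokens and the target site are illustrative assumptions, not the study’s own methodology.

```python
# Minimal sketch: list which well-known AI crawler user agents a site's
# robots.txt disallows from the root path. The agent list is illustrative;
# the cited study's agent list and methodology may differ.
from urllib.robotparser import RobotFileParser

AI_USER_AGENTS = [
    "GPTBot",           # OpenAI's crawler
    "ClaudeBot",        # Anthropic's crawler
    "CCBot",            # Common Crawl
    "Google-Extended",  # robots.txt token controlling Google AI training use
    "PerplexityBot",    # Perplexity
]

def blocked_ai_agents(site: str) -> list[str]:
    """Return the AI user agents disallowed from '/' by the site's robots.txt."""
    parser = RobotFileParser()
    parser.set_url(f"https://{site}/robots.txt")
    parser.read()  # fetch and parse the robots.txt file
    return [ua for ua in AI_USER_AGENTS
            if not parser.can_fetch(ua, f"https://{site}/")]

if __name__ == "__main__":
    site = "example.com"  # hypothetical target site
    print(f"{site} blocks: {blocked_ai_agents(site)}")
```

A publisher that disallows most of these tokens is, by this measure, opting out of the retrieval and citation behavior of the corresponding models, which is the trade-off the excerpt describes.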