‘Not for Machines to Harvest’: Data Revolts Break Out Against A.I.

Article by Sheera Frenkel and Stuart A. Thompson: “Fan fiction writers are just one group now staging revolts against A.I. systems as a fever over the technology has gripped Silicon Valley and the world. In recent months, social media companies such as Reddit and Twitter, news organizations including The New York Times and NBC News, authors such as Paul Tremblay and the actress Sarah Silverman have all taken a position against A.I. sucking up their data without permission.

Their protests have taken different forms. Writers and artists are locking their files to protect their work or are boycotting certain websites that publish A.I.-generated content, while companies like Reddit want to charge for access to their data. At least 10 lawsuits have been filed this year against A.I. companies, accusing them of training their systems on artists’ creative work without consent. This past week, Ms. Silverman and the authors Christopher Golden and Richard Kadrey sued OpenAI, the maker of ChatGPT, and others over A.I.’s use of their work.

At the heart of the rebellions is a newfound understanding that online information — stories, artwork, news articles, message board posts and photos — may have significant untapped value.

The new wave of A.I. — known as “generative A.I.” for the text, images and other content it generates — is built atop complex systems such as large language models, which are capable of producing humanlike prose. These models are trained on hoards of all kinds of data so they can answer people’s questions, mimic writing styles or churn out comedy and poetry.

That has set off a hunt by tech companies for even more data to feed their A.I. systems. Google, Meta and OpenAI have essentially used information from all over the internet, including large databases of fan fiction, troves of news articles and collections of books, much of which was available free online. In tech industry parlance, this was known as “scraping” the internet…(More)”.