New AI standards group wants to make data scraping opt-in


Article by Kate Knibbs: “The first wave of major generative AI tools largely were trained on “publicly available” data—basically, anything and everything that could be scraped from the Internet. Now, sources of training data are increasingly restricting access and pushing for licensing agreements. With the hunt for additional data sources intensifying, new licensing startups have emerged to keep the source material flowing.

The Dataset Providers Alliance, a trade group formed this summer, wants to make the AI industry more standardized and fair. To that end, it has just released a position paper outlining its stances on major AI-related issues. The alliance is made up of seven AI licensing companies, including music copyright-management firm Rightsify, Japanese stock-photo marketplace Pixta, and generative-AI copyright-licensing startup Calliope Networks. (At least five new members will be announced in the fall.)

The DPA advocates for an opt-in system, meaning that data can be used only after consent is explicitly given by creators and rights holders. This represents a significant departure from the way most major AI companies operate. Some have developed their own opt-out systems, which put the burden on data owners to pull their work on a case-by-case basis. Others offer no opt-outs whatsoever…(More)”.