Article by Billy Perrigo/Karnataka: “…To create an effective English-speaking AI, it is enough to simply collect data from where it has already accumulated. But for languages like Kannada, you need to go out and find more.
This has created huge demand for datasets—collections of text or voice data—in languages spoken by some of the poorest people in the world. Part of that demand comes from tech companies seeking to build out their AI tools. Another big chunk comes from academia and governments, especially in India, where English and Hindi have long held outsize precedence in a nation of some 1.4 billion people with 22 official languages and at least 780 more indigenous ones. This rising demand means that hundreds of millions of Indians are suddenly in control of a scarce and newly-valuable asset: their mother tongue.
Data work—creating or refining the raw material at the heart of AI— is not new in India. The economy that did so much to turn call centers and garment factories into engines of productivity at the end of the 20th century has quietly been doing the same with data work in the 21st. And, like its predecessors, the industry is once again dominated by labor arbitrage companies, which pay wages close to the legal minimum even as they sell data to foreign clients for a hefty mark-up. The AI data sector, worth over $2 billion globally in 2022, is projected to rise in value to $17 billion by 2030. Little of that money has flowed down to data workers in India, Kenya, and the Philippines.
These conditions may cause harms far beyond the lives of individual workers. “We’re talking about systems that are impacting our whole society, and workers who make those systems more reliable and less biased,” says Jonas Valente, an expert in digital work platforms at Oxford University’s Internet Institute. “If you have workers with basic rights who are more empowered, I believe that the outcome—the technological system—will have a better quality as well.”
In the neighboring villages of Alahalli and Chilukavadi, one Indian startup is testing a new model. Chandrika works for Karya, a nonprofit launched in 2021 in Bengaluru (formerly Bangalore) that bills itself as “the world’s first ethical data company.” Like its competitors, it sells data to big tech companies and other clients at the market rate. But instead of keeping much of that cash as profit, it covers its costs and funnels the rest toward the rural poor in India. (Karya partners with local NGOs to ensure access to its jobs go first to the poorest of the poor, as well as historically marginalized communities.) In addition to its $5 hourly minimum, Karya gives workers de-facto ownership of the data they create on the job, so whenever it is resold, the workers receive the proceeds on top of their past wages. It’s a model that doesn’t exist anywhere else in the industry…(More)”.