Article by Kevin T. Frazier: “Imagine receiving this email in the near future: “Thank you for sharing data with the American Data Collective on May 22, 2025. After first sharing your workout data with SprintAI, a local startup focused on designing shoes for differently abled athletes, your data donation was also sent to an artificial intelligence research cluster hosted by a regional university. Your donation is on its way to accelerate artificial intelligence innovation and support researchers and innovators addressing pressing public needs!”
That is exactly the sort of message you could expect to receive if we made donations of personal data akin to blood donations—a pro-social behavior that may not immediately serve a donor’s individual needs but may nevertheless benefit the whole of the community. This vision of a future where data flow toward the public good is not science fiction—it is a tangible possibility if we address a critical bottleneck faced by innovators today.
Creating the data equivalent of blood banks may not seem like a pressing need or something that people should voluntarily contribute to, given widespread concerns about a few large artificial intelligence (AI) companies using data for profit-driven and, arguably, socially harmful ends. This narrow conception of the AI ecosystem fails to consider the hundreds of AI research initiatives and startups that have a desperate need for high-quality data. I was fortunate enough to meet leaders of those nascent AI efforts at Meta’s Open Source AI Summit in Austin, Texas. For example, I met with Matt Schwartz, who leads a startup that leans on AI to glean more diagnostic information from colonoscopies. I also connected with Edward Chang, a professor of neurological surgery at the University of California, San Francisco Weill Institute for Neurosciences, who relies on AI tools to discover new information on how and why our brains work. I also got to know Corin Wagen, whose startup is helping companies “find better molecules faster.” This is a small sample of the people leveraging AI for objectively good outcomes. They need your help. More specifically, they need your data.
A tragic irony shapes our current data infrastructure. Most of us share mountains of data with massive and profitable private parties—smartwatch companies, diet apps, game developers, and social media companies. Yet, AI labs, academic researchers, and public interest organizations best positioned to leverage our data for the common good are often those facing the most formidable barriers to acquiring the necessary quantity, quality, and diversity of data. Unlike OpenAI, they are not going to use bots to scrape the internet for data. Unlike Google and Meta, they cannot rely on their own social media platforms and search engines to act as perpetual data generators. And, unlike Anthropic, they lack the funds to license data from media outlets. So, while commercial entities amass vast datasets, frequently as a byproduct of consumer services and proprietary data acquisition strategies, mission-driven AI initiatives dedicated to public problems find themselves in a state of chronic data scarcity. This is not merely a hurdle—it is a systemic bottleneck choking off innovation where society needs it most, delaying or even preventing the development of AI tools that could significantly improve lives.
Individuals are, quite rightly, increasingly hesitant to share their personal information, with concerns about privacy, security, and potential misuse being both rampant and frequently justified by past breaches and opaque practices. Yet, in a striking contradiction, troves of deeply personal data are continuously siphoned by app developers, by tech platforms, and, often opaquely, by an extensive network of data brokers. This practice often occurs with minimal transparency and without informed consent concerning the full lifecycle and downstream uses of that data. This lack of transparency extends to how algorithms trained on this data make decisions that can impact individuals’ lives—from loan applications to job prospects—often without clear avenues for recourse or understanding, potentially perpetuating existing societal biases embedded in historical data…(More)”.