Article by Shayne Longpre et al.: “New AI capabilities are owed in large part to massive, widely sourced, and underdocumented training data collections. Dubious collection practices have spurred crises in data transparency, authenticity, consent, privacy, representation, bias, copyright infringement, and the overall development of ethical and trustworthy AI systems. In response, AI regulation is emphasizing the need for training data transparency to understand AI model limitations. Based on a large-scale analysis of the AI training data landscape and existing solutions, we identify the missing infrastructure to facilitate responsible AI development practices. We explain why existing tools for data authenticity, consent, and documentation alone are unable to solve the core problems facing the AI community, and outline how policymakers, developers, and data creators can facilitate responsible AI development, through universal data provenance standards…”.
Data Authenticity, Consent, and Provenance for AI Are All Broken: What Will It Take to Fix Them?