Blueprint by Hannah Chafetz, Andrew J. Zahuranec, and Stefaan Verhulst: “In today’s rapidly evolving AI landscape, it is critical to broaden access to diverse and high-quality data to ensure that AI applications can serve all communities equitably. Yet, we are on the brink of a potential “data winter,” where valuable data assets that could drive public good are increasingly locked away or inaccessible.
Data commons — collaboratively governed ecosystems that enable responsible sharing of diverse datasets across sectors — offer a promising solution. By pooling data under clear standards and shared governance, data commons can unlock the potential of AI for public benefit while ensuring that its development reflects the diversity of experiences and needs across society.
To accelerate the creation of data commons, the Open Data Policy Lab today releases “A Blueprint to Unlock New Data Commons for AI,” a guide on how to steward data to create data commons that enable public-interest AI use cases…the document is aimed at supporting libraries, universities, research centers, and other data holders (e.g. governments and nonprofits) through four modules:
- Mapping the Demand and Supply: Understanding why AI systems need data, what data can be made available to train, adapt, or augment AI, and what a viable data commons prototype might look like that incorporates stakeholder needs and values;
- Unlocking Participatory Governance: Co-designing key aspects of the data commons with key stakeholders and documenting these aspects within a formal agreement;
- Building the Commons: Establishing the data commons from a practical perspective and ensuring all stakeholders are incentivized to implement it; and
- Assessing and Iterating: Evaluating how the commons is working and iterating as needed.
These modules are further supported by two supplementary taxonomies. The “Taxonomy of Data Types” provides a list of data types that can be valuable for public-interest generative AI use cases. The “Taxonomy of Use Cases” outlines public-interest generative AI applications that can be developed using a data commons approach, along with possible outcomes and stakeholders involved.
A separate set of worksheets can be used to further guide organizations in deploying these tools…(More)”.