“Data Commons”: Under Threat by or The Solution for a Generative AI Era? Rethinking Data Access and Re-use


Article by Stefaan G. Verhulst, Hannah Chafetz and Andrew Zahuranec: “One of the great paradoxes of our datafied era is that we live amid both unprecedented abundance and scarcity. Even as data grows more central to our ability to promote the public good, so too does it remain deeply — and perhaps increasingly — inaccessible and privately controlled. In response, there have been growing calls for “data commons” — pools of data that would be (self-)managed by distinctive communities or entities operating in the public’s interest. These pools could then be made accessible and reused for the common good.

Data commons are typically the result of collaborative and participatory approaches to data governance [1]. They offer an alternative to the growing tendency toward privatized data silos or extractive re-use of open data sets, instead emphasizing the communal and shared value of data. For example, they can make data resources accessible in an ethical and sustainable way for purposes aligned with community values or interests, such as scientific research, social good initiatives, environmental monitoring, public health, and other domains.

Data commons can today be considered (the missing) critical infrastructure for leveraging data to advance societal wellbeing. When designed responsibly, they offer potential solutions for a variety of wicked problems, from climate change to pandemics and economic and social inequities. However, the rapid ascent of generative artificial intelligence (AI) technologies is changing the rules of the game, creating both new opportunities and significant challenges for these communal data repositories.

On the one hand, generative AI has the potential to unlock new insights from data for a broader audience (through conversational interfaces such as chats), foster innovation, and streamline decision-making to serve the public interest. Generative AI also stands out in the realm of data governance because it can reuse data at massive scale, something many open data initiatives have persistently struggled to achieve. On the other hand, generative AI raises uncomfortable questions related to equitable access, sustainability, and the ethical re-use of shared data resources. Further, without the right guardrails, funding models, and enabling governance frameworks, data commons risk becoming data graveyards: vast repositories of unused, and largely unusable, data.

Ten-part framework to rethink Data Commons

In what follows, we lay out some of the challenges and opportunities posed by generative AI for data commons. We then turn to a ten-part framework to set the stage for a broader exploration of how to reimagine and reinvigorate data commons for the generative AI era. This framework establishes a landscape for further investigation; our goal is not so much to define what an updated data commons would look like but to lay out pathways that would lead to a more meaningful assessment of the design requirements for resilient data commons in the age of generative AI…”