Article by Stefaan Verhulst: “At the turn of the 20th century, Andrew Carnegie was one of the richest men in the world. He was also one of the most reviled, infamous for the harsh labor conditions and occasional violence at his steel mills. Determined to rehabilitate his reputation, Carnegie embarked upon a number of ambitious philanthropic ventures that would redefine his legacy, and leave a lasting impact on the United States and the world.
Among the most ambitious of these were the Carnegie Libraries. Between 1883 and 1929, Carnegie spent almost $60 million (equivalent to around $2.3 billion today) to build a network of 2,509 libraries globally: 1,689 in the United States and the rest in places as diverse as Australia, Fiji, South Africa, and his native Scotland. Carnegie supported these libraries for a number of reasons: to burnish his own reputation, because he thought they would help immigrants integrate into the US, but most of all because he was “dedicated to the diffusion of knowledge.” For Carnegie, greater knowledge was key to fostering all manner of social goods, everything from a healthier democracy to more innovation and better health. Today, many of those libraries still stand in communities across the country, a testament to the lasting impact of Carnegie’s generosity.
The story of Carnegie’s libraries might seem a happy tale from the past, a quaint period piece. But it has resonance in the present.
Today, we are once again presented with a landscape in which information is both abundant and scarce, offering tremendous potential for the public good yet largely accessible to, and reusable by, only a small (corporate) minority. This paradox stems from the fact that while more and more aspects of our lives are captured in digital form, the resulting data is increasingly locked away or otherwise inaccessible.
The centrality of data to public life is now undeniable, particularly with the rise of generative artificial intelligence, which relies on vast troves of high-quality, diverse, and timely datasets. Yet access to such data is being steadily eroded as governments, corporations, and institutions impose new restrictions on what can be accessed and reused. In some cases, open data portals and official statistics once celebrated as milestones of transparency have been defunded or scaled back, with fewer datasets published and those that remain limited to low-risk, non-sensitive material. At the same time, private platforms that once offered public APIs for research — such as Twitter (now X), Meta and Reddit — have closed or heavily monetized access, cutting off academics, civil society groups, and smaller enterprises from vital resources.
The drivers of this shift are varied but interlinked. The rise of generative AI has triggered what some call “generative AI-nxiety,” prompting news organizations, academic institutions, and other data custodians to block crawlers and restrict even non-sensitive repositories, often in (understandable) reaction to unconsented scraping for commercial model training. This is compounded by a broader research data lockdown, in which critical resources such as social media datasets used to study misinformation, political discourse, or mental health, and open environmental data essential for climate modeling, are increasingly subject to paywalls, restrictive licensing, or geopolitical disputes.
Rising calls for digital sovereignty have also led to a proliferation of data localization laws that prevent cross-border flows, undermining collaborative efforts on urgent global challenges like pandemic preparedness, disaster response, and environmental monitoring. Meanwhile, in the private sector, data is increasingly treated as a proprietary asset to be hoarded or sold, rather than a shared resource that can be stewarded responsibly for mutual benefit.
Indeed, we may be entering a new “data winter,” one marked by the emergence of new silos and gatekeepers and by a relentless — and socially corrosive — erosion of the open, interoperable data infrastructures that once seemed to hold so much promise.
This narrowing of the data commons comes precisely at a moment when global challenges demand greater openness, collaboration, and trust. Left unchecked, it risks stalling scientific breakthroughs, weakening evidence-based policymaking, deepening inequities in access to knowledge, and entrenching power in the hands of a few large actors, reshaping not only innovation but our collective capacity to understand and respond to the world.
A Carnegie commitment to the “diffusion of knowledge”, updated for the digital age, can help avert this dire situation. Building modern data libraries, embedding principles of the commons, could restore openness while safeguarding privacy and security. Without such action, the promise of our data-rich era may curdle into a new form of information scarcity, with deep and lasting societal costs…”