New Horizons


An Introduction to the 2nd Edition of the State of Open Data by Renata Avila and Tim Davies: “The struggle to deliver on the vision that data, this critical resource of modern societies, should be widely available, well structured, and shared for all to use, has been a long one. It has been a struggle involving thousands upon thousands of individuals, organisations, and communities. Without their efforts, public procurement would be opaque, smart-cities even more corporate controlled, transport systems less integrated, and pandemic responses less rapid. Across numerous initiatives, open data has become more embedded as a way to support accountability, enable collaboration, and to better unlock the value of data. 

However, much like the climber reaching the top of the foothills, and for the first time seeing the hard climb of the whole mountain coming into view, open data advocates, architects, and community builders have not reached the end of their journey. As we move into the middle of the 2020s, action on open data faces new and significant challenges if we are to see a future in which open and enabling data infrastructures and ecosystems are the norm rather than a sparse patchwork of exceptions. Building open infrastructures to power social change for the next century is no small task, and to meet the challenges ahead, we will need all that the lessons we can gather from more than 15 years of open data action to date…Across the collection, we can find two main pathways to broader participation explored. On the one hand are discussions of widening public engagement and data literacy, creating a more diverse constituency of people interested and able to engage with data projects in a voluntary capacity. On the other, are calls for more formalisation of data governance, embedding citizen voices within increasingly structured data collaborations and ensuring that affected stakeholders are consulted on, or given a role in, key data decisions. Mariel García-Montes (Data Literacy) underscores the case for an equity-first approach to the first pathway, highlighting how generalist data literacy can be used for or against the public good, and calling for approaches to data literacy building that centre on an understanding of inequality and power. In writing on urban development, Stefaan G. Verhulst and Sampriti Saxena (Urban Development) point to a number of examples of the latter approach in which cities are experimenting with various forms of deliberative conversations and processes…(More)”.

A Plan to Develop Open Science’s Green Shoots into a Thriving Garden


Article by Greg Tananbaum, Chelle Gentemann, Kamran Naim, and Christopher Steven Marcum: “…As it’s moved from an abstract set of principles about access to research and data into the realm of real-world activities, the open science movement has mirrored some of the characteristics of the open source movement: distributed, independent, with loosely coordinated actions happening in different places at different levels. Globally, many things are happening, often disconnected, but still interrelated: open science has sowed a constellation of thriving green shoots, not quite yet a garden, but all growing rapidly on arable soil.

Streamlining research processes, reducing duplication of efforts, and accelerating scientific discoveries could ensure that the fruits of open science processes and products are more accessible and equitably distributed.

It is now time to consider how much faster and farther the open science movement could go with more coordination. What efficiencies might be realized if disparate efforts could better harmonize across geographies, disciplines, and sectors? How would an intentional, systems-level approach to aligning incentives, infrastructure, training, and other key components of a rationally functioning research ecosystem advance the wider goals of the movement? Streamlining research processes, reducing duplication of efforts, and accelerating scientific discoveries could ensure that the fruits of open science processes and products are more accessible and equitably distributed…(More)”

Outpacing Pandemics: Solving the First and Last Mile Challenges of Data-Driven Policy Making


Article by Stefaan Verhulst, Daniela Paolotti, Ciro Cattuto, and Alessandro Vespignani: “As society continues to emerge from the legacy of COVID-19, a dangerous complacency seems to be setting in. Amidst recurrent surges of cases, each serving as a reminder of the virus’s persistence, there is a noticeable decline in collective urgency to prepare for future pandemics. This situation represents not just a lapse in memory but a significant shortfall in our approach to pandemic preparedness. It dramatically underscores the urgent need to develop novel and sustainable approaches and responses and to reinvent how we approach public health emergencies.

Among the many lessons learned from previous infectious disease outbreaks, the potential and utility of data, and particularly non-traditional forms of data, are surely among the most important lessons. Among other benefits, data has proven useful in providing intelligence and situational awareness in early stages of outbreaks, empowering citizens to protect their health and the health of vulnerable community members, advancing compliance with non-pharmaceutical interventions to mitigate societal impacts, tracking vaccination rates and the availability of treatment, and more. A variety of research now highlights the particular role played by open source data (and other non-traditional forms of data) in these initiatives.

Although multiple data sources are useful at various stages of outbreaks, we focus on two critical stages proven to be especially challenging: what we call the first mile and the last mile.

We argue that focusing on these two stages (or chokepoints) can help pandemic responses and rationalize resources. In particular, we highlight the role of Data Stewards at both stages and in overall pandemic response effectiveness…(More)”.

Future-Proofing Transparency: Re-Thinking Public Record Governance For the Age of Big Data


Paper by Beatriz Botero Arcila: “Public records, public deeds, and even open data portals often include personal information that can now be easily accessed online. Yet, for all the recent attention given to informational privacy and data protection, scant literature exists on the governance of personal information that is available in public documents. This Article examines the critical issue of balancing privacy and transparency within public record governance in the age of Big Data.

With Big Data and powerful machine learning algorithms, personal information in public records can easily be used to infer sensitive data about people or aggregated to create a comprehensive personal profile of almost anyone. This information is public and open, however, for many good reasons: ensuring political accountability, facilitating democratic participation, enabling economic transactions, combating illegal activities such as money laundering and terrorism financing, and facilitating. Can the interest in record publicity coexist with the growing ease of deanonymizing and revealing sensitive information about individuals?

This Article addresses this question from a comparative perspective, focusing on US and EU access to information law. The Article shows that the publicity of records was, in the past and not withstanding its presumptive public nature, protected because most people would not trouble themselves to go to public offices to review them, and it was practical impossible to aggregate them to draw extensive profiles about people. Drawing from this insight and contemporary debates on data governance, this Article challenges the binary classification of data as either published or not and proposes a risk-based framework that re-insert that natural friction to public record governance by leveraging techno-legal methods in how information is published and accessed…(More)”.

Do disappearing data repositories pose a threat to open science and the scholarly record?


Article by Dorothea Strecker, Heinz Pampel, Rouven Schabinger and Nina Leonie Weisweiler: “Research data repositories, such as Zenodo or the UK Data Archive, are specialised information infrastructures that focus on the curation and dissemination of research data. One of repositories’ main tasks is maintaining their collections long-term, see for example the TRUST Principles, or the requirements of the certification organization CoreTrustSeal. Long-term preservation is also a prerequisite for several data practices that are getting increasing attention, such as data reuse and data citation.

For data to remain usable, the infrastructures that host them also have to be kept operational. However, the long-term operation of research data repositories is challenging, and sometimes, for varying reasons and despite best efforts, they are shut down….

In a recent study we therefore set out to take an infrastructure perspective on the long-term preservation of research data by investigating repositories across disciplines and types that were shut down. We also tried to estimate the impact of repository shutdown on data availability…

We found that repository shutdown was not rare: 6.2% of all repositories listed in re3data were shut down. Since the launch of the registry in 2012, at least one repository has been shut down each year (see Fig.1). The median age of a repository when shutting down was 12 years…(More)”.

People Have a Right to Climate Data


Article by Justin S. Mankin: “As a climate scientist documenting the multi-trillion-dollar price tag of the climate disasters shocking economies and destroying lives, I sometimes field requests from strategic consultantsfinancial investment analysts and reinsurers looking for climate data, analysis and computer code.

Often, they want to chat about my findings or have me draw out the implications for their businesses, like the time a risk analyst from BlackRock, the world’s largest asset manager, asked me to help with research on what the current El Niño, a cyclical climate pattern, means for financial markets.

These requests make sense: People and companies want to adapt to the climate risks they face from global warming. But these inquiries are also part of the wider commodification of climate science. Venture capitalists are injecting hundreds of millions of dollars into climate intelligence as they build out a rapidly growing business of climate analytics — the data, risk models, tailored analyses and insights people and institutions need to understand and respond to climate risks.

I point companies to our freely available data and code at the Dartmouth Climate Modeling and Impacts Group, which I run, but turn down additional requests for customized assessments. I regard climate information as a public good and fear contributing to a world in which information about the unfolding risks of droughts, floods, wildfires, extreme heat and rising seas are hidden behind paywalls. People and companies who can afford private risk assessments will rent, buy and establish homes and businesses in safer places than the billions of others who can’t, compounding disadvantage and leaving the most vulnerable among us exposed.

Despite this, global consultants, climate and agricultural technology start-ups, insurance companies and major financial firms are all racing to meet the ballooning demand for information about climate dangers and how to prepare for them. While a lot of this information is public, it is often voluminous, technical and not particularly useful for people trying to evaluate their personal exposure. Private risk assessments fill that gap — but at a premium. The climate risk analytics market is expected to grow to more than $4 billion globally by 2027.

I don’t mean to suggest that the private sector should not be involved in furnishing climate information. That’s not realistic. But I worry that an overreliance on the private sector to provide climate adaptation information will hollow out publicly provided climate risk science, and that means we all will pay: the well-off with money, the poor with lives…(More)”.

The New Digital Dark Age


Article by Gina Neff: “For researchers, social media has always represented greater access to data, more democratic involvement in knowledge production, and great transparency about social behavior. Getting a sense of what was happening—especially during political crises, major media events, or natural disasters—was as easy as looking around a platform like Twitter or Facebook. In 2024, however, that will no longer be possible.

In 2024, we will face a grim digital dark age, as social media platforms transition away from the logic of Web 2.0 and toward one dictated by AI-generated content. Companies have rushed to incorporate large language models (LLMs) into online services, complete with hallucinations (inaccurate, unjustified responses) and mistakes, which have further fractured our trust in online information.

Another aspect of this new digital dark age comes from not being able to see what others are doing. Twitter once pulsed with publicly readable sentiment of its users. Social researchers loved Twitter data, relying on it because it provided a ready, reasonable approximation of how a significant slice of internet users behaved. However, Elon Musk has now priced researchers out of Twitter data after recently announcing that it was ending free access to the platform’s API. This made it difficult, if not impossible, to obtain data needed for research on topics such as public health, natural disaster response, political campaigning, and economic activity. It was a harsh reminder that the modern internet has never been free or democratic, but instead walled and controlled.

Closer cooperation with platform companies is not the answer. X, for instance, has filed a suit against independent researchers who pointed out the rise in hate speech on the platform. Recently, it has also been revealed that researchers who used Facebook and Instagram’s data to study the platforms’ role in the US 2020 elections had been granted “independence by permission” by Meta. This means that the company chooses which projects to share its data with and, while the research may be independent, Meta also controls what types of questions are asked and who asks them…(More)”.

Toward a Solid Acceptance of the Decentralized Web of Personal Data: Societal and Technological Convergence


Article by Ana Pop Stefanija et al: “Citizens using common online services such as social media, health tracking, or online shopping effectively hand over control of their personal data to the service providers—often large corporations. The services using and processing personal data are also holding the data. This situation is problematic, as has been recognized for some time: competition and innovation are stifled; data is duplicated; and citizens are in a weak position to enforce legal rights such as access, rectification, or erasure. The approach to address this problem has been to ascertain that citizens can access and update, with every possible service provider, the personal data that providers hold of or about them—the foundational view taken in the European General Data Protection Regulation (GDPR).

Recently, however, various societal, technological, and regulatory efforts are taking a very different approach, turning things around. The central tenet of this complementary view is that citizens should regain control of their personal data. Once in control, citizens can decide which providers they want to share data with, and if so, exactly which part of their data. Moreover, they can revisit these decisions anytime…(More)”.

Where Did the Open Access Movement Go Wrong?


An Interview with Richard Poynder by Richard Anderson: “…Open access was intended to solve three problems that have long blighted scholarly communication – the problems of accessibilityaffordability, and equity. 20+ years after the Budapest Open Access Initiative (BOAI) we can see that the movement has signally failed to solve the latter two problems. And with the geopolitical situation deteriorating solving the accessibility problem now also looks to be at risk. The OA dream of “universal open access” remains a dream and seems likely to remain one.

What has been the essence of the OA movement’s failure?

The fundamental problem was that OA advocates did not take ownership of their own movement. They failed, for instance, to establish a central organization (an OA foundation, if you like) in order to organize and better manage the movement; and they failed to publish a single, canonical definition of open access. This is in contrast to the open source movement, and is an omission I drew attention to in 2006

This failure to take ownership saw responsibility for OA pass to organizations whose interests are not necessarily in sync with the objectives of the movement.

It did not help that the BOAI definition failed to specify that to be classified as open access, scholarly works needed to be made freely available immediately on publication and that they should remain freely available in perpetuity. Nor did it give sufficient thought to how OA would be funded (and OA advocates still fail to do that).

This allowed publishers to co-opt OA for their own purposes, most notably by introducing embargoes and developing the pay-to-publish gold OA model, with its now infamous article processing charge (APC).

Pay-to-publish OA is now the dominant form of open access and looks set to increase the cost of scholarly publishing and so worsen the affordability problem. Amongst other things, this has disenfranchised unfunded researchers and those based in the global south (notwithstanding APC waiver promises).

What also did not help is that OA advocates passed responsibility for open access over to universities and funders. This was contradictory, because OA was conceived as something that researchers would opt into. The assumption was that once the benefits of open access were explained to them, researchers would voluntarily embrace it – primarily by self-archiving their research in institutional or preprint repositories. But while many researchers were willing to sign petitions in support of open access, few (outside disciplines like physics) proved willing to practice it voluntarily.

In response to this lack of engagement, OA advocates began to petition universities, funders, and governments to introduce OA policies recommending that researchers make their papers open access. When these policies also failed to have the desired effect, OA advocates demanded their colleagues be forced to make their work OA by means of mandates requiring them to do so.

Most universities and funders (certainly in the global north) responded positively to these calls, in the belief that open access would increase the pace of scientific development and allow them to present themselves as forward-thinking, future-embracing organizations. Essentially, they saw it as a way of improving productivity and ROI while enhancing their public image.

While many researchers were willing to sign petitions in support of open access, few proved willing to practice it voluntarily.

But in light of researchers’ continued reluctance to make their works open access, universities and funders began to introduce increasingly bureaucratic rules, sanctions, and reporting tools to ensure compliance, and to manage the more complex billing arrangements that OA has introduced.

So, what had been conceived as a bottom-up movement founded on principles of voluntarism morphed into a top-down system of command and control, and open access evolved into an oppressive bureaucratic process that has failed to address either the affordability or equity problems. And as the process, and the rules around that process, have become ever more complex and oppressive, researchers have tended to become alienated from open access.

As a side benefit for universities and funders OA has allowed them to better micromanage their faculty and fundees, and to monitor their publishing activities in ways not previously possible. This has served to further proletarianize researchers and today they are becoming the academic equivalent of workers on an assembly line. Philip Mirowski has predicted that open access will lead to the deskilling of academic labor. The arrival of generative AI might seem to make that outcome the more likely…

Can these failures be remedied by means of an OA reset? With this aim in mind (and aware of the failures of the movement), OA advocates are now devoting much of their energy to trying to persuade universities, funders, and philanthropists to invest in a network of alternative nonprofit open infrastructures. They envisage these being publicly owned and focused on facilitating a flowering of new diamond OA journals, preprint servers, and Publish, Review, Curate (PRC) initiatives. In the process, they expect commercial publishers will be marginalized and eventually dislodged.

But it is highly unlikely that the large sums of money that would be needed to create these alternative infrastructures will be forthcoming, certainly not at sufficient levels or on anything other than a temporary basis.

While it is true that more papers and preprints are being published open access each year, I am not convinced this is taking us down the road to universal open access, or that there is a global commitment to open access.

Consequently, I do not believe that a meaningful reset is possible: open access has reached an impasse and there is no obvious way forward that could see the objectives of the OA movement fulfilled.

Partly for this reason, we are seeing attempts to rebrand, reinterpret, and/or reimagine open access and its objectives…(More)”.

Open Data Commons Licences (ODCL): Licensing personal and non personal data supporting the commons and privacy


Paper by Yaniv Benhamou and Melanie Dulong de Rosnay: “Data are often subject to a multitude of rights (e.g. original works or personal data posted on social media, or collected through captcha, subject to copyright, database and data protection) and voluntary shared through non standardized, non interoperable contractual terms. This leads to fragmented legal regimes and has become an even major challenge in the AI-era, for example when online platforms set their own Terms of Services (ToS), in business-to-consumer relationship (B2C).

This article proposes standard terms that may apply to all kind of data (including personal and mixed datasets subject to different legal regimes) based on the open data philosophy initially developed for Free and Open Source software and Creative Commons licenses for artistic and other copyrighted works. In a first part, we analyse how to extend open standard terms to all kinds of data (II). In a second part, we suggest to combine these open standard terms with collective governance instruments, in particular data trust, inspired by commons-based projects and by the centennial collective management of copyright (III). In a last part, after few concluding remarks (IV), we propose a template “Open Data Commons Licenses“ (ODCL) combining compulsory and optional elements to be selected by licensors, illustrated by pictograms and icons inspired by the bricks of Creative Commons licences and legal design techniques (V).

This proposal addresses the bargaining power imbalance and information asymmetry (by offering the licensor the ability to decide the terms), and conceptualises contract law differently. It reverses the current logic of contract: instead of letting companies (licensees) impose their own ToS to the users (licensors, being the copyright owner, data subject, data producer), licensors will reclaim the ability to set their own terms for access and use of data, by selecting standard terms. This should also allow the management of complex datasets, increase data sharing, and improve trust and control over the data. Like previous open licencing standards, the model is expected to lower the transaction costs by reducing the need to develop and read new complicated contractual terms. It can also spread the virality of open data to all data in an AI-era, if any input data under such terms used for AI training purposes propagates its conditions to all aggregated and output data. In other words, any data distributed under our ODCL template will turn all outcome into more or less open data and foster a data common ecosystem. Finally, instead of full openness, our model allows for restrictions outside of certain boundaries (e.g. authorized users and uses), in order to protect the commons and certain values. The model would require to be governed and monitored by a collective data trust…(More)”.