Future-Proofing Transparency: Re-Thinking Public Record Governance For the Age of Big Data


Paper by Beatriz Botero Arcila: “Public records, public deeds, and even open data portals often include personal information that can now be easily accessed online. Yet, for all the recent attention given to informational privacy and data protection, scant literature exists on the governance of personal information that is available in public documents. This Article examines the critical issue of balancing privacy and transparency within public record governance in the age of Big Data.

With Big Data and powerful machine learning algorithms, personal information in public records can easily be used to infer sensitive data about people or aggregated to create a comprehensive personal profile of almost anyone. This information is public and open, however, for many good reasons: ensuring political accountability, facilitating democratic participation, enabling economic transactions, and combating illegal activities such as money laundering and terrorism financing. Can the interest in record publicity coexist with the growing ease of deanonymizing and revealing sensitive information about individuals?

This Article addresses this question from a comparative perspective, focusing on US and EU access to information law. The Article shows that records were, in the past and notwithstanding their presumptively public nature, protected in practice because most people would not trouble themselves to go to public offices to review them, and because it was practically impossible to aggregate them to draw extensive profiles about people. Drawing from this insight and contemporary debates on data governance, this Article challenges the binary classification of data as either published or not and proposes a risk-based framework that re-inserts that natural friction into public record governance by leveraging techno-legal methods in how information is published and accessed…(More)”.

Do disappearing data repositories pose a threat to open science and the scholarly record?


Article by Dorothea Strecker, Heinz Pampel, Rouven Schabinger and Nina Leonie Weisweiler: “Research data repositories, such as Zenodo or the UK Data Archive, are specialised information infrastructures that focus on the curation and dissemination of research data. One of repositories’ main tasks is maintaining their collections long-term; see, for example, the TRUST Principles or the requirements of the certification organization CoreTrustSeal. Long-term preservation is also a prerequisite for several data practices that are getting increasing attention, such as data reuse and data citation.

For data to remain usable, the infrastructures that host them also have to be kept operational. However, the long-term operation of research data repositories is challenging, and sometimes, for varying reasons and despite best efforts, they are shut down….

In a recent study we therefore set out to take an infrastructure perspective on the long-term preservation of research data by investigating repositories across disciplines and types that were shut down. We also tried to estimate the impact of repository shutdown on data availability…

We found that repository shutdown was not rare: 6.2% of all repositories listed in re3data were shut down. Since the launch of the registry in 2012, at least one repository has been shut down each year (see Fig.1). The median age of a repository when shutting down was 12 years…(More)”.
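
As a back-of-the-envelope illustration of the arithmetic behind these figures (not the study's actual method or data), the sketch below computes a shutdown share and a median age at shutdown from repository metadata with hypothetical field names:

```python
from statistics import median

# Hypothetical repository metadata; field names and values are illustrative,
# not taken from re3data or from the study itself.
repositories = [
    {"name": "Repo A", "launch_year": 2001, "shutdown_year": 2013},
    {"name": "Repo B", "launch_year": 2010, "shutdown_year": None},  # still running
    {"name": "Repo C", "launch_year": 2005, "shutdown_year": 2017},
    {"name": "Repo D", "launch_year": 2012, "shutdown_year": None},
]

shut_down = [r for r in repositories if r["shutdown_year"] is not None]

# Share of listed repositories that were shut down (the study reports 6.2%).
share = len(shut_down) / len(repositories) * 100

# Age at shutdown, in years (the study reports a median of 12 years).
ages = [r["shutdown_year"] - r["launch_year"] for r in shut_down]

print(f"Shut down: {share:.1f}% of {len(repositories)} repositories")
print(f"Median age at shutdown: {median(ages)} years")
```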

People Have a Right to Climate Data


Article by Justin S. Mankin: “As a climate scientist documenting the multi-trillion-dollar price tag of the climate disasters shocking economies and destroying lives, I sometimes field requests from strategic consultants, financial investment analysts and reinsurers looking for climate data, analysis and computer code.

Often, they want to chat about my findings or have me draw out the implications for their businesses, like the time a risk analyst from BlackRock, the world’s largest asset manager, asked me to help with research on what the current El Niño, a cyclical climate pattern, means for financial markets.

These requests make sense: People and companies want to adapt to the climate risks they face from global warming. But these inquiries are also part of the wider commodification of climate science. Venture capitalists are injecting hundreds of millions of dollars into climate intelligence as they build out a rapidly growing business of climate analytics — the data, risk models, tailored analyses and insights people and institutions need to understand and respond to climate risks.

I point companies to our freely available data and code at the Dartmouth Climate Modeling and Impacts Group, which I run, but turn down additional requests for customized assessments. I regard climate information as a public good and fear contributing to a world in which information about the unfolding risks of droughts, floods, wildfires, extreme heat and rising seas is hidden behind paywalls. People and companies who can afford private risk assessments will rent, buy and establish homes and businesses in safer places than the billions of others who can’t, compounding disadvantage and leaving the most vulnerable among us exposed.

Despite this, global consultants, climate and agricultural technology start-ups, insurance companies and major financial firms are all racing to meet the ballooning demand for information about climate dangers and how to prepare for them. While a lot of this information is public, it is often voluminous, technical and not particularly useful for people trying to evaluate their personal exposure. Private risk assessments fill that gap — but at a premium. The climate risk analytics market is expected to grow to more than $4 billion globally by 2027.

I don’t mean to suggest that the private sector should not be involved in furnishing climate information. That’s not realistic. But I worry that an overreliance on the private sector to provide climate adaptation information will hollow out publicly provided climate risk science, and that means we all will pay: the well-off with money, the poor with lives…(More)”.

The New Digital Dark Age


Article by Gina Neff: “For researchers, social media has always represented greater access to data, more democratic involvement in knowledge production, and greater transparency about social behavior. Getting a sense of what was happening—especially during political crises, major media events, or natural disasters—was as easy as looking around a platform like Twitter or Facebook. In 2024, however, that will no longer be possible.

In 2024, we will face a grim digital dark age, as social media platforms transition away from the logic of Web 2.0 and toward one dictated by AI-generated content. Companies have rushed to incorporate large language models (LLMs) into online services, complete with hallucinations (inaccurate, unjustified responses) and mistakes, which have further fractured our trust in online information.

Another aspect of this new digital dark age comes from not being able to see what others are doing. Twitter once pulsed with the publicly readable sentiment of its users. Social researchers loved Twitter data, relying on it because it provided a ready, reasonable approximation of how a significant slice of internet users behaved. However, Elon Musk has now priced researchers out of Twitter data after recently announcing that the company was ending free access to the platform’s API. This has made it difficult, if not impossible, to obtain the data needed for research on topics such as public health, natural disaster response, political campaigning, and economic activity. It was a harsh reminder that the modern internet has never been free or democratic, but instead walled and controlled.

Closer cooperation with platform companies is not the answer. X, for instance, has filed a suit against independent researchers who pointed out the rise in hate speech on the platform. It has also recently been revealed that researchers who used Facebook and Instagram’s data to study the platforms’ role in the 2020 US elections had been granted “independence by permission” by Meta. This means that the company chooses which projects to share its data with and, while the research may be independent, Meta also controls what types of questions are asked and who asks them…(More)”.

Toward a Solid Acceptance of the Decentralized Web of Personal Data: Societal and Technological Convergence


Article by Ana Pop Stefanija et al: “Citizens using common online services such as social media, health tracking, or online shopping effectively hand over control of their personal data to the service providers—often large corporations. The services that use and process personal data are thus also the ones holding the data. This situation is problematic, as has been recognized for some time: competition and innovation are stifled; data is duplicated; and citizens are in a weak position to enforce legal rights such as access, rectification, or erasure. The approach to address this problem has been to ascertain that citizens can access and update, with every possible service provider, the personal data that providers hold of or about them—the foundational view taken in the European General Data Protection Regulation (GDPR).

Recently, however, various societal, technological, and regulatory efforts are taking a very different approach, turning things around. The central tenet of this complementary view is that citizens should regain control of their personal data. Once in control, citizens can decide which providers they want to share data with, and if so, exactly which part of their data. Moreover, they can revisit these decisions anytime…(More)”.

Where Did the Open Access Movement Go Wrong?


An Interview with Richard Poynder by Richard Anderson: “…Open access was intended to solve three problems that have long blighted scholarly communication – the problems of accessibility, affordability, and equity. More than 20 years after the Budapest Open Access Initiative (BOAI), we can see that the movement has signally failed to solve the latter two problems. And with the geopolitical situation deteriorating, solving the accessibility problem now also looks to be at risk. The OA dream of “universal open access” remains a dream and seems likely to remain one.

What has been the essence of the OA movement’s failure?

The fundamental problem was that OA advocates did not take ownership of their own movement. They failed, for instance, to establish a central organization (an OA foundation, if you like) in order to organize and better manage the movement; and they failed to publish a single, canonical definition of open access. This is in contrast to the open source movement, and is an omission I drew attention to in 2006

This failure to take ownership saw responsibility for OA pass to organizations whose interests are not necessarily in sync with the objectives of the movement.

It did not help that the BOAI definition failed to specify that to be classified as open access, scholarly works needed to be made freely available immediately on publication and that they should remain freely available in perpetuity. Nor did it give sufficient thought to how OA would be funded (and OA advocates still fail to do that).

This allowed publishers to co-opt OA for their own purposes, most notably by introducing embargoes and developing the pay-to-publish gold OA model, with its now infamous article processing charge (APC).

Pay-to-publish OA is now the dominant form of open access and looks set to increase the cost of scholarly publishing and so worsen the affordability problem. Amongst other things, this has disenfranchised unfunded researchers and those based in the global south (notwithstanding APC waiver promises).

What also did not help is that OA advocates passed responsibility for open access over to universities and funders. This was contradictory, because OA was conceived as something that researchers would opt into. The assumption was that once the benefits of open access were explained to them, researchers would voluntarily embrace it – primarily by self-archiving their research in institutional or preprint repositories. But while many researchers were willing to sign petitions in support of open access, few (outside disciplines like physics) proved willing to practice it voluntarily.

In response to this lack of engagement, OA advocates began to petition universities, funders, and governments to introduce OA policies recommending that researchers make their papers open access. When these policies also failed to have the desired effect, OA advocates demanded their colleagues be forced to make their work OA by means of mandates requiring them to do so.

Most universities and funders (certainly in the global north) responded positively to these calls, in the belief that open access would increase the pace of scientific development and allow them to present themselves as forward-thinking, future-embracing organizations. Essentially, they saw it as a way of improving productivity and ROI while enhancing their public image.

But in light of researchers’ continued reluctance to make their works open access, universities and funders began to introduce increasingly bureaucratic rules, sanctions, and reporting tools to ensure compliance, and to manage the more complex billing arrangements that OA has introduced.

So, what had been conceived as a bottom-up movement founded on principles of voluntarism morphed into a top-down system of command and control, and open access evolved into an oppressive bureaucratic process that has failed to address either the affordability or equity problems. And as the process, and the rules around that process, have become ever more complex and oppressive, researchers have tended to become alienated from open access.

As a side benefit for universities and funders, OA has allowed them to better micromanage their faculty and fundees, and to monitor their publishing activities in ways not previously possible. This has served to further proletarianize researchers, and today they are becoming the academic equivalent of workers on an assembly line. Philip Mirowski has predicted that open access will lead to the deskilling of academic labor. The arrival of generative AI might seem to make that outcome all the more likely…

Can these failures be remedied by means of an OA reset? With this aim in mind (and aware of the failures of the movement), OA advocates are now devoting much of their energy to trying to persuade universities, funders, and philanthropists to invest in a network of alternative nonprofit open infrastructures. They envisage these being publicly owned and focused on facilitating a flowering of new diamond OA journals, preprint servers, and Publish, Review, Curate (PRC) initiatives. In the process, they expect commercial publishers will be marginalized and eventually dislodged.

But it is highly unlikely that the large sums of money that would be needed to create these alternative infrastructures will be forthcoming, certainly not at sufficient levels or on anything other than a temporary basis.

While it is true that more papers and preprints are being published open access each year, I am not convinced this is taking us down the road to universal open access, or that there is a global commitment to open access.

Consequently, I do not believe that a meaningful reset is possible: open access has reached an impasse and there is no obvious way forward that could see the objectives of the OA movement fulfilled.

Partly for this reason, we are seeing attempts to rebrand, reinterpret, and/or reimagine open access and its objectives…(More)”.

Open Data Commons Licences (ODCL): Licensing personal and non personal data supporting the commons and privacy


Paper by Yaniv Benhamou and Melanie Dulong de Rosnay: “Data are often subject to a multitude of rights (e.g. original works or personal data posted on social media, or collected through captcha, subject to copyright, database and data protection) and voluntarily shared through non-standardized, non-interoperable contractual terms. This leads to fragmented legal regimes and has become an even greater challenge in the AI era, for example when online platforms set their own Terms of Service (ToS) in business-to-consumer (B2C) relationships.

This article proposes standard terms that may apply to all kinds of data (including personal and mixed datasets subject to different legal regimes) based on the open data philosophy initially developed for Free and Open Source software and Creative Commons licenses for artistic and other copyrighted works. In a first part, we analyse how to extend open standard terms to all kinds of data (II). In a second part, we suggest combining these open standard terms with collective governance instruments, in particular data trusts, inspired by commons-based projects and by the centennial collective management of copyright (III). In a last part, after a few concluding remarks (IV), we propose a template of “Open Data Commons Licences” (ODCL) combining compulsory and optional elements to be selected by licensors, illustrated by pictograms and icons inspired by the bricks of Creative Commons licences and legal design techniques (V).

This proposal addresses the bargaining power imbalance and information asymmetry (by offering the licensor the ability to decide the terms), and conceptualises contract law differently. It reverses the current logic of contract: instead of letting companies (licensees) impose their own ToS on the users (licensors, being the copyright owner, data subject, data producer), licensors will reclaim the ability to set their own terms for access and use of data, by selecting standard terms. This should also allow the management of complex datasets, increase data sharing, and improve trust and control over the data. Like previous open licencing standards, the model is expected to lower transaction costs by reducing the need to develop and read new complicated contractual terms. It can also spread the virality of open data to all data in the AI era, since any input data under such terms that is used for AI training purposes propagates its conditions to all aggregated and output data. In other words, any data distributed under our ODCL template will turn all outcomes into more or less open data and foster a data commons ecosystem. Finally, instead of full openness, our model allows for restrictions outside of certain boundaries (e.g. authorized users and uses), in order to protect the commons and certain values. The model would need to be governed and monitored by a collective data trust…(More)”.
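
Purely as an illustration of what a licensor-selected set of standard terms could look like in machine-readable form (this encoding is a hypothetical sketch with invented element names, not part of the ODCL proposal itself):

```python
# Hypothetical, illustrative encoding of a licensor's ODCL choices.
# The element names below are invented for this sketch; the actual ODCL
# template, its compulsory/optional elements and pictograms are defined in the paper.
odcl_licence = {
    "licence_family": "ODCL",
    "licensor_role": "data subject",  # e.g. copyright owner, data subject, data producer
    "compulsory_terms": {
        "attribution": True,
        "share_alike": True,  # conditions propagate to aggregated and output data
    },
    "optional_terms": {
        "authorized_users": ["research institutions"],
        "authorized_uses": ["non-commercial analysis"],
        "ai_training_allowed": False,
    },
    "governance": "collective data trust",
}

def permits(licence: dict, user: str, use: str) -> bool:
    """Toy check: does a requested user/use fall within the selected optional terms?"""
    users = licence["optional_terms"]["authorized_users"]
    uses = licence["optional_terms"]["authorized_uses"]
    return user in users and use in uses

print(permits(odcl_licence, "research institutions", "non-commercial analysis"))  # True
print(permits(odcl_licence, "ad-tech company", "profiling"))                      # False
```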

How to make data open? Stop overlooking librarians


Article by Jessica Farrell: “The ‘Year of Open Science’, as declared by the US Office of Science and Technology Policy (OSTP), is now wrapping up. This followed an August 2022 memo from OSTP acting director Alondra Nelson, which mandated that data and peer-reviewed publications from federally funded research should be made freely accessible by the end of 2025. Federal agencies are required to publish full plans for the switch by the end of 2024.

But the specifics of how data will be preserved and made publicly available are far from being nailed down. I worked in archives for ten years and now facilitate two digital-archiving communities, the Software Preservation Network and BitCurator Consortium, at Educopia in Atlanta, Georgia. The expertise of people such as myself is often overlooked. More open-science projects need to integrate digital archivists and librarians, to capitalize on the tools and approaches that we have already created to make knowledge accessible and open to the public.

Making data open and ‘FAIR’ — findable, accessible, interoperable and reusable — poses technical, legal, organizational and financial questions. How can organizations best coordinate to ensure universal access to disparate data? Who will do that work? How can we ensure that the data remain open long after grant funding runs dry?

Many archivists agree that technical questions are the most solvable, given enough funding to cover the labour involved. But they are nonetheless complex. Ideally, any open research should be testable for reproducibility, but re-running scripts or procedures might not be possible unless all of the required coding libraries and environments used to analyse the data have also been preserved. Besides the contents of spreadsheets and databases, scientific-research data can include 2D or 3D images, audio, video, websites and other digital media, all in a variety of formats. Some of these might be accessible only with proprietary or outdated software…(More)”.
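
One small, widely used step toward preserving the computational side of a dataset (a generic sketch, not a procedure prescribed in the article) is to record the exact package versions used for an analysis so they can be archived alongside the data:

```python
# Minimal sketch: snapshot the installed Python packages of the analysis
# environment into a text file that can be archived next to the dataset.
# This is a generic practice, not something specified in the article.
from importlib import metadata
from pathlib import Path

pinned = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
    if dist.metadata["Name"]  # skip distributions with malformed metadata
)

Path("environment-snapshot.txt").write_text("\n".join(pinned) + "\n")
print(f"Recorded {len(pinned)} installed packages in environment-snapshot.txt")
```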

Open data ecosystems: what models to co-create service innovations in smart cities?


Paper by Arthur Sarazin: “While smart cities have recently begun providing open data, how to organise the collective creation of data, knowledge and related products and services produced from this collective resource still remains to be worked out. This paper gathers the literature on open data ecosystems to tackle the following research question: what models can be imagined to stimulate the collective co-creation of services between smart cities’ stakeholders acting as providers and users of open data? This issue is currently at stake in many municipalities, such as Lisbon, which decided to position itself as a platform (O’Reilly, 2010) in the local digital ecosystem. With the implementation of its City Operation Center (COI), Lisbon’s municipality provides an Information Infrastructure (Bowker et al., 2009) to many different types of actors, such as telecom companies, municipalities, energy utilities or transport companies. Through this infrastructure, Lisbon encourages such actors to gather, integrate and release heterogeneous datasets and tries to orchestrate synergies among them so that data-driven solutions to urban problems can emerge (Carvalho and Vale, 2018). The remaining question is: which models can municipalities such as Lisbon lean on to drive this cutting-edge type of service innovation?…(More)”.

The Oligopoly’s Shift to Open Access. How the Big Five Academic Publishers Profit from Article Processing Charges 


Paper by Leigh-Ann Butler et al: “This study aims to estimate the total amount of article processing charges (APCs) paid to publish open access (OA) in journals controlled by the five large commercial publishers Elsevier, Sage, Springer-Nature, Taylor & Francis and Wiley between 2015 and 2018. Using publication data from WoS, OA status from Unpaywall and annual APC prices from open datasets and historical fees retrieved via the Internet Archive Wayback Machine, we estimate that globally authors paid $1.06 billion in publication fees to these publishers from 2015–2018. Revenue from gold OA amounted to $612.5 million, while $448.3 million was obtained for publishing OA in hybrid journals. Among the five publishers, Springer-Nature made the most revenue from OA ($589.7 million), followed by Elsevier ($221.4 million), Wiley ($114.3 million), Taylor & Francis ($76.8 million) and Sage ($31.6 million). With Elsevier and Wiley making most of APC revenue from hybrid fees and others focusing on gold, different OA strategies could be observed between publishers…(More)”.
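
The core of such an estimate is a join of open-access article counts with list APC prices, summed by publisher and OA type; the toy sketch below shows the shape of that calculation with invented numbers (the study's actual inputs are WoS publication data, Unpaywall OA status and archived APC price lists):

```python
# Toy illustration of the estimation logic: multiply OA article counts by the
# corresponding list APC, then aggregate revenue by publisher and OA type.
# All names and numbers below are invented for this sketch.
from collections import defaultdict

articles = [
    # (publisher, year, oa_type, article_count)
    ("Publisher A", 2016, "gold", 1200),
    ("Publisher A", 2016, "hybrid", 300),
    ("Publisher B", 2017, "gold", 800),
]

apc_prices = {
    # (publisher, year, oa_type) -> list APC in USD
    ("Publisher A", 2016, "gold"): 1500,
    ("Publisher A", 2016, "hybrid"): 3000,
    ("Publisher B", 2017, "gold"): 1800,
}

revenue = defaultdict(float)
for publisher, year, oa_type, count in articles:
    revenue[(publisher, oa_type)] += count * apc_prices[(publisher, year, oa_type)]

for (publisher, oa_type), total in sorted(revenue.items()):
    print(f"{publisher} ({oa_type} OA): ${total:,.0f}")
```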