How Copyright May Destroy Our Access To The World’s Academic Knowledge


Article by Glyn Moody: “The shift from analogue to digital has had a massive impact on most aspects of life. One area where that shift has the potential for huge benefits is in the world of academic publishing. Academic papers are costly to publish and distribute on paper, but in a digital format they can be shared globally for almost no cost. That’s one of the driving forces behind the open access movement. But as Walled Culture has reported, resistance from the traditional publishing world has slowed the shift to open access, and undercut the benefits that could flow from it.

That in itself is bad news, but new research from Martin Paul Eve (available as open access) shows that the way the shift to digital has been managed by publishers brings with it a new problem. For all their flaws, analogue publications have the great virtue that they are durable: once a library has a copy, it is likely to be available for decades, if not centuries. Digital scholarly articles come with no such guarantee. The Internet is constantly in flux, with many publishers and sites closing down each year, often without notice. That’s a problem when sites holding archival copies of scholarly articles vanish, making it harder, perhaps impossible, to access important papers. Eve explored whether publishers were placing copies of the articles they published in key archives. Ideally, digital papers would be available in multiple archives to ensure resilience, but the reality is that very few publishers did this. Ars Technica has a good summary of Eve’s results:

When Eve broke down the results by publisher, less than 1 percent of the 204 publishers had put the majority of their content into multiple archives. (The cutoff was 75 percent of their content in three or more archives.) Fewer than 10 percent had put more than half their content in at least two archives. And a full third seemed to be doing no organized archiving at all.

At the individual publication level, under 60 percent were present in at least one archive, and over a quarter didn’t appear to be in any of the archives at all. (Another 14 percent were published too recently to have been archived or had incomplete records.)…(More)”.
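
To make the archiving thresholds in that summary concrete, the following is a minimal sketch, in Python, of how publishers might be bucketed by how much of their content is preserved in how many archives. The publisher names, numbers, and field names are invented for illustration only; this is not Eve’s dataset or analysis code.

```python
# A minimal, hypothetical illustration of the coverage buckets quoted above.
# Publisher names and numbers are invented; this is not Eve's dataset or code.
from dataclasses import dataclass


@dataclass
class Publisher:
    name: str
    total_articles: int
    in_three_or_more_archives: int  # articles preserved in >= 3 archives
    in_two_or_more_archives: int    # articles preserved in >= 2 archives
    in_any_archive: int             # articles preserved in >= 1 archive


def classify(p: Publisher) -> str:
    """Bucket a publisher using the thresholds quoted from the summary."""
    if p.in_three_or_more_archives / p.total_articles >= 0.75:
        return "well archived: >=75% of content in three or more archives"
    if p.in_two_or_more_archives / p.total_articles > 0.50:
        return "more than half of content in at least two archives"
    if p.in_any_archive == 0:
        return "no organized archiving detected"
    return "minimally archived"


publishers = [
    Publisher("Example University Press", 1_000, 800, 950, 990),
    Publisher("Sample Scholarly Society", 500, 10, 300, 400),
    Publisher("Hypothetical Monographs Ltd", 200, 0, 0, 0),
]

for p in publishers:
    print(f"{p.name}: {classify(p)}")
```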

Meta Kills a Crucial Transparency Tool At the Worst Possible Time


Interview by Vittoria Elliott: “Earlier this month, Meta announced that it would be shutting down CrowdTangle, the social media monitoring and transparency tool that has allowed journalists and researchers to track the spread of mis- and disinformation. It will cease to function on August 14, 2024—just months before the US presidential election.

Meta’s move is just the latest example of a tech company rolling back transparency and security measures as the world enters the biggest global election year in history. The company says it is replacing CrowdTangle with a new Content Library API, which will require researchers and nonprofits to apply for access to the company’s data. But the Mozilla Foundation and 140 other civil society organizations protested last week that the new offering lacks much of CrowdTangle’s functionality, asking the company to keep the original tool operating until January 2025.

Meta spokesperson Andy Stone countered in posts on X that the groups’ claims “are just wrong,” saying the new Content Library will contain “more comprehensive data than CrowdTangle” and be made available to nonprofits, academics, and election integrity experts. When asked why commercial newsrooms, like WIRED, are to be excluded from the Content Library, Meta spokesperson Eric Porterfield said that it was “built for research purposes.” While journalists might not have direct access, he suggested they could use commercial social network analysis tools, or “partner with an academic institution to help answer a research question related to our platforms.”

Brandon Silverman, cofounder and former CEO of CrowdTangle, who continued to work on the tool after Facebook acquired it in 2016, says it’s time to force platforms to open up their data to outsiders. The conversation has been edited for length and clarity…(More)”.

Meta to shut off data access to journalists


Article by Sara Fischer: “Meta plans to officially shutter CrowdTangle, the analytics tool widely used by journalists and researchers to see what’s going viral on Facebook and Instagram, the company’s president of global affairs Nick Clegg told Axios in an interview.

Why it matters: The company plans to instead offer select researchers access to a set of new data tools, but news publishers, journalists or anyone with commercial interests will not be granted access to that data.

The big picture: The effort comes amid a broader pivot from Meta away from news and politics and more toward user-generated viral videos.

  • Meta acquired CrowdTangle in 2016 at a time when publishers were heavily reliant on the tech giant for traffic.
  • In recent years, it’s stopped investing in the tool, making it less reliable.

The new research tools include Meta’s Content Library, which it launched last year, and an API, or backend interface used by developers.

  • Both tools offer researchers access to huge swaths of data from publicly accessible content across Facebook and Instagram.
  • The tools are available in 180 languages and offer global data.
  • Researchers must apply for access to those tools through the Inter-university Consortium for Political and Social Research at the University of Michigan, which will vet their requests…(More)”

New Horizons


An Introduction to the 2nd Edition of the State of Open Data by Renata Avila and Tim Davies: “The struggle to deliver on the vision that data, this critical resource of modern societies, should be widely available, well structured, and shared for all to use, has been a long one. It has been a struggle involving thousands upon thousands of individuals, organisations, and communities. Without their efforts, public procurement would be opaque, smart cities even more corporate-controlled, transport systems less integrated, and pandemic responses less rapid. Across numerous initiatives, open data has become more embedded as a way to support accountability, enable collaboration, and better unlock the value of data.

However, much like the climber reaching the top of the foothills, and for the first time seeing the hard climb of the whole mountain coming into view, open data advocates, architects, and community builders have not reached the end of their journey. As we move into the middle of the 2020s, action on open data faces new and significant challenges if we are to see a future in which open and enabling data infrastructures and ecosystems are the norm rather than a sparse patchwork of exceptions. Building open infrastructures to power social change for the next century is no small task, and to meet the challenges ahead, we will need all the lessons we can gather from more than 15 years of open data action to date…Across the collection, we can find two main pathways to broader participation explored. On the one hand are discussions of widening public engagement and data literacy, creating a more diverse constituency of people interested and able to engage with data projects in a voluntary capacity. On the other are calls for more formalisation of data governance, embedding citizen voices within increasingly structured data collaborations and ensuring that affected stakeholders are consulted on, or given a role in, key data decisions. Mariel García-Montes (Data Literacy) underscores the case for an equity-first approach to the first pathway, highlighting how generalist data literacy can be used for or against the public good, and calling for approaches to data literacy building that centre on an understanding of inequality and power. In writing on urban development, Stefaan G. Verhulst and Sampriti Saxena (Urban Development) point to a number of examples of the latter approach in which cities are experimenting with various forms of deliberative conversations and processes…(More)”.

A Plan to Develop Open Science’s Green Shoots into a Thriving Garden


Article by Greg Tananbaum, Chelle Gentemann, Kamran Naim, and Christopher Steven Marcum: “…As it’s moved from an abstract set of principles about access to research and data into the realm of real-world activities, the open science movement has mirrored some of the characteristics of the open source movement: distributed, independent, with loosely coordinated actions happening in different places at different levels. Globally, many things are happening, often disconnected, but still interrelated: open science has sowed a constellation of thriving green shoots, not quite yet a garden, but all growing rapidly on arable soil.

It is now time to consider how much faster and farther the open science movement could go with more coordination. What efficiencies might be realized if disparate efforts could better harmonize across geographies, disciplines, and sectors? How would an intentional, systems-level approach to aligning incentives, infrastructure, training, and other key components of a rationally functioning research ecosystem advance the wider goals of the movement? Streamlining research processes, reducing duplication of efforts, and accelerating scientific discoveries could ensure that the fruits of open science processes and products are more accessible and equitably distributed…(More)”

Outpacing Pandemics: Solving the First and Last Mile Challenges of Data-Driven Policy Making


Article by Stefaan Verhulst, Daniela Paolotti, Ciro Cattuto, and Alessandro Vespignani: “As society continues to emerge from the legacy of COVID-19, a dangerous complacency seems to be setting in. Amidst recurrent surges of cases, each serving as a reminder of the virus’s persistence, there is a noticeable decline in collective urgency to prepare for future pandemics. This situation represents not just a lapse in memory but a significant shortfall in our approach to pandemic preparedness. It dramatically underscores the urgent need to develop novel and sustainable approaches and responses and to reinvent how we approach public health emergencies.

Among the many lessons learned from previous infectious disease outbreaks, the potential and utility of data, and particularly non-traditional forms of data, are surely among the most important. Among other benefits, data has proven useful in providing intelligence and situational awareness in early stages of outbreaks, empowering citizens to protect their health and the health of vulnerable community members, advancing compliance with non-pharmaceutical interventions to mitigate societal impacts, tracking vaccination rates and the availability of treatment, and more. A variety of research now highlights the particular role played by open source data (and other non-traditional forms of data) in these initiatives.

Although multiple data sources are useful at various stages of outbreaks, we focus on two critical stages proven to be especially challenging: what we call the first mile and the last mile.

We argue that focusing on these two stages (or chokepoints) can help pandemic responses and rationalize resources. In particular, we highlight the role of Data Stewards at both stages and in overall pandemic response effectiveness…(More)”.

Future-Proofing Transparency: Re-Thinking Public Record Governance For the Age of Big Data


Paper by Beatriz Botero Arcila: “Public records, public deeds, and even open data portals often include personal information that can now be easily accessed online. Yet, for all the recent attention given to informational privacy and data protection, scant literature exists on the governance of personal information that is available in public documents. This Article examines the critical issue of balancing privacy and transparency within public record governance in the age of Big Data.

With Big Data and powerful machine learning algorithms, personal information in public records can easily be used to infer sensitive data about people or aggregated to create a comprehensive personal profile of almost anyone. This information is public and open, however, for many good reasons: ensuring political accountability, facilitating democratic participation, enabling economic transactions, and combating illegal activities such as money laundering and terrorism financing. Can the interest in record publicity coexist with the growing ease of deanonymizing and revealing sensitive information about individuals?

This Article addresses this question from a comparative perspective, focusing on US and EU access to information law. The Article shows that, in the past and notwithstanding their presumptively public nature, the personal information contained in public records was protected in practice because most people would not trouble themselves to go to public offices to review them, and it was practically impossible to aggregate them to draw extensive profiles of people. Drawing from this insight and contemporary debates on data governance, this Article challenges the binary classification of data as either published or not and proposes a risk-based framework that re-inserts that natural friction into public record governance by leveraging techno-legal methods in how information is published and accessed…(More)”.

Do disappearing data repositories pose a threat to open science and the scholarly record?


Article by Dorothea Strecker, Heinz Pampel, Rouven Schabinger and Nina Leonie Weisweiler: “Research data repositories, such as Zenodo or the UK Data Archive, are specialised information infrastructures that focus on the curation and dissemination of research data. One of repositories’ main tasks is maintaining their collections long-term (see, for example, the TRUST Principles or the requirements of the certification organization CoreTrustSeal). Long-term preservation is also a prerequisite for several data practices that are getting increasing attention, such as data reuse and data citation.

For data to remain usable, the infrastructures that host them also have to be kept operational. However, the long-term operation of research data repositories is challenging, and sometimes, for varying reasons and despite best efforts, they are shut down….

In a recent study we therefore set out to take an infrastructure perspective on the long-term preservation of research data by investigating repositories across disciplines and types that were shut down. We also tried to estimate the impact of repository shutdown on data availability…

We found that repository shutdown was not rare: 6.2% of all repositories listed in re3data were shut down. Since the launch of the registry in 2012, at least one repository has been shut down each year (see Fig.1). The median age of a repository when shutting down was 12 years…(More)”.
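
As a rough illustration of how headline figures like these (a 6.2 percent shutdown rate and a median age of 12 years at shutdown) can be computed from repository metadata, here is a small Python sketch over invented records. It is not the authors’ code and it does not query re3data.

```python
# Invented repository records for illustration; in the study, comparable figures
# were derived from re3data metadata, which this sketch does not use.
from statistics import median

repositories = [
    {"name": "Repository A", "started": 1998, "shut_down": 2010},
    {"name": "Repository B", "started": 2005, "shut_down": None},  # still operating
    {"name": "Repository C", "started": 2001, "shut_down": 2013},
    {"name": "Repository D", "started": 2012, "shut_down": None},
]

closed = [r for r in repositories if r["shut_down"] is not None]
shutdown_share = len(closed) / len(repositories)
ages_at_shutdown = [r["shut_down"] - r["started"] for r in closed]

print(f"Share of repositories shut down: {shutdown_share:.1%}")
print(f"Median age at shutdown: {median(ages_at_shutdown)} years")
```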

People Have a Right to Climate Data


Article by Justin S. Mankin: “As a climate scientist documenting the multi-trillion-dollar price tag of the climate disasters shocking economies and destroying lives, I sometimes field requests from strategic consultants, financial investment analysts, and reinsurers looking for climate data, analysis and computer code.

Often, they want to chat about my findings or have me draw out the implications for their businesses, like the time a risk analyst from BlackRock, the world’s largest asset manager, asked me to help with research on what the current El Niño, a cyclical climate pattern, means for financial markets.

These requests make sense: People and companies want to adapt to the climate risks they face from global warming. But these inquiries are also part of the wider commodification of climate science. Venture capitalists are injecting hundreds of millions of dollars into climate intelligence as they build out a rapidly growing business of climate analytics — the data, risk models, tailored analyses and insights people and institutions need to understand and respond to climate risks.

I point companies to our freely available data and code at the Dartmouth Climate Modeling and Impacts Group, which I run, but turn down additional requests for customized assessments. I regard climate information as a public good and fear contributing to a world in which information about the unfolding risks of droughts, floods, wildfires, extreme heat and rising seas is hidden behind paywalls. People and companies who can afford private risk assessments will rent, buy and establish homes and businesses in safer places than the billions of others who can’t, compounding disadvantage and leaving the most vulnerable among us exposed.

Despite this, global consultants, climate and agricultural technology start-ups, insurance companies and major financial firms are all racing to meet the ballooning demand for information about climate dangers and how to prepare for them. While a lot of this information is public, it is often voluminous, technical and not particularly useful for people trying to evaluate their personal exposure. Private risk assessments fill that gap — but at a premium. The climate risk analytics market is expected to grow to more than $4 billion globally by 2027.

I don’t mean to suggest that the private sector should not be involved in furnishing climate information. That’s not realistic. But I worry that an overreliance on the private sector to provide climate adaptation information will hollow out publicly provided climate risk science, and that means we all will pay: the well-off with money, the poor with lives…(More)”.

The New Digital Dark Age


Article by Gina Neff: “For researchers, social media has always represented greater access to data, more democratic involvement in knowledge production, and greater transparency about social behavior. Getting a sense of what was happening—especially during political crises, major media events, or natural disasters—was as easy as looking around a platform like Twitter or Facebook. In 2024, however, that will no longer be possible.

In 2024, we will face a grim digital dark age, as social media platforms transition away from the logic of Web 2.0 and toward one dictated by AI-generated content. Companies have rushed to incorporate large language models (LLMs) into online services, complete with hallucinations (inaccurate, unjustified responses) and mistakes, which have further fractured our trust in online information.

Another aspect of this new digital dark age comes from not being able to see what others are doing. Twitter once pulsed with publicly readable sentiment of its users. Social researchers loved Twitter data, relying on it because it provided a ready, reasonable approximation of how a significant slice of internet users behaved. However, Elon Musk has now priced researchers out of Twitter data, after the company announced it was ending free access to the platform’s API. This made it difficult, if not impossible, to obtain data needed for research on topics such as public health, natural disaster response, political campaigning, and economic activity. It was a harsh reminder that the modern internet has never been free or democratic, but instead walled and controlled.

Closer cooperation with platform companies is not the answer. X, for instance, has filed a suit against independent researchers who pointed out the rise in hate speech on the platform. Recently, it has also been revealed that researchers who used Facebook and Instagram’s data to study the platforms’ role in the US 2020 elections had been granted “independence by permission” by Meta. This means that the company chooses which projects to share its data with and, while the research may be independent, Meta also controls what types of questions are asked and who asks them…(More)”.