Five-year campaign breaks science’s citation paywall


Article by Dalmeet Singh Chawla: “The more than 60 million scientific-journal papers indexed by Crossref — the database that registers DOIs, or digital object identifiers, for many of the world’s academic publications — now contain reference lists that are free to access and reuse.

The milestone, announced on Twitter on 18 August, is the result of an effort by the Initiative for Open Citations (I4OC), launched in 2017. Open-science advocates have for years campaigned to make papers’ citation data accessible under liberal copyright licences so that they can be studied, and those analyses shared. Free access to citations enables researchers to identify research trends, lets them conduct studies on which areas of research need funding, and helps them to spot when scientists are manipulating citation counts….

The move means that bibliometricians, scientometricians and information scientists will be able to reuse citation data in any way they please under the most liberal copyright licence, called CC0. This, in turn, allows other researchers to build on their work.
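
To make that reuse concrete, here is a minimal sketch (ours, not from the article) of pulling a paper's open reference list through Crossref's public REST API. The endpoint and response fields follow Crossref's documented API; the DOI and contact address are placeholders.

```python
# Minimal sketch: fetch the open reference list deposited for a paper
# via the public Crossref REST API (api.crossref.org).
import requests

def get_references(doi: str) -> list[dict]:
    """Return the deposited reference list for a DOI, or [] if none is open."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        # Crossref asks polite clients to identify themselves; address is a placeholder.
        headers={"User-Agent": "citation-demo (mailto:you@example.org)"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["message"].get("reference", [])

refs = get_references("10.1234/placeholder-doi")  # substitute any Crossref DOI
for ref in refs[:5]:
    # Each entry may carry a resolved DOI or only an unstructured citation string.
    print(ref.get("DOI") or ref.get("unstructured", "<no data>"))
```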

Before I4OC, researchers generally had to obtain permission to access data from major scholarly databases such as Web of Science and Scopus, and weren’t able to share the samples.

However, the opening up of Crossref articles’ citations doesn’t mean that all the world’s scholarly content now has open references. Although most major international academic publishers, including Elsevier, Springer Nature (which publishes Nature) and Taylor & Francis, index their papers on Crossref, some do not. Those that don’t are often regional and non-English-language publications.

I4OC co-founder Dario Taraborelli, who is science programme officer at the Chan Zuckerberg Initiative and based in San Francisco, California, says that the next challenge will be to encourage publishers who don’t already deposit reference data in Crossref to do so….(More)”.

Unlocking the Potential of Open 990 Data


Article by Cinthia Schuman Ottinger & Jeff Williams: “As the movement to expand public use of nonprofit data collected by the Internal Revenue Service advances, it’s a good time to review how far the social sector has come and how much work remains to reach the full potential of this treasure trove…Organizations have employed open Form 990 data in numerous ways, including to:

  • Create new tools for donors. For instance, the Nonprofit Aid Visualizer, a partnership between Candid and Vanguard Charitable, uses open 990 data to find communities vulnerable to COVID-19 and to help address both their immediate needs and long-term recovery. Another tool, the COVID-19 Urgent Service Provider Support Tool, developed by the consulting firm BCT Partners, uses 990 data to direct donors to service providers close to the communities most affected by COVID-19.
  • More efficiently prosecute charitable fraud. This includes a campaign by the New York Attorney General’s Office that recovered $1.7 million from sham charities and redirected funds to legitimate groups.
  • Generate groundbreaking findings on fundraising, volunteers, equity, and management. A researcher at Texas Tech University, for example, explored more than a million e-filed 990s to overturn long-held assumptions about the role of cash in fundraising. He found that when nonprofits encourage noncash gifts, as opposed to only cash contributions, financial contributions to those organizations increase over time.
  • Shed light on harmful practices that hurt the poor. A large-scale investigative analysis of nonprofit hospitals’ tax forms revealed that 45 percent of them sent a total of $2.7 billion in medical bills to patients whose incomes were likely low enough to qualify for free or discounted care. When this practice was publicly exposed, some hospitals reevaluated their practices and erased unpaid bills for qualifying patients. The expense of mining data like this previously made such research next to impossible.
  • Help donors make more informed giving decisions. In hopes of maximizing contributions to Ukrainian relief efforts, a record number of donors are turning to resources like Charity Navigator, which can now use open Form 990 data to evaluate and rate a large number of charities based on finances, governance, and other factors. At the same time, donors informed by open 990 data can seek more accountability from the organizations they support. For example, anti-corruption researchers scouring open 990 data and other records uncovered donations by Russian oligarchs aligned with President Putin. This pressured US nonprofits that accepted money from the oligarchs to disavow this funding…(More)”.

The wealth of (Open Data) nations? Open government data, country-level institutions and entrepreneurial activity


Paper by Franz Huber, Alan Ponce, Francesco Rentocchini & Thomas Wainwright: “Lately, Open Data (OD) has been promoted by governments around the world as a resource to accelerate innovation within entrepreneurial ventures. However, it remains unclear to what extent OD drives innovative entrepreneurship. This paper sheds light on this open question by providing novel empirical evidence on the relationship between OD publishing and (digital) entrepreneurship at the country level. We draw upon a longitudinal dataset comprising 90 countries observed over the period 2013–2016. We find a significant and positive association between OD publishing and entrepreneurship at the country level. The results also show that the association between OD publishing and entrepreneurship is strong in countries with high institutional quality. We argue that publishing OD alone is not sufficient to improve innovative entrepreneurship, so states need to move beyond a focus on OD initiatives and promotion to a broader set of policy initiatives that promote good governance…(More)”.

A journey toward an open data culture through transformation of shared data into a data resource


Paper by Scott D. Kahn and Anne Koralova: “The transition to open data practices is straightforward, albeit surprisingly challenging to implement, largely due to cultural and policy issues. A general data sharing framework is presented along with two case studies that highlight these challenges and offer practical solutions that can be adjusted depending on the type of data collected, the country in which the study is initiated, and the prevailing research culture. Embracing the constraints imposed by data privacy considerations, especially for biomedical data, must be emphasized for data outside of the United States until data privacy law(s) are established at the Federal and/or State level…(More)”.

Without appropriate metadata, data-sharing mandates are pointless


Article by Mark A. Musen: “Last month, the US government announced that research articles and most underlying data generated with federal funds should be made publicly available without cost, a policy to be implemented by the end of 2025. That’s atop other important moves. The European Union’s programme for science funding, Horizon Europe, already mandates that almost all data be FAIR (that is, findable, accessible, interoperable and reusable). The motivation behind such data-sharing policies is to make data more accessible so others can use them to both verify results and conduct further analyses.

But just getting those data sets online will not bring anticipated benefits: few data sets will really be FAIR, because most will be unfindable. What’s needed are policies and infrastructure to organize metadata.

Imagine having to search for publications on some topic — say, methods for carbon reclamation — using only the article titles (no keywords, abstracts or search terms). That’s essentially the situation for finding data sets. If I wanted to identify all the deposited data related to carbon reclamation, the task would be futile. Current metadata often contain only administrative and organizational information, such as the name of the investigator and the date when the data were acquired.

What’s more, for scientific data to be useful to other researchers, metadata must sensibly and consistently communicate the essentials of the experiments — what was measured, and under what conditions. As an investigator who builds technology to assist with data annotation, I find it frustrating that, in most fields, the metadata standards needed to make data FAIR don’t even exist.

Metadata about data sets typically lack experiment-specific descriptors. If present, they’re sparse and idiosyncratic. An investigator searching the Gene Expression Omnibus (GEO), for example, might seek genomic data sets containing information on how a disease or condition manifests itself in young animals or humans. Performing such a search requires knowledge of how the age of individuals is represented — which, in the GEO repository, could be age, AGE, age (after birth), age (years), Age (yr-old) or dozens of other possibilities. (Often, such information is missing from data sets altogether.) Because the metadata are so ad hoc, automated searches fail, and investigators waste enormous amounts of time manually sifting through records to locate relevant data sets, with no guarantee that most (or any) can be found…(More)”.
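
A hedged sketch of the workaround this ad hoc state forces on investigators: mapping repository-specific age labels onto one canonical value before a search can filter on it. The variant labels are the ones Musen cites; the records and the helper function are hypothetical.

```python
# Illustrative only: normalizing GEO-style age labels so records become
# searchable. A hand-maintained synonym list like this is exactly the
# brittle, per-repository effort that shared metadata standards would remove.
import re

AGE_LABELS = {"age", "AGE", "age (after birth)", "age (years)", "Age (yr-old)"}

def extract_age(sample_metadata: dict) -> float | None:
    """Pull a numeric age out of idiosyncratic key/value metadata."""
    for key, value in sample_metadata.items():
        if key in AGE_LABELS or key.lower().startswith("age"):
            match = re.search(r"\d+(\.\d+)?", str(value))
            if match:
                return float(match.group())
    return None  # often the information is simply missing

samples = [
    {"AGE": "6 weeks"},      # even when found, units stay ambiguous
    {"age (years)": "4"},
    {"tissue": "liver"},     # no age recorded at all
]
print([extract_age(s) for s in samples])  # [6.0, 4.0, None]
```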

A User’s Guide to the Periodic Table of Open Data


Guide by Stefaan Verhulst and Andrew Zahuranec: “Leveraging our research on the variables that determine Open Data’s Impact, the Open Data Policy Lab is pleased to announce the publication of a new report designed to assist organizations in implementing the elements of a successful data collaborative: A User’s Guide to The Periodic Table of Open Data.

The User’s Guide is a fillable document designed to empower data stewards and others seeking to improve data access. It can be used as a checklist and tool to weigh different elements based on their context and priorities. By completing the forms (offline/online), you will be able to take a more comprehensive and strategic view of what resources and interventions may be required.

Download and fill out the User’s Guide to operationalize the elements in your data initiative.

In conjunction with the release of our User’s Guide, the Open Data Policy Lab is pleased to present a completely reworked version of our Periodic Table of Open Data Elements, first launched in 2016. We sought to categorize the elements that matter in open data initiatives into five categories: problem and demand definition, capacity and culture, governance and standards, personnel and partnerships, and risk mitigation. More information on each can be found in the attached report or in the interactive table below.

Read more about the Periodic Table of Open Data Elements and how you can use it to support your work…(More)”.

Closing the Data Divide for a More Equitable U.S. Digital Economy


Report by Gillian Diebold: “In the United States, access to many public and private services, including those in the financial, educational, and health-care sectors, is intricately linked to data. But adequate data is not collected equitably from all Americans, creating a new challenge: the data divide, in which not everyone has enough high-quality data collected about them or their communities and therefore cannot benefit from data-driven innovation. This report provides an overview of the data divide in the United States and offers recommendations for how policymakers can address these inequalities…(More)”.

Making Government Data Publicly Available: Guidance for Agencies on Releasing Data Responsibly


Report by Hugh Grant-Chapman and Hannah Quay-de la Vallee: “Government agencies rely on a wide range of data to effectively deliver services to the populations with which they engage. Civic-minded advocates frequently argue that the public benefits of this data can be better harnessed by making it available for public access. Recent years, however, have also seen growing recognition that the public release of government data can carry certain risks. Government agencies hoping to release data publicly should consider those potential risks in deciding which data to make publicly available and how to go about releasing it.

This guidance offers an introduction to making data publicly available while addressing privacy and ethical data use issues. It is intended for administrators at government agencies that deliver services to individuals — especially those at the state and local levels — who are interested in publicly releasing government data. This guidance focuses on challenges that may arise when releasing aggregated data derived from sensitive information, particularly individual-level data.

The report begins by highlighting key benefits and risks of making government data publicly available. Benefits include empowering members of the general public, supporting research on program efficacy, supporting the work of organizations providing adjacent services, reducing agencies’ administrative burden, and holding government agencies accountable. Potential risks include breaches of individual privacy; irresponsible uses of the data by third parties; and the possibility that the data is not used at all, resulting in wasted resources.

In light of these benefits and risks, the report presents four recommended actions for publishing government data responsibly:

  1. Establish data governance processes and roles;
  2. Engage external communities;
  3. Ensure responsible use and privacy protection; and
  4. Evaluate resource constraints.

These key considerations also take into account federal and state laws as well as emerging computational and analytical techniques for protecting privacy when releasing data, such as differential privacy techniques and synthetic data. Each of these techniques involves unique benefits and trade-offs to be considered in context of the goals of a given data release…(More)”.
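
As a rough illustration of the first of those techniques (a sketch under simplified assumptions, not the report's own method): the Laplace mechanism releases a count with calibrated noise, trading a little accuracy for a quantified privacy budget ε. The data and the choice of ε here are illustrative; a production release would use a vetted differential privacy library.

```python
# Minimal sketch of the Laplace mechanism, a standard differential
# privacy technique for releasing counts.
import numpy as np

def dp_count(values: list[bool], epsilon: float, rng: np.random.Generator) -> float:
    """Release a count with Laplace noise; the sensitivity of a count is 1."""
    true_count = sum(values)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)  # smaller epsilon = more noise
    return true_count + noise

rng = np.random.default_rng(0)
enrolled = [True] * 130 + [False] * 70   # toy data: 130 of 200 people enrolled
print(dp_count(enrolled, epsilon=0.5, rng=rng))  # noisy value near 130
```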

OSTP Issues Guidance to Make Federally Funded Research Freely Available Without Delay


The White House: “Today, the White House Office of Science and Technology Policy (OSTP) updated U.S. policy guidance to make the results of taxpayer-supported research immediately available to the American public at no cost. In a memorandum to federal departments and agencies, Dr. Alondra Nelson, the head of OSTP, delivered guidance for agencies to update their public access policies as soon as possible to make publications and research funded by taxpayers publicly accessible, without an embargo or cost. All agencies will fully implement updated policies, including ending the optional 12-month embargo, no later than December 31, 2025.

This policy will likely yield significant benefits on a number of key priorities for the American people, from environmental justice to cancer breakthroughs, and from game-changing clean energy technologies to protecting civil liberties in an automated world.

For years, President Biden has been committed to delivering policy based on the best available science, and to working to ensure the American people have access to the findings of that research. “Right now, you work for years to come up with a significant breakthrough, and if you do, you get to publish a paper in one of the top journals,” said then-Vice President Biden in remarks to the American Association for Cancer Research in 2016. “For anyone to get access to that publication, they have to pay hundreds, or even thousands, of dollars to subscribe to a single journal. And here’s the kicker — the journal owns the data for a year. The taxpayers fund $5 billion a year in cancer research every year, but once it’s published, nearly all of that taxpayer-funded research sits behind walls. Tell me how this is moving the process along more rapidly.” The new public access guidance was developed with the input of multiple federal agencies over the course of this year, to enable progress on a number of Biden-Harris Administration priorities.

“When research is widely available to other researchers and the public, it can save lives, provide policymakers with the tools to make critical decisions, and drive more equitable outcomes across every sector of society,” said Dr. Alondra Nelson, head of OSTP. “The American people fund tens of billions of dollars of cutting-edge research annually. There should be no delay or barrier between the American public and the returns on their investments in research.”…(More)”.

Big, Open Data for Development: A Vision for India 


Paper by Sam Asher, Aditi Bhowmick, Alison Campion, Tobias Lunt and Paul Novosad: “The government generates terabytes of data directly and incidentally in the operation of public programs. For intrinsic and instrumental reasons, these data should be made open to the public. Intrinsically, a right to government data is implicit in the right to information. Instrumentally, open government data will improve policy, increase accountability, empower citizens, create new opportunities for private firms, and lead to development and economic growth. A series of case studies demonstrates these benefits in a range of other contexts. We next examine how government can maximize the social benefit from government data. This entails opening administrative data as far upstream in the data pipeline as possible. Most administrative data can be minimally aggregated to protect privacy while still providing data with high geographic granularity. We assess the status quo of the Government of India’s data production and dissemination pipeline, and find that the greatest weakness lies in the last mile: making government data accessible to the public. This means more than posting it online; we describe a set of principles for lowering access and use costs close to zero. Finally, we examine the use of government data to guide policy in the COVID-19 pandemic. Civil society played a key role in aggregating, disseminating, and analyzing government data, providing analysis that was essential to the policy response. However, key pieces of data that could have substantially improved that response, such as testing rates and the distribution of seroprevalence, were unnecessarily withheld by the government. A more open approach to government data would have saved many lives…(More)”.
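
A minimal sketch of what “minimally aggregated” could look like in practice (our illustration, not the paper's code; the column names and suppression threshold are assumptions): individual records rolled up to fine geographic units, with small cells suppressed so no individual can be singled out.

```python
# Illustrative only: aggregate individual-level program records to the
# village level, suppressing any cell built from too few people.
import pandas as pd

records = pd.DataFrame({
    "village": ["A", "A", "A", "B", "B", "C"],
    "benefit_paid": [120, 80, 100, 90, 110, 150],
})

MIN_CELL = 3  # hypothetical threshold: suppress aggregates of fewer than 3 people

agg = records.groupby("village")["benefit_paid"].agg(n="count", total="sum")
agg["total"] = agg["total"].astype("float")           # allow NaN for suppressed cells
agg.loc[agg["n"] < MIN_CELL, "total"] = float("nan")  # villages B and C are suppressed
print(agg)
```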