The State of Open Data 2021


Report by Digital Science: “Since 2016, we have monitored levels of data sharing and usage. Over the years, we have had 21,000 responses from researchers worldwide, providing unparalleled insight into their motivations, challenges, perceptions, and behaviours toward open data.

In our sixth survey, we asked about motivations as well as perceived discoverability and credibility of data that is shared openly. The State of Open Data is a critical piece of information that enables us to identify the barriers to open data from a researcher perspective, laying the foundation for future action. 

Key findings from this year’s survey

  • 73% support the idea of a national mandate for making research data openly available
  • 52% said funders should make the sharing of research data part of their requirements for awarding grants
  • 47% said they would be motivated to share their data if there was a journal or publisher requirement to do so
  • About a third of respondents indicated that they have reused their own or someone else’s openly accessible data more during the pandemic than before
  • There are growing concerns over misuse and lack of credit for open sharing…(More)”

‘Anyway, the dashboard is dead’: On trying to build urban informatics


Paper by Jathan Sadowski: “How do the idealised promises and purposes of urban informatics compare to the material politics and practices of their implementation? To answer this question, I ethnographically trace the development of two data dashboards by strategic planners in an Australian city over the course of 2 years. By studying this techno-political process from its origins onward, I uncovered an interesting story of obdurate institutions, bureaucratic momentum, unexpected troubles, and, ultimately, frustration and failure. These kinds of stories, which often go untold in the annals of innovation, contrast starkly with more common framings of technological triumph and transformation. They also, I argue, reveal much more about how techno-political systems are actualised in the world…(More)”.

Open Data Standard and Analysis Framework: Towards Response Equity in Local Governments


Paper by Joy Hsu, Ramya Ravichandran, Edwin Zhang, and Christine Keung: “There is an increasing need for open data in governments and systems to analyze equity at large scale. Local governments often lack the necessary technical tools to identify and tackle inequities in their communities. Moreover, these tools may not generalize across departments and cities nor be accessible to the public. To this end, we propose a system that facilitates centralized analyses of publicly available government datasets through 1) a US Census-linked API, 2) an equity analysis playbook, and 3) an open data standard to regulate data intake and support equitable policymaking….(More)”.
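
To make the Census linkage concrete, here is a minimal sketch — not the authors’ system — of how a tract-keyed local dataset might be joined to American Community Survey figures via the public Census API. The `tract_median_income` helper, the service-request counts, the variable choice (B19013_001E, median household income), and the county are all illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of a Census-linked equity
# analysis: join a hypothetical tract-keyed local dataset to ACS median
# household income pulled from the public Census API.
import requests

ACS_URL = "https://api.census.gov/data/2021/acs/acs5"  # public ACS 5-year endpoint

def tract_median_income(state_fips: str, county_fips: str) -> dict:
    """Return {tract code: median household income} for one county (ACS variable B19013_001E)."""
    params = {
        "get": "NAME,B19013_001E",
        "for": "tract:*",
        "in": [f"state:{state_fips}", f"county:{county_fips}"],
        # "key": "YOUR_CENSUS_API_KEY",  # optional for low request volumes
    }
    rows = requests.get(ACS_URL, params=params, timeout=30).json()
    header, *data = rows                      # first row of the response is the header
    tract_idx = header.index("tract")
    income_idx = header.index("B19013_001E")
    return {r[tract_idx]: int(r[income_idx]) for r in data}

# Hypothetical local dataset: service-request counts per census tract.
service_requests = {"010100": 130, "021000": 47}

income_by_tract = tract_median_income("06", "075")  # San Francisco County, CA
for tract, count in service_requests.items():
    print(tract, count, income_by_tract.get(tract, "no ACS match"))
```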

The Census Mapper


Google blog: “…The U.S. Census is one of the largest data sets journalists can access. It has layers and layers of important data that can help reporters tell detailed stories about their own communities. But the challenge is sorting through that data and visualizing it in a way that helps readers understand trends and the bigger picture.

Today we’re launching a new tool to help reporters dig through all that data to find stories and embed visualizations on their sites. The Census Mapper project is an embeddable map that displays Census data at the national, state and county levels, as well as at the census tract level. It was produced in partnership with Pitch Interactive and Big Local News, as part of the 2020 Census Co-op (supported by the Google News Initiative and in cooperation with the JSK Journalism Fellowships).

[Image: country-level view of the Census Mapper, with arrows depicting population movement across the US.] Census Mapper shows where populations have grown over time.

The Census data is pulled from the data collected and processed by The Associated Press, one of the Census Co-op partners. Census Mapper then lets local journalists easily embed maps showing population change at any level, helping them tell powerful stories in a more visual way about their communities.

[Image: state-level view of North Carolina, with arrows showing movement around the state.] With the tool, you can zoom into states and below, such as North Carolina, shown here.

As part of our investment in data journalism we’re also making improvements to our Common Knowledge Project, a data explorer and visual journalism project to allow US journalists to explore local data. Built with journalists for journalists, the new version of Common Knowledge integrates journalist feedback and new features including geographic comparisons, new charts and visuals…(More)”.

Open science, data sharing and solidarity: who benefits?


Report by Ciara Staunton et al: “Research, innovation, and progress in the life sciences are increasingly contingent on access to large quantities of data. This is one of the key premises behind the “open science” movement and the global calls for fostering the sharing of personal data, datasets, and research results. This paper reports on the outcomes of discussions by the panel “Open science, data sharing and solidarity: who benefits?” held at the 2021 Biennial conference of the International Society for the History, Philosophy, and Social Studies of Biology (ISHPSSB), and hosted by Cold Spring Harbor Laboratory (CSHL)….(More)”.

Open health data: Mapping the ecosystem


Paper by Roel Heijlen and Joep Crompvoets: “Governments around the world own multiple datasets related to the policy domain of health. Datasets range from vaccination rates to the availability of health care practitioners in a region to the outcomes of certain surgeries. Health is believed to be a promising subject for open government data policies. However, specific properties of health data, such as sensitivities regarding privacy, ethics, and ownership, create particular conditions that either enable or prevent datasets from becoming freely and easily accessible to everyone…

This paper aims to map the ecosystem of open health data. By analyzing the foundations of health data and the commonalities of open data ecosystems through a literature analysis, it constructs the socio-technical environment in which health data managed by governments are opened up or potentially stay closed. After this theoretical development, the open health data ecosystem is tested via a case study of the Data for Better Health initiative of the government of Belgium…

The policy domain of health includes de-identification activities, bioethical assessments, and the specific role of data providers within its open data ecosystem. However, the concept of open data does not always fully apply to the topic of health: several health datasets may be findable via government portals yet not directly accessible. Differentiating between types of health data and data user capacities is recommended for future research….(More)”

Under What Conditions Are Data Valuable for Development?


Paper by Dean Jolliffe et al: “Data produced by the public sector can have transformational impacts on development outcomes through better targeting of resources, improved service delivery, cost savings in policy implementation, increased accountability, and more. Around the world, the amount of data produced by the public sector is increasing at a rapid pace, yet their transformational impacts have not been realized fully. Why has the full value of these data not been realized yet? This paper outlines 12 conditions needed for the production and use of public sector data to generate value for development and presents case studies substantiating these conditions. The conditions are that data need to have adequate spatial and temporal coverage (are complete, frequent, and timely), are of high quality (are accurate, comparable, and granular), are easy to use (are accessible, understandable, and interoperable), and are safe to use (are impartial, confidential, and appropriate)…(More)”.

A Proposal for Researcher Access to Platform Data: The Platform Transparency and Accountability Act


Paper by Nathaniel Persily: “We should not need to wait for whistleblowers to blow their whistles, however, before we can understand what is actually happening on these extremely powerful digital platforms. Congress needs to act immediately to ensure that a steady stream of rigorous research reaches the public on the most pressing issues concerning digital technology. No one trusts the representations made by the platforms themselves, though, given their conflict of interest and understandable caution in releasing information that might spook shareholders. We need to develop an unprecedented system of corporate data sharing, mandated by government for independent research in the public interest.

This is easier said than done. Not only do the details matter, they are the only thing that matters. It is all well and good to call for “transparency” or “data sharing,” as countless academics have, but the way government might set up this unprecedented regime will determine whether it can serve the grandiose purposes tech critics hope it will….(More)”.

Giant, free index to world’s research papers released online


Holly Else at Nature: “In a project that could unlock the world’s research papers for easier computerized analysis, an American technologist has released online a gigantic index of the words and short phrases contained in more than 100 million journal articles — including many paywalled papers.

The catalogue, which was released on 7 October and is free to use, holds tables of more than 355 billion words and sentence fragments listed next to the articles in which they appear. It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers, says its creator, Carl Malamud. He released the files under the auspices of Public Resource, a non-profit corporation in Sebastopol, California, that he founded.

Malamud says that because his index doesn’t contain the full text of articles, but only sentence snippets up to five words long, releasing it does not breach publishers’ copyright restrictions on the reuse of paywalled articles. However, one legal expert says that publishers might question the legality of how Malamud created the index in the first place.
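
Conceptually, the catalogue works like an n-gram index. The sketch below is a toy illustration only — not Public Resource’s pipeline or file format — of how snippets of up to five words can be mapped to the identifiers of the articles in which they appear; the two-document corpus and its DOIs are invented.

```python
# Toy n-gram index: map every 1- to 5-word snippet to the articles containing it.
from collections import defaultdict

def ngrams(text: str, max_n: int = 5):
    """Yield every run of 1 to max_n consecutive words in the text."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

# Hypothetical corpus of article snippets keyed by DOI.
corpus = {
    "10.1000/example.1": "volatile organic compounds emitted by plants",
    "10.1000/example.2": "organic compounds in soil samples",
}

index = defaultdict(set)
for doi, text in corpus.items():
    for gram in ngrams(text):
        index[gram].add(doi)

print(index["organic compounds"])   # both articles
print(index["emitted by plants"])   # only the first article
```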

Some researchers who have had early access to the index say it’s a major development in helping them to search the literature with software — a procedure known as text mining. Gitanjali Yadav, a computational biologist at the University of Cambridge, UK, who studies volatile organic compounds emitted by plants, says she aims to comb through Malamud’s index to produce analyses of the plant chemicals described in the world’s research papers. “There is no way for me — or anyone else — to experimentally analyse or measure the chemical fingerprint of each and every plant species on Earth. Much of the information we seek already exists, in published literature,” she says. But researchers are restricted by lack of access to many papers, Yadav adds….(More)”.

Has COVID-19 been the making of Open Science?


Article by Lonni Besançon, Corentin Segalas and Clémence Leyrat: “Although many concepts fall under the umbrella of Open Science, some of its key pillars are Open Access, Open Data, Open Source, and Open Peer Review. How far these four principles were embraced by researchers during the pandemic, and where there is room for improvement, is what we, as early career researchers, set out to assess by looking at data on scientific articles published during the Covid-19 pandemic….

Open Source and Open Data practices consist of making all the data and materials used to gather or analyse data available on relevant repositories. While we can find incredibly useful datasets shared publicly on COVID-19 (for instance those provided by the European Centre for Disease Prevention and Control), they remain the exception rather than the norm. A spectacular example of this was the set of papers utilising data from the company Surgisphere, which led to retractions in The Lancet and The New England Journal of Medicine. In our paper, we highlight four papers that could have been retracted much earlier (and perhaps would never have been accepted) had the data been made accessible from the time of publication. As we argue in our paper, this presents a clear case for making open data and open source the default, with exceptions for privacy and safety. While some journals already have such policies, we go further in asking that, when data cannot be shared publicly, editors/publishers and authors/institutions should agree on a third party to check the existence and reliability/validity of the data and the results presented. This would not only strengthen the review process, but also enhance the reproducibility of research and further accelerate the production of new knowledge through data and code sharing…(More)”.