Data Commons

Paper by R. V. Guha et al: “Publicly available data from open sources (e.g., United States Census Bureau (Census), World Health Organization (WHO), Intergovernmental Panel on Climate Change (IPCC)) are vital resources for policy makers, students and researchers across different disciplines. Combining data from different sources requires the user to reconcile the differences in schemas, formats, assumptions, and more. This data wrangling is time consuming, tedious and needs to be repeated by every user of the data. Our goal with Data Commons (DC) is to help make public data accessible and useful to those who want to understand this data and use it to solve societal challenges and opportunities. We do the data processing and make the processed data widely available via standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs. Data from different Data Commons can be ‘joined’ easily. The aggregate of these Data Commons can be viewed as a single Knowledge Graph. This Knowledge Graph can then be searched over using Natural Language questions utilizing advances in Large Language Models. This paper describes the architecture of Data Commons, some of the major deployments and highlights directions for future work…(More)”.
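As a rough illustration of why a common schema makes this ‘joining’ easy: once two sources identify the same entities with the same keys, combining them reduces to a key-based join. The sketch below is a minimal toy, assuming two sources that already share one place-identifier scheme; the identifiers (`place/A` and so on) and values are hypothetical placeholders, not real Data Commons identifiers or API output.

```python
from typing import Dict, Tuple

# Hypothetical records from two sources that already share one identifier
# scheme for places; the values are made-up placeholders, not real statistics.
source_a: Dict[str, float] = {"place/A": 100.0, "place/B": 250.0, "place/C": 75.0}
source_b: Dict[str, float] = {"place/A": 0.12, "place/B": 0.08, "place/D": 0.30}


def inner_join(
    left: Dict[str, float], right: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Join two keyed sources, keeping only entities present in both."""
    shared_keys = left.keys() & right.keys()
    # Sort the shared keys so the output order is deterministic.
    return {key: (left[key], right[key]) for key in sorted(shared_keys)}


joined = inner_join(source_a, source_b)
print(joined)
```

Without a shared identifier, each user would first have to match entities across sources by hand, which is the repeated wrangling the abstract describes.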

Data Collaboratives

Policy Brief by Center for the Governance of Change: “Despite the abundance of data generated, it is becoming increasingly clear that its accessibility and advantages are not equitably or effectively distributed throughout society. Data asymmetries, driven in large part by deeply entrenched inequalities and a lack of incentives for many public- and private-sector organizations to collaborate, are holding back the public good potential of data and hindering progress and innovation in key areas such as financial inclusion, health, and the future of work.

More (and better) collaboration is needed to address the data asymmetries that exist across society, but early efforts at opening data have fallen short of achieving their intended aims. In the EU, the proposed Data Act is seeking to address these shortcomings and make more data available for public use by setting up new rules on data sharing. However, critics say its current reading risks limiting the potential for delivering innovative solutions by failing to establish cross-sectoral data-sharing frameworks, leaving the issue of public data stewardship off the table, and avoiding the thorny question of business incentives.

This policy brief, based on Stefaan Verhulst’s recent policy paper for the Center for the Governance of Change, argues that data collaboratives, an emerging model of collaboration in which participants from different sectors exchange data to solve public problems, offer a promising solution to address these data asymmetries and contribute to a healthy data economy that can benefit society as a whole. However, data collaboratives require a systematic, sustainable, and responsible approach to be successful, with a particular focus on…(More)”:

  • Establishing a new science of questions, to help identify the most pressing public and private challenges that can be addressed with data sharing.
  • Fostering a new profession of data stewards, to promote a culture of responsible sharing within organizations and recognize opportunities for productive collaboration.
  • Clarifying incentives, to bring the private sector to the table and help operationalize data collaboration, ideally with some sort of market-led compensation model.
  • Establishing a social license for data reuse, to promote trust among stakeholders through public engagement, data stewardship, and an enabling regulatory framework.
  • Becoming more data-driven about data, to improve our understanding of collaboration, build sustainable initiatives, and achieve project accountability.

Sharing Health Data: The Why, the Will, and the Way Forward

Book edited by Grossmann C, Chua PS, Ahmed M, et al.: “Sharing health data and information across stakeholder groups is the bedrock of a learning health system. As data and information are increasingly combined across various sources, their generative value to transform health, health care, and health equity increases significantly. Facilitating this potential is an escalating surge of digital technologies (i.e., cloud computing, broadband and wireless solutions, digital health technologies, and application programming interfaces [APIs]) that, with each successive generation, not only enhance data sharing, but also improve in their ability to preserve privacy and identify and mitigate cybersecurity risks. These technological advances, coupled with notable policy developments, new interoperability standards (particularly the Fast Healthcare Interoperability Resources [FHIR] standard), and the launch of innovative payment models within the last decade, have resulted in a greater recognition of the value of health data sharing among patients, providers, and researchers. Consequently, a number of data sharing collaborations are emerging across the health care ecosystem.

Unquestionably, the COVID-19 pandemic has had a catalytic effect on this trend. The criticality of swift data exchange became evident at the outset of the pandemic, when the scientific community sought answers about the novel SARS-CoV-2 virus and emerging disease. Then, as the crisis intensified, data sharing graduated from a research imperative to a societal one, with a clear need to urgently share and link data across multiple sectors and industries to curb the effects of the pandemic and prevent the next one.

In spite of these evolving attitudes toward data sharing and the ubiquity of data-sharing partnerships, barriers persist. The practice of health data sharing occurs unevenly, prominent in certain stakeholder communities while absent in others. A stark contrast is observed between the volume, speed, and frequency with which health data is aggregated and linked—oftentimes with non-traditional forms of health data—for marketing purposes, and the continuing challenges patients experience in contributing data to their own health records. In addition, there are varying levels of data sharing. Not all types of data are shared in the same manner and at the same level of granularity, creating a patchwork of information. As highlighted by the gaps observed in the haphazard and often inadequate sharing of race and ethnicity data during the pandemic, the consequences can be severe—impacting the allocation of much-needed resources and attention to marginalized communities. Therefore, it is important to recognize the value of data sharing in which stakeholder participation is equitable and comprehensive—not only for achieving a future ideal state in health care, but also for redressing long-standing inequities…(More)”

Wastewater monitoring: ‘the James Webb Telescope for population health’

Article by Exemplars News: “When the COVID-19 pandemic triggered a lockdown across Bangladesh and her research on environmental exposure to heavy metals became impossible to continue, Dr. Rehnuma Haque began a search for some way she could contribute to the pandemic response.

“I knew I had to do something during COVID,” said Dr. Haque, a research scientist at the International Centre for Diarrheal Disease Research, Bangladesh (icddr,b). “I couldn’t just sit at home.”

Then she stumbled upon articles on early wastewater monitoring efforts for COVID in Australia, the Netherlands, Italy, and the United States. “When I read those papers, I was so excited,” said Dr. Haque. “I emailed my supervisor, Dr. Mahbubur Rahman, and said, ‘Can we do this?’”

Two months later, in June 2020, Dr. Haque and her colleagues had launched one of the earliest and most robust national wastewater surveillance programs for COVID in a low- or middle-income country (LMIC).

The initiative, which has now been expanded to monitor for cholera, salmonella, and rotavirus and may soon be expanded further to monitor for norovirus and antibiotic resistance, demonstrates the power and potential of wastewater surveillance to serve as a low-cost tool for obtaining real-time meaningful health data at scale to identify emerging risks and guide public health responses.

“It is improving public health outcomes,” said Dr. Haque. “We can see everything going on in the community through wastewater surveillance. You can find everything you are looking for and then prepare a response.”

A single wastewater sample can yield representative data about an entire ward, town, or county and allow LMICs to monitor for emerging pathogens. Compared with clinical monitoring, wastewater monitoring uses samples that are easier and cheaper to collect, can capture infections that are asymptomatic or before symptoms arise, raises fewer ethical concerns, can be more inclusive and less prone to sampling biases, can generate a broader range of data, and is unrivaled at quickly generating population-level data…(More)” – See also: The #Data4Covid19 Review

Mapping the landscape of data intermediaries

Report by the European Commission’s Joint Research Centre: “…provides a landscape analysis of key emerging types of data intermediaries. It reviews and synthesises current academic and policy literature, with the goal of identifying shared elements and definitions. An overall objective is to contribute to establishing a common vocabulary among EU policy makers, experts, and practitioners. Six types are presented in detail: personal information management systems (PIMS), data cooperatives, data trusts, data unions, data marketplaces, and data sharing pools. For each one, the report provides information about how it works, its main features, key examples, and business model considerations. The report is grounded in multiple perspectives from sociological, legal, and economic disciplines. The analysis is informed by the notion of inclusive data governance, contextualised in the recent EU Data Governance Act, and problematised according to the economic literature on business models.

The findings highlight the fragmentation and heterogeneity of the field. Data intermediaries range from individualistic and business-oriented types to more collective and inclusive models that support greater engagement in data governance. While certain types aim at facilitating economic transactions between data holders and users, others mainly seek to produce collective benefits or public value. In its conclusions, the report derives a series of take-aways regarding the main obstacles faced by data intermediaries and identifies lines of empirical work in this field…(More)”.

Incentivising open ecological data using blockchain technology

Paper by Robert John Lewis, Kjell-Erik Marstein & John-Arvid Grytnes: “Mindsets concerning data as proprietary are common, especially where data production is resource intensive. Fears of competing research in concert with loss of exclusivity to hard earned data are pervasive. This is for good reason given that current reward structures in academia focus overwhelmingly on journal prestige and high publication counts, and not accredited publication of open datasets. There is also a reluctance among researchers to cede control to centralised repositories, citing concern over the lack of trust and transparency in the way complex data are used and interpreted.

To begin to resolve these cultural and sociological constraints to open data sharing, we as a community must recognise that top-down pressure from policy alone is unlikely to improve the state of ecological data availability and accessibility. Open data policy is almost ubiquitous (e.g. the Joint Data Archiving Policy (JDAP)), and while cyber-infrastructures are becoming increasingly extensive, most have coevolved with sub-disciplines utilising high velocity, born digital data (e.g. remote sensing, automated sensor networks and citizen science). Consequently, they do not always offer technological solutions that ease data collation, standardisation, management and analytics, nor provide a good fit culturally to research communities working among the long-tail of ecological science, i.e. science conducted by many individual researchers/teams over limited spatial and temporal scales. Given that the majority of scientific funding is spent on this type of dispersed research, there is a surprisingly large disconnect between the vast majority of ecological science and the cyber-infrastructures to support open data mandates, offering a possible explanation for why primary ecological data are reportedly difficult to find…(More)”.

Private sector access to public sector personal data: exploring data value and benefit sharing

Literature review for the Scottish Government: “The aim of this review is to enable the Scottish Government to explore the issues relevant to the access of public sector personal data (as defined by the European Union General Data Protection Regulation, GDPR) with or by the private sector in publicly trusted ways, to unlock the public benefit of this data. This literature review will specifically enable the Scottish Government to establish whether there are

(I) models/approaches of costs/benefits/data value/benefit-sharing, and

(II) intellectual property rights or royalties schemes regarding the use of public sector personal data with or by the private sector both in the UK and internationally.

In conducting this literature review, we used an adapted systematic review, and undertook thematic analysis of the included literature to answer several questions central to the aim of this research. Such questions included:

  • Are there any models of costs and/or benefits regarding the use of public sector personal data with or by the private sector?
  • Are there any models of valuing data regarding the use of public sector personal data with or by the private sector?
  • Are there any models for benefit-sharing in respect of the use of public sector personal data with or by the private sector?
  • Are there any models in respect of the use of intellectual property rights or royalties regarding the use of public sector personal data with or by the private sector?…(More)”.

Unlocking the value of supply chain data across industries

MIT Technology Review Insights: “The product shortages and supply-chain delays of the global covid-19 pandemic are still fresh memories. Consumers and industry are concerned that the next geopolitical or climate event may have a similar impact. Against a backdrop of evolving regulations, these conditions mean manufacturers want to be prepared against short supplies, concerned customers, and weakened margins.

For supply chain professionals, achieving a “phygital” information flow—the blending of physical and digital data—is key to unlocking resilience and efficiency. As physical objects travel through supply chains, they generate a rich flow of data about the item and its journey—from its raw materials, its manufacturing conditions, even its expiration date—bringing new visibility and pinpointing bottlenecks.

This phygital information flow offers significant advantages, from enhancing the ability to create rich customer experiences to satisfying environmental, social, and corporate governance (ESG) goals. In a 2022 EY global survey of executives, 70% of respondents agreed that a sustainable supply chain will increase their company’s revenue.

For disparate parties to exchange product information effectively, they require a common framework and universally understood language. Among supply chain players, data standards create a shared foundation. Standards help uniquely identify, accurately capture, and automatically share critical information about products, locations, and assets across trading communities…(More)”.
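One concrete way a shared standard helps partners “uniquely identify” products is an identifier format with a built-in check digit, so any party can catch a mistyped or misread code before sharing it. The sketch below assumes the GS1 mod-10 check-digit scheme used by GTIN/EAN/UPC barcodes; it is an illustration of the idea, not something the article itself describes.

```python
DIGITS = "0123456789"


def gs1_check_digit(payload: str) -> int:
    """Compute the GS1 mod-10 check digit for the digits that precede it.

    Weights alternate 3, 1, 3, ... starting from the rightmost payload digit,
    which lets the same routine serve GTIN-8, -12, -13 and -14 codes.
    """
    if not payload or any(ch not in DIGITS for ch in payload):
        raise ValueError("payload must be a non-empty string of ASCII digits")
    total = 0
    for position, ch in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(ch) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Return True if the last digit matches the check digit of the rest."""
    if len(code) < 2 or any(ch not in DIGITS for ch in code):
        return False
    return gs1_check_digit(code[:-1]) == int(code[-1])


print(gs1_check_digit("400638133393"))  # 1
print(is_valid_gtin("4006381333931"))   # True
print(is_valid_gtin("4006381333932"))   # False
```

A single check digit detects any single-digit error, but it is only a validity check; ensuring that a code refers to one product worldwide depends on how the standards body allocates number ranges.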

Toward a 21st Century National Data Infrastructure: Enhancing Survey Programs by Using Multiple Data Sources

Report by National Academies of Sciences, Engineering, and Medicine: “Much of the statistical information currently produced by federal statistical agencies – information about economic, social, and physical well-being that is essential for the functioning of modern society – comes from sample surveys. In recent years, there has been a proliferation of data from other sources, including data collected by government agencies while administering programs, satellite and sensor data, private-sector data such as electronic health records and credit card transaction data, and massive amounts of data available on the internet. How can these data sources be used to enhance the information currently collected on surveys, and to provide new frontiers for producing information and statistics to benefit American society?…(More)”.

Valuing Data: The Role of Satellite Data in Halting the Transmission of Polio in Nigeria

Article by Mariel Borowitz, Janet Zhou, Krystal Azelton & Isabelle-Yara Nassar: “There are more than 1,000 satellites in orbit right now collecting data about what’s happening on the Earth. These include government and commercial satellites that can improve our understanding of climate change; monitor droughts, floods, and forest fires; examine global agricultural output; identify productive locations for fishing or mining; and many other purposes. We know the data provided by these satellites is important, yet it is very difficult to determine the exact value that each of these systems provides. However, with only a vague sense of “value,” it is hard for policymakers to ensure they are making the right investments in Earth observing satellites.

NASA’s Consortium for the Valuation of Applications Benefits Linked with Earth Science (VALUABLES), carried out in collaboration with Resources for the Future, aimed to address this by analyzing specific use cases of satellite data to determine their monetary value. VALUABLES proposed a “value of information” approach focusing on cases in which satellite data informed a specific decision. Researchers could then compare the outcome of that decision with what would have occurred if no satellite data had been available. Our project, which was funded under the VALUABLES program, examined how satellite data contributed to efforts to halt the transmission of polio in Nigeria…(More)”
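The comparison described above can be written as a simple difference. The notation is an illustrative assumption, not the consortium’s own formulation: let $a_{\text{sat}}$ be the decision taken with satellite data, $a_{\text{none}}$ the decision that would have been taken without it, and $V(\cdot)$ the monetary value of the resulting outcome.

```latex
\mathrm{VOI} \;=\; V\!\left(a_{\text{sat}}\right) \;-\; V\!\left(a_{\text{none}}\right)
```

A positive value means the satellite data improved the outcome. Estimating $V(a_{\text{none}})$ requires a credible counterfactual, which is why the approach focuses on cases where the data informed one specific, identifiable decision.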