Paper by the Actionable Intelligence for Social Policy Center: “This paper presents a Four Question Framework to guide data integration partners in building a strong governance and legal foundation to support ethical data use. While this framework was developed based on work in the United States that routinely integrates public data, it is meant to be a simple, digestible tool that can be adapted to any context. The framework was developed through a series of public deliberation workgroups and 15 years of field experience working with a diversity of data integration efforts across the United States.
The Four Questions – Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)? – should be considered within an established data governance framework and alongside core partners to determine whether and how to move forward when building an Integrated Data System (IDS) and also at each stage of a specific data project. We discuss these questions in depth, with a particular focus on the role of governance in establishing legal and ethical data use. In addition, we provide example data governance structures from two IDS sites and hypothetical scenarios that illustrate key considerations for the Four Question Framework.
A robust governance process is essential for determining whether data sharing and integration is legal, ethical, and a good idea within the local context. This process is iterative and as relational as it is technical, which means authentic collaboration across partners should be prioritized at each stage of a data use project. The Four Questions serve as a guide for determining whether to undertake data sharing and integration and should be regularly revisited throughout the life of a project…(More)”.
Can Google Trends predict asylum-seekers’ destination choices?
Paper by Haodong Qi & Tuba Bircan: “Google Trends (GT) collates the volumes of search keywords over time and by geographical location. Such data could, in theory, provide insights into people’s ex ante intentions to migrate, and hence be useful for predictive analysis of future migration. Empirically, however, the predictive power of GT is sensitive: it may vary depending on geographical context, the search keywords selected for analysis, and Google’s market share and its users’ characteristics and search behavior, among other factors. Unlike most previous studies attempting to demonstrate the benefit of using GT for forecasting migration flows, this article addresses a critical but less discussed issue: when GT cannot enhance the performance of migration models. Using Eurostat statistics on first-time asylum applications and a set of push-pull indicators gathered from various data sources, we train three classes of gravity models that are commonly used in the migration literature, and examine how the inclusion of GT affects the models’ ability to predict refugees’ destination choices. The results suggest that the effects of including GT are highly contingent on the complexity of the different models. Specifically, GT can improve the performance of relatively simple models, but not of those augmented by flow fixed effects or autoregressive effects. These findings call for a more comprehensive analysis of the strengths and limitations of using GT, as well as other digital trace data, in the context of modeling and forecasting migration. It is our hope that this nuanced perspective can spur further innovations in the field, and ultimately bring us closer to a comprehensive modeling framework of human migration…(More)”.
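To make the comparison concrete, here is a minimal sketch of estimating a gravity model of asylum flows with and without a Google Trends covariate. It is illustrative only: the variable names and toy data are assumptions, not the authors' actual specification or data.

```python
# Illustrative sketch: a toy origin-destination panel with a Google Trends
# covariate. Variable names and data are hypothetical, not the paper's.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "asylum_applications": rng.poisson(40, n),    # flow from origin to destination
    "log_pop_origin": rng.normal(15, 1, n),       # push-side size term
    "log_gdp_destination": rng.normal(10, 1, n),  # pull-side attractiveness term
    "log_distance": rng.normal(7, 0.5, n),        # friction term
    "gt_search_index": rng.uniform(0, 100, n),    # GT search volume for destination
})

# A simple gravity model, then the same model augmented with the GT covariate.
base = smf.poisson(
    "asylum_applications ~ log_pop_origin + log_gdp_destination + log_distance",
    data=df,
).fit(disp=False)
augmented = smf.poisson(
    "asylum_applications ~ log_pop_origin + log_gdp_destination + log_distance"
    " + gt_search_index",
    data=df,
).fit(disp=False)

# Compare fit; the paper's finding is that such gains tend to disappear once
# richer specifications (flow fixed effects, autoregressive terms) are used.
print(f"AIC without GT: {base.aic:.1f}, with GT: {augmented.aic:.1f}")
```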
Essential requirements for the governance and management of data trusts, data repositories, and other data collaborations
Paper by Alison Paprica et al: “Around the world, many organisations are working on ways to increase the use, sharing, and reuse of person-level data for research, evaluation, planning, and innovation while ensuring that data are secure and privacy is protected. As a contribution to broader efforts to improve data governance and management, in 2020 members of our team published 12 minimum specification essential requirements (min specs) to provide practical guidance for organisations establishing or operating data trusts and other forms of data infrastructure… We convened an international team, consisting mostly of participants from Canada and the United States of America, to test and refine the original 12 min specs. Twenty-three (23) data-focused organisations and initiatives recorded the various ways they address the min specs. Sub-teams analysed the results, used the findings to make improvements to the min specs, and identified materials to support organisations/initiatives in addressing the min specs.
Analyses and discussion led to an updated set of 15 min specs covering five categories: one min spec for Legal, five for Governance, four for Management, two for Data Users, and three for Stakeholder & Public Engagement. Multiple changes were made to make the min specs language more technically complete and precise. The updated set of 15 min specs has been integrated into a Canadian national standard that, to our knowledge, is the first to include requirements for public engagement and Indigenous Data Sovereignty…(More)”.
Data Commons
Paper by R. V. Guha et al: “Publicly available data from open sources (e.g., the United States Census Bureau (Census), the World Health Organization (WHO), and the Intergovernmental Panel on Climate Change (IPCC)) are vital resources for policy makers, students and researchers across different disciplines. Combining data from different sources requires the user to reconcile differences in schemas, formats, assumptions, and more. This data wrangling is time consuming and tedious, and must be repeated by every user of the data. Our goal with Data Commons (DC) is to help make public data accessible and useful to those who want to understand it and use it to address societal challenges and opportunities. We do the data processing and make the processed data widely available via standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs. Data from different Data Commons can be ‘joined’ easily. The aggregate of these Data Commons can be viewed as a single Knowledge Graph, which can then be searched using natural language questions, utilizing advances in Large Language Models. This paper describes the architecture of Data Commons and some of the major deployments, and highlights directions for future work…(More)”.
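As a concrete illustration of the Cloud-API access pattern described above, the sketch below queries the Knowledge Graph through the public `datacommons` Python client. The function name and DCIDs are assumptions based on the public client library, not details taken from the paper.

```python
# Minimal sketch of querying the Data Commons Knowledge Graph via the public
# `datacommons` Python client (pip install datacommons). The call name and
# DCIDs below are assumptions based on the client's public documentation.
import datacommons as dc

# Places and statistical variables are Knowledge Graph nodes identified by
# DCIDs, e.g. "geoId/06" (California) and "Count_Person" (total population).
population = dc.get_stat_value("geoId/06", "Count_Person")
print(f"Population of California: {population}")

# Because every source is mapped to a common schema, a variable originating
# from a different publisher can be 'joined' on the same place identifier.
median_income = dc.get_stat_value("geoId/06", "Median_Income_Person")
print(f"Median individual income: {median_income}")
```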
Data Collaboratives
Policy Brief by Center for the Governance of Change: “Despite the abundance of data generated, it is becoming increasingly clear that its accessibility and advantages are not equitably or effectively distributed throughout society. Data asymmetries, driven in large part by deeply entrenched inequalities and lack of incentives by many public- and private-sector organizations to collaborate, are holding back the public good potential of data and hindering progress and innovation in key areas such as financial inclusion, health, and the future of work.
More (and better) collaboration is needed to address the data asymmetries that exist across society, but early efforts at opening data have fallen short of achieving their intended aims. In the EU, the proposed Data Act is seeking to address these shortcomings and make more data available for public use by setting up new rules on data sharing. However, critics say its current reading risks limiting the potential for delivering innovative solutions by failing to establish cross-sectoral data-sharing frameworks, leaving the issue of public data stewardship off the table, and avoiding the thorny question of business incentives.
This policy brief, based on Stefaan Verhulst’s recent policy paper for the Center for the Governance of Change, argues that data collaboratives, an emerging model of collaboration in which participants from different sectors exchange data to solve public problems, offer a promising solution to address these data asymmetries and contribute to a healthy data economy that can benefit society as a whole. However, data collaboratives require a systematic, sustainable, and responsible approach to be successful, with a particular focus on…(More):
- Establishing a new science of questions, to help identify the most pressing public and private challenges that can be addressed with data sharing.
- Fostering a new profession of data stewards, to promote a culture of responsible sharing within organizations and recognize opportunities for productive collaboration.
- Clarifying incentives, to bring the private sector to the table and help operationalize data collaboration, ideally with some sort of market-led compensation model.
- Establishing a social license for data reuse, to promote trust among stakeholders through public engagement, data stewardship, and an enabling regulatory framework.
- Becoming more data-driven about data, to improve our understanding of collaboration, build sustainable initiatives, and achieve project accountability.
Sharing Health Data: The Why, the Will, and the Way Forward.
Book edited by Grossmann C, Chua PS, Ahmed M, et al.: “Sharing health data and information across stakeholder groups is the bedrock of a learning health system. As data and information are increasingly combined across various sources, their generative value to transform health, health care, and health equity increases significantly. Facilitating this potential is an escalating surge of digital technologies (e.g., cloud computing, broadband and wireless solutions, digital health technologies, and application programming interfaces [APIs]) that, with each successive generation, not only enhance data sharing, but also improve in their ability to preserve privacy and identify and mitigate cybersecurity risks. These technological advances, coupled with notable policy developments, new interoperability standards (particularly the Fast Healthcare Interoperability Resources [FHIR] standard), and the launch of innovative payment models within the last decade, have resulted in a greater recognition of the value of health data sharing among patients, providers, and researchers. Consequently, a number of data sharing collaborations are emerging across the health care ecosystem.
Unquestionably, the COVID-19 pandemic has had a catalytic effect on this trend. The criticality of swift data exchange became evident at the outset of the pandemic, when the scientific community sought answers about the novel SARS-CoV-2 virus and emerging disease. Then, as the crisis intensified, data sharing graduated from a research imperative to a societal one, with a clear need to urgently share and link data across multiple sectors and industries to curb the effects of the pandemic and prevent the next one.
In spite of these evolving attitudes toward data sharing and the ubiquity of data-sharing partnerships, barriers persist. The practice of health data sharing occurs unevenly: it is prominent in certain stakeholder communities and absent in others. A stark contrast is observed between the volume, speed, and frequency with which health data are aggregated and linked (oftentimes with non-traditional forms of health data) for marketing purposes, and the continuing challenges patients experience in contributing data to their own health records. In addition, there are varying levels of data sharing: not all types of data are shared in the same manner and at the same level of granularity, creating a patchwork of information. As highlighted by the gaps observed in the haphazard and often inadequate sharing of race and ethnicity data during the pandemic, the consequences can be severe, impacting the allocation of much-needed resources and attention to marginalized communities. It is therefore important to recognize the value of data sharing in which stakeholder participation is equitable and comprehensive, not only for achieving a future ideal state in health care, but also for redressing long-standing inequities…(More)”
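For readers unfamiliar with FHIR, the interoperability standard highlighted above, the following is a minimal sketch of the RESTful, JSON-over-HTTP exchange it standardises. The public HAPI test server URL is illustrative only and is not drawn from the book.

```python
# Minimal sketch of a FHIR search: fetch Patient resources as JSON over
# FHIR's RESTful interface. The base URL is an illustrative public test
# server; any FHIR R4 endpoint exposes the same resource-oriented API.
import requests

base = "https://hapi.fhir.org/baseR4"  # illustrative public test server
resp = requests.get(
    f"{base}/Patient",
    params={"name": "Smith", "_count": 3},        # standard FHIR search params
    headers={"Accept": "application/fhir+json"},
)
bundle = resp.json()                              # search results arrive as a Bundle

for entry in bundle.get("entry", []):
    patient = entry["resource"]                   # each entry wraps one resource
    print(patient["resourceType"], patient.get("id"))
```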
Wastewater monitoring: ‘the James Webb Telescope for population health’
Article by Exemplars News: “When the COVID-19 pandemic triggered a lockdown across Bangladesh and made her research on environmental exposure to heavy metals impossible to continue, Dr. Rehnuma Haque began searching for a way to contribute to the pandemic response.
“I knew I had to do something during COVID,” said Dr. Haque, a research scientist at the International Centre for Diarrheal Disease Research, Bangladesh (icddr,b). “I couldn’t just sit at home.”
Then she stumbled upon articles on early wastewater monitoring efforts for COVID in Australia, the Netherlands, Italy, and the United States. “When I read those papers, I was so excited,” said Dr. Haque. “I emailed my supervisor, Dr. Mahbubur Rahman, and said, ‘Can we do this?’”
Two months later, in June 2020, Dr. Haque and her colleagues had launched one of the earliest and most robust national wastewater surveillance programs for COVID in a low- or middle-income country (LMIC).
The initiative, which has now been expanded to monitor for cholera, salmonella, and rotavirus and may soon be expanded further to monitor for norovirus and antibiotic resistance, demonstrates the power and potential of wastewater surveillance to serve as a low-cost tool for obtaining real-time meaningful health data at scale to identify emerging risks and guide public health responses.
“It is improving public health outcomes,” said Dr. Haque. “We can see everything going on in the community through wastewater surveillance. You can find everything you are looking for and then prepare a response.”
A single wastewater sample can yield representative data about an entire ward, town, or county and allow LMICs to monitor for emerging pathogens. Compared with clinical monitoring, wastewater monitoring is easier and cheaper to conduct; can capture asymptomatic infections and detect outbreaks before symptoms arise; raises fewer ethical concerns; can be more inclusive and less prone to sampling biases; can generate a broader range of data; and is unrivaled at quickly generating population-level data…(More)” – See also: The #Data4Covid19 Review
Mapping the landscape of data intermediaries
Report by the European Commission’s Joint Research Centre: “…provides a landscape analysis of key emerging types of data intermediaries. It reviews and synthesises current academic and policy literature, with the goal of identifying shared elements and definitions. An overall objective is to contribute to establishing a common vocabulary among EU policy makers, experts, and practitioners. Six types are presented in detail: personal information management systems (PIMS), data cooperatives, data trusts, data unions, data marketplaces, and data sharing pools. For each one, the report provides information about how it works, its main features, key examples, and business model considerations. The report is grounded in multiple perspectives from sociological, legal, and economic disciplines. The analysis is informed by the notion of inclusive data governance, contextualised in the recent EU Data Governance Act, and problematised according to the economic literature on business models.
The findings highlight the fragmentation and heterogeneity of the field. Data intermediaries range from individualistic and business-oriented types to more collective and inclusive models that support greater engagement in data governance. While certain types aim at facilitating economic transactions between data holders and users, others mainly seek to produce collective benefits or public value. In the conclusions, the report derives a series of take-aways regarding the main obstacles faced by data intermediaries and identifies lines of empirical work in this field…(More)”.
Incentivising open ecological data using blockchain technology
Paper by Robert John Lewis, Kjell-Erik Marstein & John-Arvid Grytnes: “Mindsets concerning data as proprietary are common, especially where data production is resource intensive. Fears of competing research, in concert with loss of exclusivity to hard-earned data, are pervasive. This is for good reason, given that current reward structures in academia focus overwhelmingly on journal prestige and high publication counts, not on accredited publication of open datasets. There is also reluctance among researchers to cede control to centralised repositories, citing concerns over the lack of trust and transparency in how complex data are used and interpreted.
To begin to resolve these cultural and sociological constraints on open data sharing, we as a community must recognise that top-down pressure from policy alone is unlikely to improve the state of ecological data availability and accessibility. Open data policy is almost ubiquitous (e.g. the Joint Data Archiving Policy (JDAP), http://datadryad.org/pages/jdap), and while cyber-infrastructures are becoming increasingly extensive, most have coevolved with sub-disciplines utilising high-velocity, born-digital data (e.g. remote sensing, automated sensor networks and citizen science). Consequently, they do not always offer technological solutions that ease data collation, standardisation, management and analytics, nor provide a good cultural fit for research communities working in the long tail of ecological science, i.e. science conducted by many individual researchers/teams over limited spatial and temporal scales. Given that the majority of scientific funding is spent on this type of dispersed research, there is a surprisingly large disconnect between the vast majority of ecological science and the cyber-infrastructures needed to support open data mandates, offering a possible explanation for why primary ecological data are reportedly difficult to find…(More)”.
Private sector access to public sector personal data: exploring data value and benefit sharing
Literature review for the Scottish Government: “The aim of this review is to enable the Scottish Government to explore the issues relevant to the access of public sector personal data (as defined by the European Union General Data Protection Regulation, GDPR) with or by the private sector in publicly trusted ways, to unlock the public benefit of this data. This literature review will specifically enable the Scottish Government to establish whether there are
(I) models/approaches of costs/benefits/data value/benefit-sharing, and
(II) intellectual property rights or royalties schemes regarding the use of public sector personal data with or by the private sector both in the UK and internationally.
In conducting this literature review, we used an adapted systematic review, and undertook thematic analysis of the included literature to answer several questions central to the aim of this research. Such questions included:
- Are there any models of costs and/or benefits regarding the use of public sector personal data with or by the private sector?
- Are there any models of valuing data regarding the use of public sector personal data with or by the private sector?
- Are there any models for benefit-sharing in respect of the use of public sector personal data with or by the private sector?
- Are there any models in respect of the use of intellectual property rights or royalties regarding the use of public sector personal data with or by the private sector?…(More)”.