Genomic Epidemiology Data Infrastructure Needs for SARS-CoV-2


Report by the National Academies of Sciences, Engineering, and Medicine: “In December 2019, new cases of severe pneumonia were first detected in Wuhan, China, and the cause was determined to be a novel beta coronavirus related to the severe acute respiratory syndrome (SARS) coronavirus that emerged from a bat reservoir in 2002. Within six months, this new virus—SARS coronavirus 2 (SARS-CoV-2)—has spread worldwide, infecting at least 10 million people with an estimated 500,000 deaths. COVID-19, the disease caused by SARS-CoV-2, was declared a public health emergency of international concern on January 30, 2020 by the World Health Organization (WHO) and a pandemic on March 11, 2020. To date, there is no approved effective treatment or vaccine for COVID-19, and it continues to spread in many countries.

Genomic Epidemiology Data Infrastructure Needs for SARS-CoV-2: Modernizing Pandemic Response Strategies lays out a framework to define and describe the data needs for a system to track and correlate viral genome sequences with clinical and epidemiological data. Such a system would help ensure the integration of data on viral evolution with detection, diagnostic, and countermeasure efforts. This report also explores data collection mechanisms to ensure a representative global sample set of all relevant extant sequences and considers challenges and opportunities for coordination across existing domestic, global, and regional data sources….(More)”.

Public perceptions on data sharing: key insights from the UK and the USA


Paper by Saira Ghafur, Jackie Van Dael, Melanie Leis and Ara Darzi, and Aziz Sheikh: “Data science and artificial intelligence (AI) have the potential to transform the delivery of health care. Health care as a sector, with all of the longitudinal data it holds on patients across their lifetimes, is positioned to take advantage of what data science and AI have to offer. The current COVID-19 pandemic has shown the benefits of sharing data globally to permit a data-driven response through rapid data collection, analysis, modelling, and timely reporting.

Despite its obvious advantages, data sharing is a controversial subject, with researchers and members of the public justifiably concerned about how and why health data are shared. The most common concern is privacy; even when data are (pseudo-)anonymised, there remains a risk that a malicious hacker could, using only a few datapoints, re-identify individuals. For many, it is often unclear whether the risks of data sharing outweigh the benefits.

A series of surveys over recent years indicate that the public holds a range of views about data sharing. Over the past few years, there have been several important data breaches and cyberattacks. This has resulted in patients and the public questioning the safety of their data, including the prospect or risk of their health data being shared with unauthorised third parties.

We surveyed people across the UK and the USA to examine public attitude towards data sharing, data access, and the use of AI in health care. These two countries were chosen as comparators as both are high-income countries that have had substantial national investments in health information technology (IT) with established track records of using data to support health-care planning, delivery, and research. The UK and USA, however, have sharply contrasting models of health-care delivery, making it interesting to observe if these differences affect public attitudes.

Willingness to share anonymised personal health information varied across receiving bodies (figure). The more commercial the purpose of the receiving institution (eg, for an insurance or tech company), the less often respondents were willing to share their anonymised personal health information in both the UK and the USA. Older respondents (≥35 years) in both countries were generally less likely to trust any organisation with their anonymised personal health information than younger respondents (<35 years)…

Despite the benefits of big data and technology in health care, our findings suggest that the rapid development of novel technologies has been received with concern. Growing commodification of patient data has increased awareness of the risks involved in data sharing. There is a need for public standards that secure regulation and transparency of data use and sharing and support patient understanding of how data are used and for what purposes….(More)”.

Project Patient Voice


Press Release: “The U.S. Food and Drug Administration today launched Project Patient Voice, an initiative of the FDA’s Oncology Center of Excellence (OCE). Through a new website, Project Patient Voice creates a consistent source of publicly available information describing patient-reported symptoms from cancer trials for marketed treatments. While this patient-reported data has historically been analyzed by the FDA during the drug approval process, it is rarely included in product labeling and, therefore, is largely inaccessible to the public.

“Project Patient Voice has been initiated by the Oncology Center of Excellence to give patients and health care professionals unique information on symptomatic side effects to better inform their treatment choices,” said FDA Principal Deputy Commissioner Amy Abernethy, M.D., Ph.D. “The Project Patient Voice pilot is a significant step in advancing a patient-centered approach to oncology drug development. Where patient-reported symptom information is collected rigorously, this information should be readily available to patients.” 

Patient-reported outcome (PRO) data is collected using questionnaires that patients complete during clinical trials. These questionnaires are designed to capture important information about disease- or treatment-related symptoms. This includes how severe or how often a symptom or side effect occurs.

Patient-reported data can provide additional, complementary information for health care professionals to discuss with patients, specifically when discussing the potential side effects of a particular cancer treatment. In contrast to the clinician-reported safety data in product labeling, the data in Project Patient Voice is obtained directly from patients and can show symptoms before treatment starts and at multiple time points while receiving cancer treatment. 

The Project Patient Voice website will include a list of cancer clinical trials that have available patient-reported symptom data. Each trial will include a table of the patient-reported symptoms collected. Each patient-reported symptom can be selected to display a series of bar and pie charts describing the patient-reported symptom at baseline (before treatment starts) and over the first 6 months of treatment. This information provides insights into side effects not currently available in standard FDA safety tables, including existing symptoms before the start of treatment, symptoms over time, and the subset of patients who did not have a particular symptom prior to starting treatment….(More)”.

Measuring Movement and Social Contact with Smartphone Data: A Real-Time Application to Covid-19


Paper by Victor Couture et al: “Tracking human activity in real time and at fine spatial scale is particularly valuable during episodes such as the COVID-19 pandemic. In this paper, we discuss the suitability of smartphone data for quantifying movement and social contact. We show that these data cover broad sections of the US population and exhibit movement patterns similar to conventional survey data. We develop and make publicly available a location exposure index that summarizes county-to-county movements and a device exposure index that quantifies social contact within venues. We use these indices to document how pandemic-induced reductions in activity vary across people and places….(More)”.

Adolescent Mental Health: Using A Participatory Mapping Methodology to Identify Key Priorities for Data Collaboration


Blog by Alexandra Shaw, Andrew J. Zahuranec, Andrew Young, Stefaan G. Verhulst, Jennifer Requejo, Liliana Carvajal: “Adolescence is a unique stage of life. The brain undergoes rapid development; individuals face new experiences, relationships, and environments. These events can be exciting, but they can also be a source of instability and hardship. Half of all mental conditions manifest by early adolescence and between 10 and 20 percent of all children and adolescents report mental health conditions. Despite the increased risks and concerns for adolescents’ well-being, there remain significant gaps in availability of data at the country level for policymakers to address these issues.

In June, The GovLab partnered with colleagues at UNICEF’s Health and HIV team in the Division of Data, Analysis, Planning & Monitoring and the Data for Children Collaborative — a collaboration between UNICEF, the Scottish Government, and the University of Edinburgh — to design and apply a new methodology of participatory mapping and prioritization of key topics and issues associated with adolescent mental health that could be addressed through enhanced data collaboration….

The event led to three main takeaways. First, the topic mapping allows participants to deliberate and prioritize the various aspects of adolescent mental health in a more holistic manner. Unlike the “blind men and the elephant” parable, a topic map allows the participants to see and discuss  the interrelated parts of the topic, including those which they might be less familiar with.

Second, the workshops demonstrated the importance of tapping into distributed expertise via participatory processes. While the topic map provided a starting point, the inclusion of various experts allowed the findings of the document to be reviewed in a rapid, legitimate fashion. The diverse inputs helped ensure the individual aspects could be prioritized without a perspective being ignored.

Lastly, the approach showed the importance of data initiatives being driven and validated by those individuals representing the demand. By soliciting the input of those who would actually use the data, the methodology ensured data initiatives focused on the aspects thought to be most relevant and of greatest importance….(More)”

Addressing trust in public sector data use


Centre for Data Ethics and Innovation: “Data sharing is fundamental to effective government and the running of public services. But it is not an end in itself. Data needs to be shared to drive improvements in service delivery and benefit citizens. For this to happen sustainably and effectively, public trust in the way data is shared and used is vital. Without such trust, the government and wider public sector risks losing society’s consent, setting back innovation as well as the smooth running of public services. Maximising the benefits of data driven technology therefore requires a solid foundation of societal approval.

AI and data driven technology offer extraordinary potential to improve decision making and service delivery in the public sector – from improved diagnostics to more efficient infrastructure and personalised public services. This makes effective use of data more important than it has ever been, and requires a step-change in the way data is shared and used. Yet sharing more data also poses risks and challenges to current governance arrangements.

The only way to build trust sustainably is to operate in a trustworthy way. Without adequate safeguards the collection and use of personal data risks changing power relationships between the citizen and the state. Insights derived by big data and the matching of different data sets can also undermine individual privacy or personal autonomy. Trade-offs are required which reflect democratic values, wider public acceptability and a shared vision of a data driven society. CDEI has a key role to play in exploring this challenge and setting out how it can be addressed. This report identifies barriers to data sharing, but focuses on building and sustaining the public trust which is vital if society is to maximise the benefits of data driven technology.

There are many areas where the sharing of anonymised and identifiable personal data by the public sector already improves services, prevents harm, and benefits the public. Over the last 20 years, different governments have adopted various measures to increase data sharing, including creating new legal sharing gateways. However, despite efforts to increase the amount of data sharing across the government, and significant successes in areas like open data, data sharing continues to be challenging and resource-intensive. This report identifies a range of technical, legal and cultural barriers that can inhibit data sharing.

Barriers to data sharing in the public sector

Technical barriers include limited adoption of common data standards and inconsistent security requirements across the public sector. Such inconsistency can prevent data sharing, or increase the cost and time for organisations to finalise data sharing agreements.

While there are often pre-existing legal gateways for data sharing, underpinned by data protection legislation, there is still a large amount of legal confusion on the part of public sector bodies wishing to share data which can cause them to start from scratch when determining legality and commit significant resources to legal advice. It is not unusual for the development of data sharing agreements to delay the projects for which the data is intended. While the legal scrutiny of data sharing arrangements is an important part of governance, improving the efficiency of these processes – without sacrificing their rigour – would allow data to be shared more quickly and at less expense.

Even when legal, the permissive nature of many legal gateways means significant cultural and organisational barriers to data sharing remain. Individual departments and agencies decide whether or not to share the data they hold and may be overly risk averse. Data sharing may not be prioritised by a department if it would require them to bear costs to deliver benefits that accrue elsewhere (i.e. to those gaining access to the data). Departments sharing data may need to invest significant resources to do so, as well as considering potential reputational or legal risks. This may hold up progress towards finding common agreement on data sharing. When there is an absence of incentives, even relatively small obstacles may mean data sharing is not deemed worthwhile by those who hold the data – despite the fact that other parts of the public sector might benefit significantly….(More)”.

Ethical and Legal Aspects of Open Data Affecting Farmers


Report by Foteini Zampati et al: “Open Data offers a great potential for innovations from which the agricultural sector can benefit decisively due to a wide range of possibilities for further use. However, there are many inter-linked issues in the whole data value chain that affect the ability of farmers, especially the poorest and most vulnerable, to access, use and harness the benefits of data and data-driven technologies.

There are technical challenges and ethical and legal challenges as well. Of all these challenges, the ethical and legal aspects related to accessing and using data by the farmers and sharing farmers’ data have been less explored.

We aimed to identify gaps and highlight the often-complex legal issues related to open data in the areas of law (e.g. data ownership, data rights) policies, codes of conduct, data protection, intellectual property rights, licensing contracts and personal privacy.

This report is an output of the Kampala INSPIRE Hackathon 2020. The Hackathon addressed key topics identified by the IST-Africa 2020 conference, such as: Agriculture, environmental sustainability, collaborative open innovation, and ICT-enabled entrepreneurship.

The goal of the event was to continue to build on the efforts of the 2019 Nairobi INSPIRE Hackathon, further strengthening relationships between various EU projects and African communities. It was a successful event, with more than 200 participants representing 26 African countries. The INSPIRE Hackathons are not a competition, rather the main focus is building relationships, making rapid developments, and collecting ideas for future research and innovation….(More)”.

Responsible, practical genomic data sharing that accelerates research


James Brian Byrd, Anna C. Greene, Deepashree Venkatesh Prasad, Xiaoqian Jiang & Casey S. Greene in Nature: “Data sharing anchors reproducible science, but expectations and best practices are often nebulous. Communities of funders, researchers and publishers continue to grapple with what should be required or encouraged. To illuminate the rationales for sharing data, the technical challenges and the social and cultural challenges, we consider the stakeholders in the scientific enterprise. In biomedical research, participants are key among those stakeholders. Ethical sharing requires considering both the value of research efforts and the privacy costs for participants. We discuss current best practices for various types of genomic data, as well as opportunities to promote ethical data sharing that accelerates science by aligning incentives….(More)”.

Preservation of Individuals’ Privacy in Shared COVID-19 Related Data


Paper by Stefan Sauermann et al: “This paper provides insight into how restricted data can be incorporated in an open-be-default-by-design digital infrastructure for scientific data. We focus, in particular, on the ethical component of FAIRER (Findable, Accessible, Interoperable, Ethical, and Reproducible) data, and the pseudo-anonymization and anonymization of COVID-19 datasets to protect personally identifiable information (PII). First we consider the need for the customisation of the existing privacy preservation techniques in the context of rapid production, integration, sharing and analysis of COVID-19 data. Second, the methods for the pseudo-anonymization of direct identification variables are discussed. We also discuss different pseudo-IDs of the same person for multi-domain and multi-organization. Essentially, pseudo-anonymization and its encrypted domain specific IDs are used to successfully match data later, if required and permitted, as well as to restore the true ID (and authenticity) in individual cases of a patient’s clarification.Third, we discuss application of statistical disclosure control (SDC) techniques to COVID-19 disease data. To assess and limit the risk of re-identification of individual persons in COVID-19 datasets (that are often enriched with other covariates like age, gender, nationality, etc.) to acceptable levels, the risk of successful re-identification by a combination of attribute values must be assessed and controlled. This is done using statistical disclosure control for anonymization of data. Lastly, we discuss the limitations of the proposed techniques and provide general guidelines on using disclosure risks to decide on appropriate modes for data sharing to preserve the privacy of the individuals in the datasets….(More)”.

Mapping Mobility Functional Areas (MFA) using Mobile Positioning Data to Inform COVID-19 Policies


EU Science Hub: “This work introduces the concept of data-driven Mobility Functional Areas (MFAs) as geographic zones with a high degree of intra-mobility exchanges. Such information, calculated at European regional scale thanks to mobile data, can be useful to inform targeted re-escalation policy responses in cases of future COVID-19 outbreaks (avoiding large-area or even national lockdowns). In such events, the geographic distribution of MFAs would define territorial areas to which lockdown interventions could be limited, with the result of minimizing socio-economic consequences of such policies. The analysis of the time evolution of MFAs can also be thought of as a measure of how human mobility changes not only in intensity but also in patterns, providing innovative insights into the impact of mobility containment measures. This work presents a first analysis for 15 European countries (14 EU Member States and Norway)….(More)”.