Medical data has a silo problem. These models could help fix it.


Scott Khan at the WEF: “Every day, more and more data about our health is generated. This data, if analyzed, could hold the key to unlocking cures for rare diseases, help us manage our health risk factors and provide evidence for public policy decisions. However, because health data is so sensitive, much of it is out of reach to researchers, halting discovery and innovation. The problem is amplified further in the international context, where governments naturally want to protect their citizens’ privacy and therefore restrict the movement of health data across international borders. To address this challenge, governments will need to pursue a special approach to policymaking that acknowledges new technology capabilities.

Understanding data silos

Data becomes siloed for a range of well-considered reasons, including restrictions on terms of use (e.g., commercial, non-commercial or disease-specific use), regulations imposed by governments (e.g., Safe Harbor and privacy rules), and an inability to obtain informed consent from historically marginalized populations.

Siloed data, however, also creates a range of problems for researchers looking to make that data useful to the general population. Silos, for example, block researchers from accessing the most up-to-date information or the most diverse, comprehensive datasets. By slowing research, they can delay key findings that could lead to much-needed treatments or cures.

Even when these challenges are overcome, incidents of data misuse – where health data is used to explore non-health-related topics or without an individual’s consent – continue to erode public trust in the very research institutions that depend on such data to advance medical knowledge.

Solving this problem through technology

Technology designed to better protect and decentralize data is being developed to address many of these challenges. Techniques such as homomorphic encryption (a form of encryption that allows computations to be performed on encrypted data without first decrypting it) and differential privacy (a technique for sharing information about a group while limiting what can be learned about any individual in it) both provide means to protect and centralize data while distributing control over its use to the parties that steward the respective data sets.
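To make the two techniques concrete, here is a toy sketch. The first half is a textbook Paillier cryptosystem, one well-known additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a party holding only ciphertexts can still total them. The second half is the Laplace mechanism, a standard way to release a count with differential privacy. Neither is presented as the method the WEF piece has in mind; the primes, the two hypothetical hospitals' counts and the privacy budget `epsilon` are illustrative assumptions, and the tiny key size makes this unsafe for real data.

```python
import math
import random

# ---- Toy Paillier encryption (additively homomorphic) ----
# Illustrative only: real deployments use very large primes and vetted libraries.

def paillier_keygen(p: int, q: int):
    """Return (public modulus n, private key) for distinct primes p and q."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)  # Carmichael function of n
    mu = pow(lam, -1, n)          # valid because we fix the generator g = n + 1
    return n, (lam, mu, n)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Encrypt plaintext 0 <= m < n with fresh randomness r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c: int) -> int:
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n  # L(x) = (x - 1) / n, then scale by mu

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts mod n^2 adds the hidden plaintexts mod n."""
    return (c1 * c2) % (n * n)

# ---- Laplace mechanism for differential privacy ----

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon makes the release epsilon-DP.
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(7)
n, priv = paillier_keygen(293, 433)
c_a = encrypt(n, 120, rng)  # hypothetical patient count held by hospital A
c_b = encrypt(n, 85, rng)   # hypothetical patient count held by hospital B
total = decrypt(priv, add_encrypted(n, c_a, c_b))
print(total)                                  # 205
print(dp_count(total, epsilon=1.0, rng=rng))  # 205 plus random noise
```

In this sketch whoever combines the ciphertexts never sees 120 or 85, only the key holder can decrypt the total, and the noisy count is what would be published. Smaller `epsilon` values add more noise and give stronger privacy.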

Federated data systems use a particular type of distributed database management system that offers an alternative to centralizing encoded data: the data sets do not have to move across jurisdictions or between institutions. Such an approach can help connect data sources while accounting for privacy. To build further trust in the system, a federated model can be implemented to return only encoded data, preventing unauthorized distribution of the data and of the learnings that result from the research activity.
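The core federated pattern can be sketched in a few lines, assuming each institution runs a query locally and releases only an aggregate summary, which a coordinator then combines. The site names and blood-pressure readings are hypothetical, and for brevity the summaries are returned as plain numbers rather than the encoded data the passage describes.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence

@dataclass(frozen=True)
class LocalSummary:
    """Aggregate statistics a site is willing to release; no row-level data."""
    count: int
    total: float

def summarize_locally(values: Sequence[float]) -> LocalSummary:
    # Runs inside each institution; the raw values never leave the site.
    return LocalSummary(count=len(values), total=float(sum(values)))

def federated_mean(summaries: Iterable[LocalSummary]) -> float:
    # The coordinator sees only per-site summaries and combines them.
    summaries = list(summaries)
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no records across sites")
    return sum(s.total for s in summaries) / n

# Hypothetical systolic blood pressure readings held at three institutions.
site_a = [120.0, 135.0, 128.0]
site_b = [142.0, 118.0]
site_c = [130.0]
summaries = [summarize_locally(s) for s in (site_a, site_b, site_c)]
print(federated_mean(summaries))  # same result as pooling all six readings
```

Because a mean decomposes into per-site counts and sums, the federated answer matches the centralized one exactly; statistics that do not decompose this way are harder to compute in a federated setting.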

To be sure, every discussion of the analysis of aggregated data raises challenges of data fusion: between data sets, between different studies, between data silos and between institutions. Although several data standards could be used, most data exist within bespoke data models built for a single purpose rather than to facilitate data sharing and data fusion. Furthermore, even when data has been captured in a standardized data model (e.g., the Global Alliance for Genomics and Health offers some models for standardizing sensitive health data), many data sets are still narrowly defined. They often lack any shared identifiers for combining data from different sources into a coherent aggregate data source useful for research. Within a model of data centralization, data fusion can be addressed by curating each data set, whereas within a federated model, data fusion is much more vexing….(More)”.
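The curation step mentioned for the centralized model can be sketched as mapping each bespoke schema onto a shared model before pooling. The field names, the sex codes and the "T2D" diagnosis label are all hypothetical. The sketch only harmonizes fields and values; it does not solve the missing-identifier problem, since records are stacked side by side rather than linked per person.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical mappings from two bespoke source schemas to one shared model.
SITE_A_MAP: Dict[str, str] = {"pt_sex": "sex", "age_yrs": "age", "dx": "diagnosis"}
SITE_B_MAP: Dict[str, str] = {"gender": "sex", "age": "age", "condition": "diagnosis"}

# Hypothetical value normalisation: the two sites code sex differently.
SEX_CODES: Dict[str, str] = {"M": "male", "F": "female", "1": "male", "2": "female"}

def harmonize(record: Record, field_map: Dict[str, str]) -> Record:
    """Rename source fields to the shared model and normalise coded values."""
    out: Record = {dst: record[src] for src, dst in field_map.items() if src in record}
    if "sex" in out:
        out["sex"] = SEX_CODES.get(str(out["sex"]), "unknown")
    return out

site_a_rows: List[Record] = [{"pt_sex": "F", "age_yrs": 54, "dx": "T2D"}]
site_b_rows: List[Record] = [{"gender": 1, "age": 61, "condition": "T2D"}]

pooled = [harmonize(r, SITE_A_MAP) for r in site_a_rows] + [
    harmonize(r, SITE_B_MAP) for r in site_b_rows
]
print(pooled)
```

In a federated model this mapping would have to be agreed in advance and run at every site, which is part of why fusion is harder there.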

The European data market


European Commission: “The first European Data Market study (SMART 2013/0063), contracted by the European Commission in 2013, made an initial attempt to provide facts and figures on the size and trends of the EU data economy by developing a European data market monitoring tool.

The final report of the updated European Data Market (EDM) study (SMART 2016/0063) now presents in detail the results of the final round of measurement of the updated European Data Market Monitoring Tool contracted for the 2017-2020 period.

Designed with a modular structure as the first pillar of the study, the European Data Market Monitoring Tool is built around a core set of quantitative indicators that provide a series of assessments of the emerging data market at present (i.e., for the years 2018 through 2020), with projections to 2025.

The key areas covered by the indicators measured in this report are:

  • The data professionals and the balance between demand and supply of data skills;
  • The data companies and their revenues;
  • The data user companies and their spending for data technologies;
  • The market of digital products and services (“Data market”);
  • The data economy and its impacts on the European economy;
  • Forecast scenarios of all the indicators, based on alternative market trajectories.

As a second major work stream, the study also presents a series of descriptive stories that complement the view offered by the Monitoring Tool (for example, “How Big Data is driving AI” or “The Secondary Use of Health Data and Data-driven Innovation in the European Healthcare Industry”), adding fresh, real-life information around the quantitative indicators. By focusing on specific issues and aspects of the data market, the stories offer an initial, indicative “catalogue” of good practices of what is happening in the data economy today in Europe and what is likely to affect the development of the EU data economy in the medium term.

Finally, as a third work stream, the study carried out a landscaping exercise on the EU data ecosystem, together with community-building activities to bring together stakeholders from all segments of the data value chain. The map containing the results of the landscaping of the EU data economy, as well as reports from the webinars organised by the study, are available on the www.datalandscape.eu website….(More)”.

The National Cancer Institute Cancer Moonshot Public Access and Data Sharing Policy—Initial assessment and implications


Paper by Tammy M. Frisby and Jorge L. Contreras: “Since 2013, federal research-funding agencies have been required to develop and implement broad data sharing policies. Yet agencies today continue to grapple with the mechanisms necessary to enable the sharing of a wide range of data types, from genomic and other -omics data to clinical and pharmacological data to survey and qualitative data. In 2016, the National Cancer Institute (NCI) launched the ambitious $1.8 billion Cancer Moonshot Program, which included a new Public Access and Data Sharing (PADS) Policy applicable to funding applications submitted on or after October 1, 2017. The PADS Policy encourages the immediate public release of published research results and data and requires all Cancer Moonshot grant applicants to submit a PADS plan describing how they will meet these goals. We reviewed the PADS plans submitted with approximately half of all funded Cancer Moonshot grant applications in fiscal year 2018, and found that a majority did not address one or more elements required by the PADS Policy. Many such plans made no reference to the PADS Policy at all, and several referenced obsolete or outdated National Institutes of Health (NIH) policies instead. We believe that these omissions arose from a combination of insufficient education and outreach by NCI concerning its PADS Policy, both to potential grant applicants and among NCI’s program staff and external grant reviewers. We recommend that other research funding agencies heed these findings as they develop and roll out new data sharing policies….(More)”.

The Computermen


Podcast Episode by Jill Lepore: “In 1966, just as the foundations of the Internet were being imagined, the federal government considered building a National Data Center. It would be a centralized federal facility to hold computer records from each federal agency, in the same way that the Library of Congress holds books and the National Archives holds manuscripts. Proponents argued that it would help regulate and compile the vast quantities of data the government was collecting. Quickly, though, fears about privacy, government conspiracies, and government ineptitude buried the idea. But now, that National Data Center looks like a missed opportunity to create rules about data and privacy before the Internet took off. And in the absence of government action, corporations have made those rules themselves….(More)”.

Best Practices to Cover Ad Information Used for Research, Public Health, Law Enforcement & Other Uses


Press Release: “The Network Advertising Initiative (NAI) released privacy Best Practices for its members to follow if they use data collected for Tailored Advertising or Ad Delivery and Reporting for non-marketing purposes, such as sharing with research institutions, public health agencies, or law enforcement entities.

“Ad tech companies have data that can be a powerful resource for the public good if they follow this set of best practices for consumer privacy,” said Leigh Freund, NAI President and CEO. “During the COVID-19 pandemic, we’ve seen the opportunity for substantial public health benefits from sharing aggregate and de-identified location data.”

The NAI Code of Conduct – the industry’s premier self-regulatory framework for privacy, transparency, and consumer choice – covers data collected and used for Tailored Advertising or Ad Delivery and Reporting. The NAI Code has long addressed certain non-marketing uses of data collected for Tailored Advertising and Ad Delivery and Reporting by prohibiting any eligibility uses of such data, including uses for credit, insurance, healthcare, and employment decisions.

The NAI has always firmly believed that data collected for advertising purposes should not have a negative effect on consumers in their daily lives. However, over the past year, novel data uses have been introduced, especially during the recent health crisis. In the case of opted-in data such as Precise Location Information, a company may determine a user would benefit from more detailed disclosure in a just-in-time notice about non-marketing uses of the data being collected….(More)”.
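One common way to turn device-level location pings into the kind of aggregate data the press release mentions is to count distinct devices per area and suppress small counts. The sketch below does that; the threshold `k = 10`, the tract names and the device IDs are hypothetical assumptions, not NAI requirements, and small-cell suppression on its own does not guarantee that data is de-identified.

```python
from typing import Dict, Iterable, Set, Tuple

def aggregate_visits(pings: Iterable[Tuple[str, str]], k: int = 10) -> Dict[str, int]:
    """Count distinct devices per area, dropping areas with fewer than k devices.

    pings: (device_id, area_id) pairs. Only area-level counts are returned.
    """
    devices_per_area: Dict[str, Set[str]] = {}
    for device_id, area_id in pings:
        devices_per_area.setdefault(area_id, set()).add(device_id)
    # Small cells are suppressed because rare visits are easier to re-identify.
    return {area: len(devs) for area, devs in devices_per_area.items() if len(devs) >= k}

# Hypothetical pings: 12 devices in tract-A (one pings twice), 3 in tract-B.
pings = [(f"device-{i}", "tract-A") for i in range(12)]
pings.append(("device-0", "tract-A"))  # repeat ping from the same device counts once
pings += [(f"device-{i}", "tract-B") for i in range(3)]
print(aggregate_visits(pings, k=10))  # {'tract-A': 12}
```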

How Facebook, Twitter and other data troves are revolutionizing social science


Heidi Ledford at Nature: “Elizaveta Sivak spent nearly a decade training as a sociologist. Then, in the middle of a research project, she realized that she needed to head back to school.

Sivak studies families and childhood at the National Research University Higher School of Economics in Moscow. In 2015, she studied the movements of adolescents by asking them in a series of interviews to recount ten places that they had visited in the past five days. A year later, she had analysed the data and was feeling frustrated by the narrowness of relying on individual interviews, when a colleague pointed her to a paper analysing data from the Copenhagen Networks Study, a ground-breaking project that tracked the social-media contacts, demographics and location of about 1,000 students, with five-minute resolution, over five months [1]. She knew then that her field was about to change. “I realized that these new kinds of data will revolutionize social science forever,” she says. “And I thought that it’s really cool.”

With that, Sivak decided to learn how to program, and join the revolution. Now, she and other computational social scientists are exploring massive and unruly data sets, extracting meaning from society’s digital imprint. They are tracking people’s online activities; exploring digitized books and historical documents; interpreting data from wearable sensors that record a person’s every step and contact; conducting online surveys and experiments that collect millions of data points; and probing databases that are so large that they will yield secrets about society only with the help of sophisticated data analysis.

Over the past decade, researchers have used such techniques to pick apart topics that social scientists have chased for more than a century: from the psychological underpinnings of human morality, to the influence of misinformation, to the factors that make some artists more successful than others. One study uncovered widespread racism in algorithms that inform health-care decisions [2]; another used mobile-phone data to map impoverished regions in Rwanda [3].

“The biggest achievement is a shift in thinking about digital behavioural data as an interesting and useful source”, says Markus Strohmaier, a computational social scientist at the GESIS Leibniz Institute for the Social Sciences in Cologne, Germany.

Not everyone has embraced that shift. Some social scientists are concerned that the computer scientists flooding into the field with ambitions as big as their data sets are not sufficiently familiar with previous research. Another complaint is that some computational researchers look only at patterns and do not consider the causes, or that they draw weighty conclusions from incomplete and messy data — often gained from social-media platforms and other sources that are lacking in data hygiene.

The barbs fly both ways. Some computational social scientists who hail from fields such as physics and engineering argue that many social-science theories are too nebulous or poorly defined to be tested.

This all amounts to “a power struggle within the social-science camp”, says Marc Keuschnigg, an analytical sociologist at Linköping University in Norrköping, Sweden. “Who in the end succeeds will claim the label of the social sciences.”

But the two camps are starting to merge. “The intersection of computational social science with traditional social science is growing,” says Keuschnigg, pointing to the boom in shared journals, conferences and study programmes. “The mutual respect is growing, also.”…(More)”.

Gender gaps in urban mobility


Paper by Laetitia Gauvin, Michele Tizzoni, Simone Piaggesi, Andrew Young, Natalia Adler, Stefaan Verhulst, Leo Ferres & Ciro Cattuto in Humanities and Social Sciences Communications: “Mobile phone data have been extensively used to study urban mobility. However, studies based on gender-disaggregated large-scale data are still lacking, limiting our understanding of gendered aspects of urban mobility and our ability to design policies for gender equality. Here we study urban mobility from a gendered perspective, combining commercial and open datasets for the city of Santiago, Chile.

We analyze call detail records for a large cohort of anonymized mobile phone users and reveal a gender gap in mobility: women visit fewer unique locations than men, and distribute their time less equally among such locations. Mapping this mobility gap over administrative divisions, we observe that a wider gap is associated with lower income and lack of public and private transportation options. Our results uncover a complex interplay between gendered mobility patterns, socio-economic factors and urban affordances, calling for further research and providing insights for policymakers and urban planners….(More)”.
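The two quantities the abstract compares can be sketched per user: the number of distinct locations visited, and how evenly a user's activity is spread across them. Here evenness is measured as normalised Shannon entropy, one plausible choice; the sketch assumes each call detail record is one observation at a tower and that record counts stand in for time spent, and the paper's exact definitions may differ. The users and tower labels are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

def unique_locations(visits: Sequence[str]) -> int:
    """Number of distinct locations (e.g. cell towers) a user was observed at."""
    return len(set(visits))

def time_evenness(visits: Sequence[str]) -> float:
    """Normalised Shannon entropy of a user's records across locations.

    Returns 1.0 when records are spread equally over the visited locations and
    values closer to 0 when they concentrate at one place.
    """
    counts = Counter(visits)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))

# Hypothetical users: each list holds the tower seen in each call record.
user_x = ["home"] * 6 + ["work"] * 2
user_y = ["home", "work", "shop", "gym"] * 2
for name, visits in (("user_x", user_x), ("user_y", user_y)):
    print(name, unique_locations(visits), round(time_evenness(visits), 3))
```

A gender gap of the kind reported would show up as lower average values of both measures for one group of users than for the other.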

Why local data is the key to successful place making


Blog by Sally Kerr: “The COVID emergency has brought many challenges that were unimaginable a few months ago. The first priorities were safety and health, but when lockdown started, one of the early issues was accessing and sharing local data to help everyone deal with and live through the emergency. Communities grappled with the scarcity of local data, finding it difficult to source information on some services, food deliveries and goods. This was not a new issue, but the pandemic brought it into sharp relief.

Local data use covers a broad spectrum. People moving to a new area want information about the environment — schools, amenities, transport, crime rates and local health. For residents, continuing knowledge of business opening hours, events, local issues, council plans and roadworks remains important, not only for everyday living but to help understand issues and future plans that will change their environment. Really local data (hyperlocal data) is either fragmented or unavailable, making it difficult for local people to stay informed, whilst larger data sets about an area (e.g. population, school performance) are not always easy to understand or use. They sit in silos owned by different sectors, on disparate websites, usually collated for professional or research use.

Third sector organisations in a community will gather data relevant to their work, such as contacts and event numbers, but may not source wider data sets about the area, such as demographics, to improve their work. Using this data could strengthen future grant applications by validating their work. Government or health bodies carrying out place-making community projects rely on their own or national data sources, supplemented with qualitative data snapshots. This reliance on tried-and-tested sources stems from time and resource pressures, but it means there is no time to gather that rich seam of local data that profiles individual needs.

Imagine a future community where local data is collected and managed jointly, for both official organisations and the community itself, with shared aims and varied uses. Current and relevant data would be accessible and easy to understand, provided in formats that suit the user, from data scientist to school child. A curated data hub would help citizens learn data skills and carry out collaborative projects on anything from air quality to local biodiversity, managing the data and offering increased insight and useful validation for wider decision making. Costs would fall as duplication and effort were reduced….(More)”.

Laying the Foundation for Effective Partnerships: An Examination of Data Sharing Agreements


Paper by Hayden Dahmm: “In the midst of the COVID-19 pandemic, data has never been more salient. COVID has generated new data demands and increased cross-sector data collaboration. Yet, these data collaborations require careful planning and evaluation of risks and opportunities, especially when sharing sensitive data. Data sharing agreements (DSAs) are written agreements that establish the terms for how data are shared between parties and are important for establishing accountability and trust.

However, negotiating DSAs is often time-consuming, and collaborators lacking legal or financial capacity are disadvantaged. Contracts for Data Collaboration (C4DC) is a joint initiative between SDSN TReNDS, NYU’s GovLab, the World Economic Forum, and the University of Washington, working to strengthen trust and transparency of data collaboratives. The partners have created an online library of DSAs that represents a selection of data applications and contexts.

This report introduces C4DC and its DSA library. We demonstrate how the library can support the data community to strengthen future data collaborations by showcasing various DSA applications and key considerations. First, we explain our method of analyzing the agreements and consider how six major issues are addressed by different agreements in the library. Key issues discussed include data use, access, breaches, proprietary issues, publicization of the analysis, and deletion of data upon termination of the agreement. For each of these issues, we describe approaches illustrated with examples from the library. While our analysis suggests some pertinent issues are regularly not addressed in DSAs, we have identified common areas of practice that may be helpful for entities negotiating partnership agreements to consider in the future….(More)”.
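The six issues the report analyses can double as a simple checklist for a draft agreement. The sketch below represents them as an enumeration and reports which ones a draft leaves unaddressed; the enum names and the example draft are hypothetical labels chosen for illustration, not the C4DC library's own tagging scheme.

```python
from enum import Enum
from typing import Iterable, Set

class DSAIssue(Enum):
    """The six issues discussed in the report, as checklist items."""
    DATA_USE = "data use"
    ACCESS = "access"
    BREACHES = "breaches"
    PROPRIETARY = "proprietary issues"
    PUBLICATION = "publicization of the analysis"
    DELETION = "deletion of data upon termination"

def missing_issues(covered: Iterable[DSAIssue]) -> Set[DSAIssue]:
    """Return the checklist items a draft agreement does not address."""
    return set(DSAIssue) - set(covered)

# Hypothetical draft that is silent on breaches and on deletion at termination.
draft = {DSAIssue.DATA_USE, DSAIssue.ACCESS, DSAIssue.PROPRIETARY, DSAIssue.PUBLICATION}
for issue in sorted(missing_issues(draft), key=lambda i: i.value):
    print("Not addressed:", issue.value)
# Not addressed: breaches
# Not addressed: deletion of data upon termination
```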

Sector-Specific (Data-) Access Regimes of Competitors


Paper by Jörg Hoffmann: “The expected economic and social benefits of data access and sharing are enormous. And yet, particularly in the B2B context, the sharing of privately held data between companies has not taken off at an efficient scale. This has already led to the adoption of sector-specific data governance and access regimes. Two of these regimes are enshrined in the PSD2 (the EU’s revised Payment Services Directive), which introduced an access-to-account rule and a data portability rule for specific account information for third-party payment providers.

This paper analyses these sector-specific access and portability regimes and identifies regulatory shortcomings that should be addressed and that can serve as guidance for future data access regulation. It first develops regulatory guidelines built around the multiple regulatory dimensions of data and the potential adverse effects that overly broad data access regimes may create.

In this regard, the paper assesses the role of factual data exclusivity in data-driven innovation incentives for undertakings, the role of industrial-policy-driven market regulation within the principle of a free market economy, the impact of data sharing on consumer sovereignty and choice, and, ultimately, data-induced distortions of competition. It develops these findings by drawing on basic IP and information economics, EU competition case law on refusal-to-supply cases, the rise of ‘surveillance capitalism’, and current competition policy considerations regarding the envisioned preventive competition control regime in Germany tackling data-rich ‘undertakings of paramount importance for competition across markets’. This is followed by an analysis of the PSD2 access and portability regimes in light of these regulatory principles….(More)”.