A Taxonomy of Definitions for the Health Data Ecosystem


Announcement: “Healthcare technologies are rapidly evolving, producing new data sources, data types, and data uses, which precipitate more rapid and complex data sharing. Novel technologies—such as artificial intelligence tools and new internet of things (IoT) devices and services—are providing benefits to patients, doctors, and researchers. Data-driven products and services are deepening patients’ and consumers’ engagement and helping to improve health outcomes. Understanding the evolving health data ecosystem presents new challenges for policymakers and industry. There is an increasing need to better understand and document the stakeholders, the emerging data types and their uses.

The Future of Privacy Forum (FPF) and the Information Accountability Foundation (IAF) partnered to form the FPF-IAF Joint Health Initiative in 2018. Today, the Initiative is releasing A Taxonomy of Definitions for the Health Data Ecosystem; the publication is intended to enable a more nuanced, accurate, and common understanding of the current state of the health data ecosystem. The Taxonomy outlines the established and emerging language of the health data ecosystem. The Taxonomy includes definitions of:

  • The stakeholders currently involved in the health data ecosystem and examples of each;
  • The common and emerging data types that are being collected, used, and shared across the health data ecosystem;
  • The purposes for which data types are used in the health data ecosystem; and
  • The types of actions that are now being performed and which we anticipate will be performed on datasets as the ecosystem evolves and expands.

This report is an educational resource that will enable a deeper understanding of the current landscape of stakeholders and data types….(More)”.

Come to Finland if you want to glimpse the future of health data!


Jukka Vahti at Sitra: “The Finnish tradition of establishing, maintaining and developing data registers goes back to the 1600s, when parish records were first kept.

When this old custom is combined with the opportunities afforded by digitisation, the positive approach Finns have towards research and technology, and the recently updated legislation enabling the data economy, Finland and the Finnish people can lead the way as Europe gradually, or even suddenly, switches to a fair data economy.

The foundations for a fair data economy already exist

The fair data economy is a natural continuation of earlier projects promoting e-services undertaken in Finland.

For example, the Data Exchange Layer is already speeding up the transfer of data from one system to another in Finland and in Estonia, the country where the system originated; the system remains unique to these two countries.

In May 2019 Finland also saw the entry into force of the Act on the Secondary Use of Health and Social Data, according to which the information on social welfare and healthcare held in registers may be used for purposes of statistics, research, education, knowledge management, control and supervision conducted by authorities, and development and innovation activity.

The new law will make the work of researchers and service developers more effective, as the business of acquiring a permit will take place through a one-stop-shop principle and it will be possible to use data from more than one source more readily than before….(More)”.

What can we learn from billions of food purchases derived from fidelity cards?


Daniele Quercia at Medium: “By combining 1.6B food item purchases with 1.1B medical prescriptions for the entire city of London for one year, we discovered that, to predict health outcomes, socio-economic conditions matter less than previous research has shown: despite being lower-income, certain areas are healthy, and that is because of what their residents eat!

This result comes from our latest project “Poor but Healthy”, which was published in EPJ Data Science (Springer’s European Physical Journal series) this month, and comes with @tobi_vierzwo’s stunningly beautiful map of London I invite all of you to explore.

Why are we interested in urban health? In our cities, food is cheap and exercise discretionary, and our health pays the price. Half of European citizens will be obese by 2050, and obesity and its diseases are likely to reach crisis proportions. In this project, we set out to show that fidelity cards of grocery stores represent a treasure trove of health data — they can be used not only to (e)mail discount coupons to customers but also to effectively track a neighbourhood’s health in real-time for an entire city or even an entire country.

In research circles, the impact of eating habits on people’s health has mostly been studied using dietary surveys, which are costly and of limited scale.

To complement these surveys, we have recently resorted to grocery fidelity cards. We analyzed the anonymized records of 1.6B grocery items purchased by 1.6M grocery store customers in London over one whole year, and combined them with 1.1B medical prescriptions.

In so doing, we found that, as one expects, the “trick” to not being associated with chronic diseases is eating less of what we instinctively like (e.g., sugar, carbohydrates), balancing all the nutrients, and avoiding the (big) quantities that are readily available. These results come as no surprise yet speak to the validity of using fidelity cards to capture health outcomes…(More)”.
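The core aggregation behind a study like this can be sketched in a few lines: bucket purchased calories by area and nutrient, then score each area by the entropy of its nutrient shares, one plausible measure of how "balanced" an area's food basket is. The area codes, nutrient categories, and figures below are invented for illustration; the paper's actual methodology is richer.

```python
import math
from collections import defaultdict

def nutrient_entropy(shares):
    """Shannon entropy of a nutrient-share distribution: higher means
    calories are spread more evenly across nutrients (a rough 'balance' score)."""
    return -sum(p * math.log(p) for p in shares if p > 0)

# Hypothetical per-area purchase records: (area code, nutrient, calories bought)
purchases = [
    ("E1", "carbohydrate", 500), ("E1", "sugar", 400), ("E1", "fat", 100),
    ("W2", "carbohydrate", 350), ("W2", "sugar", 150),
    ("W2", "fat", 250), ("W2", "protein", 250),
]

totals = defaultdict(lambda: defaultdict(float))
for area, nutrient, kcal in purchases:
    totals[area][nutrient] += kcal

# Per-area balance score; a study like this would then correlate these
# scores with per-area prescription rates for diet-related conditions.
entropy_by_area = {
    area: nutrient_entropy([k / sum(by_n.values()) for k in by_n.values()])
    for area, by_n in totals.items()
}
```

Here the sugar-heavy basket of "E1" scores lower than the more even basket of "W2", which is the direction of the paper's headline finding: what residents buy, not just what they earn, tracks health outcomes.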


Facebook releases a trio of maps to aid with fighting disease outbreaks


Sarah Perez at TechCrunch: “Facebook… announced a new initiative focused on using its data and technologies to help nonprofit organizations and universities working in public health better map the spread of infectious diseases around the world. Specifically, the company is introducing three new maps: population density maps with demographic estimates, movement maps and network coverage maps. These, says Facebook, will help the health partners to understand where people live, how they’re moving and if they have connectivity — all factors that can aid in determining how to respond to outbreaks, and where supplies should be delivered.

As Facebook explained, health organizations rely on information like this when planning public health campaigns. But much of the information they rely on is outdated, like older census data. In addition, information from more remote communities can be scarce.

By combining the new maps with other public health data, Facebook believes organizations will be better equipped to address epidemics.

The new high-resolution population density maps will estimate the number of people living within 30-meter grid tiles, and provide insights on demographics, including the number of children under five, the number of women of reproductive age, and the young and elderly populations. These maps aren’t built using Facebook data, but are instead built by applying Facebook’s AI capabilities to satellite imagery and census information.
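The tiling step itself is straightforward to picture: snap each population estimate to a ~30 m grid cell and sum counts per cell and demographic group. The sketch below uses a crude equirectangular approximation and entirely hypothetical coordinates and counts; Facebook's actual pipeline is not public.

```python
import math
from collections import Counter

TILE_M = 30  # tile edge length in metres, matching the maps described above

def tile_id(lat, lon, tile_m=TILE_M):
    """Snap a coordinate to a ~30 m grid tile (crude equirectangular approximation)."""
    m_per_deg_lat = 111_320.0
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(lat))
    return (int(lat * m_per_deg_lat // tile_m), int(lon * m_per_deg_lon // tile_m))

# Hypothetical building-level estimates: (lat, lon, demographic group, count)
estimates = [
    (6.5244, 3.3792, "children_under_5", 3),
    (6.5244, 3.3793, "women_reproductive_age", 7),
    (6.5250, 3.3792, "elderly", 2),
]

grid = Counter()
for lat, lon, group, count in estimates:
    grid[(tile_id(lat, lon), group)] += count
```

Points roughly 60 m apart land in different tiles, which is what gives these maps far finer resolution than census tracts.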

Movement maps, meanwhile, track aggregate data about Facebook users’ movements via their mobile phones (when location services are enabled). At scale, health partners can combine this with other data to predict where outbreaks may occur next….(More)”.
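The aggregate-movement idea can be sketched as an origin-destination count over per-user region sequences, with small flows suppressed so no individual's trip is identifiable. The `min_count` threshold and region labels below are illustrative assumptions, not Facebook's actual parameters.

```python
from collections import defaultdict

def movement_flows(user_pings, min_count=10):
    """Aggregate per-user sequences of visited regions into origin->destination
    counts, dropping any flow seen fewer than `min_count` times (a simple
    suppression rule so rare trips cannot single anyone out)."""
    flows = defaultdict(int)
    for regions in user_pings.values():
        for origin, dest in zip(regions, regions[1:]):
            if origin != dest:
                flows[(origin, dest)] += 1
    return {od: n for od, n in flows.items() if n >= min_count}

# Hypothetical per-user sequences of coarse regions visited during a day
pings = {"u1": ["A", "A", "B"], "u2": ["A", "B", "C"], "u3": ["B", "C"]}
flows = movement_flows(pings, min_count=2)  # tiny threshold for the toy example
```

A health partner would overlay such flows on case data to anticipate where an outbreak is likely to travel.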

Crowdsourcing Research Questions? Leveraging the Crowd’s Experiential Knowledge for Problem Finding


Paper by Tiare-Maria Brasseur, Susanne Beck, Henry Sauermann, Marion Poetz: “Recently, both researchers and policy makers have become increasingly interested in involving the general public (i.e., the crowd) in the discovery of new science-based knowledge. There has been a boom of citizen science/crowd science projects (e.g., Foldit or Galaxy Zoo) and global policy aspirations for greater public engagement in science (e.g., Horizon Europe). At the same time, however, there are also criticisms or doubts about this approach. Science is complex and laypeople often do not have the appropriate knowledge base for scientific judgments, so they rely on specialized experts (i.e., scientists) (Scharrer, Rupieper, Stadtler & Bromme, 2017). Given these two perspectives, there is as yet no consensus on what the crowd can do and what only researchers should do in scientific processes (Franzoni & Sauermann, 2014). Previous research demonstrates that crowds can be efficiently and effectively used in late stages of the scientific research process (i.e., data collection and analysis). We are interested in finding out what crowds can actually contribute to research processes beyond data collection and analysis. Specifically, this paper aims at providing first empirical insights on how to leverage not only the sheer number of crowd contributors, but also their diversity in experience, for early phases of the research process (i.e., problem finding). In an online and field experiment, we develop and test suitable mechanisms for facilitating the transfer of the crowd’s experience into scientific research questions. In doing so, we address the following two research questions: 1. What factors influence crowd contributors’ ability to generate research questions? 2. How do research questions generated by crowd members differ from research questions generated by scientists in terms of quality?
There are strong claims about the significant potential of people with experiential knowledge, i.e., sticky problem knowledge derived from one’s own practical experience and practices (Collins & Evans, 2002), to enhance the novelty and relevance of scientific research (e.g., Pols, 2014). Previous evidence that crowds with experiential knowledge (e.g., users in Poetz & Schreier, 2012) or “outsiders”/non-obvious individuals (Jeppesen & Lakhani, 2010) can outperform experts under certain conditions by having novel perspectives supports the assumption that the participation of non-scientists (i.e., crowd members) in scientific problem-finding might complement scientists’ lack of experiential knowledge. Furthermore, by bringing in exactly these new perspectives, they might help overcome problems of fixation/inflexibility in cognitive-search processes among scientists (Acar & van den Ende, 2016). Thus, crowd members with (higher levels of) experiential knowledge are expected to be superior in identifying very novel and out-of-the-box research problems with high practical relevance, as compared to scientists. However, there are clear reasons to be skeptical: despite their advantage of possessing important experiential knowledge, the crowd lacks the scientific knowledge we assume to be required to formulate meaningful research questions. To study exactly how the transfer of crowd members’ experiential knowledge into science can be facilitated, we conducted two experimental studies in the context of traumatology (i.e., research on accidental injuries). First, we conducted a large-scale online experiment (N=704) in collaboration with an international crowdsourcing platform to test the effect of two facilitating treatments on crowd members’ ability to formulate real research questions (study 1). We used a 2 (structuring knowledge/no structuring knowledge) x 2 (science knowledge/no science knowledge) between-subject experimental design.
Second, we tested the same treatments in the field (study 2), i.e., in a crowdsourcing project in collaboration with the LBG Open Innovation in Science Center. We invited patients, caretakers and medical professionals (e.g., surgeons, physical therapists or nurses) concerned with accidental injuries to submit research questions using a customized online platform (https://tell-us.online/) to investigate the causal relationship between our treatments and different types and levels of experiential knowledge (N=118). An international jury of experts (i.e., journal editors in the field of traumatology) then assesses the quality of the submitted questions (from the online and field experiments) along several quality dimensions (i.e., clarity, novelty, scientific impact, practical impact, feasibility) in an online evaluation process. To assess the net effect of our treatments, we further include a random sample of research questions obtained from early-stage research papers (i.e., conference papers) in the expert evaluation (blind to the source) and compare them with the baseline groups of our experiments. We are currently finalizing the data collection…(More)”.

Humans and Big Data: New Hope? Harnessing the Power of Person-Centred Data Analytics


Paper by Carmel Martin, Keith Stockman and Joachim P. Sturmberg: “Big data provide the hope of major health innovation and improvement. However, there is a risk of precision medicine based on predictive biometrics and service metrics overwhelming anticipatory human centered sense-making, in the fuzzy emergence of personalized (big data) medicine. This is a pressing issue, given the paucity of individual sense-making data approaches. A human-centric model is described to address the gap in personal particulars and experiences in individual health journeys. The Patient Journey Record System (PaJR) was developed to improve human-centric healthcare by harnessing the power of person-centred data analytics using complexity theory, iterative health services and information systems applications over a 10-year period. PaJR is a web-based service supporting usually bi-weekly telephone calls by care guides to individuals at risk of readmissions.

This chapter describes a case study of the timing and context of readmissions using human (biopsychosocial) particular data, which is based on individual experiences and perceptions with differing patterns of instability. This Australian study, called MonashWatch, is a service pilot using the PaJR system in the Dandenong Hospital urban catchment area of the Monash Health network. On the state public hospital side, the Victorian HealthLinks Chronic Care algorithm provides big-data case finding for high risk of readmission, based on disease and service metrics. MonashWatch was actively monitoring 272 of 376 intervention patients, with 195 controls, over 22 months (ongoing) at the time of the study.

Three randomly selected intervention cases describe a dynamic interplay of self-reported change in health and health care, medication, drug and alcohol use, and social support structure. While the three cases were at similar predicted risk initially, they showed statistically different time series configurations and admission patterns. Fluctuations in admission were associated with (mal)alignment of bodily health with psychosocial and environmental influences. However, human interpretation was required to make sense of the patterns as presented by the multiple levels of data.
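A toy version of the kind of instability flag such monitoring implies: treat each patient's call responses as a time series and flag it when recent self-reports swing widely. The 1-5 scale, window, and threshold below are hypothetical; PaJR's actual analytics are more involved, and as the chapter stresses, a flag like this still needs human interpretation.

```python
from statistics import pstdev

WINDOW = 6       # number of most recent call scores to examine (assumed)
THRESHOLD = 1.0  # standard-deviation cutoff marking an unstable pattern (assumed)

def unstable(scores, window=WINDOW, threshold=THRESHOLD):
    """Flag a self-report series as unstable when its recent scores swing widely."""
    recent = scores[-window:]
    return len(recent) >= 3 and pstdev(recent) > threshold

# Hypothetical "how do you feel today?" answers on a 1-5 scale
steady = [3, 3, 4, 3, 3, 4]
swinging = [1, 5, 2, 5, 1, 4]
```

The steady series stays unflagged while the swinging one is flagged, mirroring the chapter's point that the *pattern* of fluctuation, not the average level, carries the signal.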

A human-centric model and framework for health journey monitoring illustrates the potential for ‘small’ personal experience data to inform clinical care in the era of big data predominantly based on biometrics and medical-industrial processes…(More)”.

Data Trusts, Health Data, and the Professionalization of Data Management


Paper by Keith Porcaro: “This paper explores how trusts can provide a legal model for professionalizing health data management. Data is potential. Over time, data collected for one purpose can support others. Clinical records at a hospital, created to manage a patient’s care, can be internally analyzed to identify opportunities for process and safety improvements at a hospital, or externally analyzed with other records to identify optimal treatment patterns. Data also carries the potential for harm. Personal data can be leaked or exposed. Proprietary models can be used to discriminate against patients, or price them out of care.

As novel uses of data proliferate, an individual data holder may be ill-equipped to manage complex new data relationships in a way that maximizes value and minimizes harm. A single organization may be limited by management capacity or risk tolerance. Organizations across sectors have digitized unevenly or late, and may not have mature data controls and policies. Collaborations that involve multiple organizations may face coordination problems, or disputes over ownership.

Data management is still a relatively young field. Most models of external data-sharing are based on literally transferring data—copying data between organizations, or pooling large datasets together under the control of a third party—rather than facilitating external queries of a closely held dataset.
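The "external queries of a closely held dataset" pattern can be sketched as a holder-side interface that answers only aggregate questions and never releases raw records. The minimum-cohort rule below is an illustrative safeguard of my own choosing, not a prescription from the paper.

```python
MIN_COHORT = 25  # assumed: refuse results computed over fewer records than this

class ManagedDataset:
    """Toy sketch of professionally managed data: outside parties submit a
    filter and receive an aggregate, but only when the matching cohort is
    large enough to resist re-identification. Raw rows never leave."""

    def __init__(self, records):
        self._records = list(records)  # held closely, never exposed directly

    def mean(self, field, where=lambda r: True, min_cohort=MIN_COHORT):
        cohort = [r[field] for r in self._records if where(r)]
        if len(cohort) < min_cohort:
            raise PermissionError("cohort too small to release")
        return sum(cohort) / len(cohort)

# Hypothetical patient records
ds = ManagedDataset([{"age": a, "area": "N" if a < 45 else "S"}
                     for a in range(30, 60)])
```

A whole-dataset query succeeds, while a query narrowed to a small subgroup is refused; a data trust's manager would enforce rules like this under fiduciary duties rather than ad hoc policy.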

Few models to date have focused on the professional management of data on behalf of a data holder, where the data holder retains control over not only their data, but the inferences derived from their data. Trusts can help facilitate the professionalization of data management. Inspired by the popularity of trusts for managing financial investments, this paper argues that data trusts are well-suited as a vehicle for open-ended professional management of data, where a manager’s discretion is constrained by fiduciary duties and a trust document that defines the data holder’s goals…(More)”.

How AI could save lives without spilling medical secrets


Will Knight at MIT Technology Review: “The potential for artificial intelligence to transform health care is huge, but there’s a big catch.

AI algorithms will need vast amounts of medical data on which to train before machine learning can deliver powerful new ways to spot and understand the cause of disease. That means imagery, genomic information, or electronic health records—all potentially very sensitive information.

That’s why researchers are working on ways to let AI learn from large amounts of medical data while making it very hard for that data to leak.

One promising approach is now getting its first big test at Stanford Medical School in California. Patients there can choose to contribute their medical data to an AI system that can be trained to diagnose eye disease without ever actually accessing their personal details.

Participants submit ophthalmology test results and health record data through an app. The information is used to train a machine-learning model to identify signs of eye disease in the images. But the data is protected by technology developed by Oasis Labs, a startup spun out of UC Berkeley, which guarantees that the information cannot be leaked or misused. The startup was granted permission by regulators to start the trial last week.

The sensitivity of private patient data is a looming problem. AI algorithms trained on data from different hospitals could potentially diagnose illness, prevent disease, and extend lives. But in many countries medical records cannot easily be shared and fed to these algorithms for legal reasons. Research on using AI to spot disease in medical images or data usually involves relatively small data sets, which greatly limits the technology’s promise….

Oasis stores the private patient data on a secure chip, designed in collaboration with other researchers at Berkeley. The data remains within the Oasis cloud; outsiders are able to run algorithms on the data, and receive the results, without its ever leaving the system. A smart contract (software that runs on top of a blockchain) is triggered when a request to access the data is received. This software logs how the data was used and also checks to make sure the machine-learning computation was carried out correctly….(More)”.
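A simplified stand-in for that logging idea is a hash-chained, append-only audit log: each entry commits to the one before it, so tampering with history invalidates every later hash. This is only an illustration of the principle, not Oasis's actual protocol, and the event strings are hypothetical.

```python
import hashlib
import json

class AuditLog:
    """Append-only access log where each entry commits to its predecessor,
    a minimal sketch of the tamper-evidence a blockchain-backed log provides."""

    def __init__(self):
        self.entries = []

    def record(self, event):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps({"event": event, "prev": prev}, sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": digest})

    def verify(self):
        """Recompute the chain; any edited entry breaks every hash after it."""
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps({"event": e["event"], "prev": prev}, sort_keys=True)
            if e["prev"] != prev or e["hash"] != hashlib.sha256(payload.encode()).hexdigest():
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.record("model-training query received")   # hypothetical events
log.record("results returned to researcher")
```

Verification passes on the intact log and fails after any entry is altered, which is the property that lets regulators and patients trust the record of how data was used.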

Missing Numbers


Introduction by Anna Powell-Smith of a new “blog on the data the government should collect, but doesn’t”: “…Over time, I started to notice a pattern. Across lots of different policy areas, it was impossible for governments to make good decisions because of a basic lack of data. There was always critical data that the state either didn’t collect at all, or collected so badly that it made change impossible.

Eventually, I decided that the power to not collect data is one of the most important and little-understood sources of power that governments have. This is why I’m writing Missing Numbers: to encourage others to ask, “Is this lack of data a deliberate ploy to get away with something?”

By refusing to amass knowledge in the first place, decision-makers exert power over the rest of us. It’s time that this power was revealed, so we can have better conversations about what we need to know to run this country successfully.

A typical example

The government records and publishes data on how often each NHS hospital receives formal complaints. This is very helpful, because it means patients and the people who care for them can spot hospitals whose performance is worrying.

But the government simply doesn’t record data, even internally, on how often formal complaints are made about each Jobcentre. (That FOI response is from 2015, but I’ve confirmed it’s still true in 2019.) So it is impossible for it to know if some Jobcentres are being seriously mismanaged….(More)”.

San Francisco teams up with Uber, location tracker on 911 call responses


Gwendolyn Wu at San Francisco Chronicle: “In an effort to shorten emergency response times in San Francisco, the city announced on Monday that it is now using location data from RapidSOS, a New York-based public safety tech company, and ride-hailing company Uber to improve location coordinates generated from 911 calls.

An increasing number of emergency calls are made from cell phones, said Michelle Cahn, RapidSOS’s director of community engagement. The new technology should allow emergency responders to narrow down the location of such callers and replace existing 911 technology that was built for landlines and tied to home addresses.

Cell phone location data currently given to dispatchers when they receive a 911 call can be vague, especially if the person can’t articulate their exact location, according to the Department of Emergency Management.

But if a dispatcher can narrow down where the emergency is happening, that increases the chance of a timely response and better result, Cahn said.

“It doesn’t matter what’s going on with the emergency if we don’t know where it is,” she said.

RapidSOS shares its location data — collected by Apple and Google for their in-house map apps — free of charge to public safety agencies. San Francisco’s 911 call center adopted the data service in September 2018.

The Federal Communications Commission estimates agencies could save as many as 10,000 lives a year if they shave a minute off response times. Federal officials issued new rules to improve wireless 911 calls in 2015, asking mobile carriers to provide more accurate locations to call centers. Carriers are required to find a way to triangulate the caller’s location to within 50 meters — a much smaller radius than the eight blocks city officials were initially presented with in October when a caller dialed 911…(More)”.
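Checking an estimate against the 50-meter target is a simple great-circle calculation. The coordinates below are hypothetical; this is only a sketch of the accuracy check, not how carriers or RapidSOS actually compute fixes.

```python
import math

ACCURACY_M = 50  # the FCC target radius for wireless 911 location fixes

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def meets_fcc_target(true_pos, estimate, radius_m=ACCURACY_M):
    """Does the estimated fix fall within the target radius of the true position?"""
    return haversine_m(*true_pos, *estimate) <= radius_m
```

An estimate about 33 m from the caller passes, while one a city block (roughly 110 m) away fails, which is the difference between sending responders to the right door and the wrong street.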