Responsible data sharing in a big data-driven translational research platform: lessons learned


Paper by S. Kalkman et al: “The sharing of clinical research data is increasingly viewed as a moral duty [1]. Particularly in the context of making clinical trial data widely available, editors of international medical journals have labeled data sharing a highly efficient way to advance scientific knowledge [2,3,4]. The combination of even larger datasets into so-called “Big Data” is considered to offer even greater benefits for science, medicine and society [5]. Several international consortia have now promised to build grand-scale, Big Data-driven translational research platforms to generate better scientific evidence regarding disease etiology, diagnosis, treatment and prognosis across various disease areas [6,7,8].

Despite anticipated benefits, large-scale sharing of health data is charged with ethical questions. Stakeholders have been urged to consider how to manage privacy and confidentiality issues, ensure valid informed consent, and determine who gets to decide about data access [9]. More fundamentally, new data sharing activities prompt questions about social justice and public trust [10]. To balance potential benefits and ethical considerations, data sharing platforms require guidance for the processes of interaction and decision-making. In the European Union (EU), legal norms specified for the sharing of personal data for health research, most notably those set out in the General Data Protection Regulation (GDPR) (EU 2016/679), remain open to interpretation and offer limited practical guidance to researchers [11,12,13]. Striking in this regard is that the GDPR itself stresses the importance of adherence to ethical standards when broad consent is put forward as a legal basis for the processing of personal data. For example, Recital 33 of the GDPR states that data subjects should be allowed to give “consent to certain areas of scientific research when in keeping with recognised ethical standards for scientific research” [14]. In fact, the GDPR actually encourages data controllers to establish self-regulating mechanisms, such as a code of conduct. To foster responsible and sustainable data sharing in translational research platforms, ethical guidance and governance are therefore necessary. Here, we define governance as ‘the processes of interaction and decision-making among the different stakeholders that are involved in a collective problem that lead to the creation, reinforcement, or reproduction of social norms and institutions’…(More)”.

Biased Algorithms Are Easier to Fix Than Biased People


Sendhil Mullainathan in The New York Times: “In one study published 15 years ago, two people applied for a job. Their résumés were about as similar as two résumés can be. One person was named Jamal, the other Brendan.

In a study published this year, two patients sought medical care. Both were grappling with diabetes and high blood pressure. One patient was black, the other was white.

Both studies documented racial injustice: In the first, the applicant with a black-sounding name got fewer job interviews. In the second, the black patient received worse care.

But they differed in one crucial respect. In the first, hiring managers made biased decisions. In the second, the culprit was a computer program.

As a co-author of both studies, I see them as a lesson in contrasts. Side by side, they show the stark differences between two types of bias: human and algorithmic.

Marianne Bertrand, an economist at the University of Chicago, and I conducted the first study: We responded to actual job listings with fictitious résumés, half of which were randomly assigned a distinctively black name.

The study was titled: “Are Emily and Greg more employable than Lakisha and Jamal?”

The answer: Yes, and by a lot. Simply having a white name increased callbacks for job interviews by 50 percent.

I published the other study in the journal “Science” in late October with my co-authors: Ziad Obermeyer, a professor of health policy at the University of California, Berkeley; Brian Powers, a clinical fellow at Brigham and Women’s Hospital; and Christine Vogeli, a professor of medicine at Harvard Medical School. We focused on an algorithm that is widely used in allocating health care services, and has affected roughly a hundred million people in the United States.

To better target care and provide help, health care systems are turning to voluminous data and elaborately constructed algorithms to identify the sickest patients.

We found these algorithms have a built-in racial bias. At similar levels of sickness, black patients were deemed to be at lower risk than white patients. The magnitude of the distortion was immense: Eliminating the algorithmic bias would more than double the number of black patients who would receive extra help. The problem lay in a subtle engineering choice: to measure “sickness,” the algorithm’s designers used the most readily available data, health care expenditures. But because society spends less on black patients than equally sick white ones, the algorithm understated the black patients’ true needs.
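The mechanism described here, using spending as a proxy for sickness, can be illustrated with a small toy simulation. The group labels, the assumed spending disparity, and the 10 percent risk cutoff below are all hypothetical choices for illustration, not the study’s actual data or model:

```python
import random

random.seed(0)

# Toy model: each patient has a true "sickness" level drawn uniformly.
# We assume (hypothetically) that for equally sick patients, spending on
# group B is systematically lower (factor 0.9), so a spending-based risk
# score under-ranks group B.

def make_patient(group):
    sickness = random.uniform(0, 1)
    spending_factor = 1.0 if group == "A" else 0.9  # assumed disparity
    return {"group": group, "sickness": sickness,
            "spending": sickness * spending_factor}

patients = ([make_patient("A") for _ in range(5000)] +
            [make_patient("B") for _ in range(5000)])

# The "algorithm": flag the top 10% of patients by predicted risk,
# where predicted risk is simply observed spending.
cutoff = sorted(p["spending"] for p in patients)[int(0.9 * len(patients))]
flagged = [p for p in patients if p["spending"] >= cutoff]

share_b = sum(p["group"] == "B" for p in flagged) / len(flagged)
print(f"Share of group B among flagged patients: {share_b:.2f}")
# Since both groups are equally sick by construction, an unbiased score
# would flag ~50% from each group; the spending proxy flags far fewer B.
```

Even a modest assumed spending gap produces a large shortfall in who gets flagged for extra help, which is the shape of the distortion the study reports.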

One difference between these studies is the work needed to uncover bias…(More)”.

Accelerating Medicines Partnership (AMP): Improving Drug Research Efficiency through Biomarker Data Sharing


Data Collaborative Case Study by Michelle Winowatan, Andrew Young, and Stefaan Verhulst: “Accelerating Medicines Partnership (AMP) is a cross-sector data-sharing partnership in the United States between the National Institutes of Health (NIH), the Food and Drug Administration (FDA), multiple biopharmaceutical and life science companies, as well as non-profit organizations that seeks to improve the efficiency of developing new diagnostics and treatments for several types of disease. To achieve this goal, the partnership created a pre-competitive collaborative ecosystem where the biomedical community can pool data and resources that are relevant to the prioritized disease areas. A key component of the partnership is to make biomarkers data available to the medical research community through online portals.

Data Collaboratives Model: Based on our typology of data collaborative models, AMP is an example of the data pooling model of data collaboration, specifically a public data pool. Public data pools co-mingle data assets from multiple data holders — in this case pharmaceutical companies — and make those shared assets available on the web. Pools often limit contributions to approved partners (as public data pools are not crowdsourcing efforts), but access to the shared assets is open, enabling independent re-uses.

Data Stewardship Approach: Data stewardship is built into the partnership through the establishment of an executive committee, which governs the entire partnership, and a steering committee for each disease area, which governs each of the sub-projects within AMP. These committees consist of representatives from the institutional partners involved in AMP and perform data stewardship functions, including enabling inter-institutional engagement and intra-institutional coordination, auditing data and assessing its value and risk, communicating findings, and nurturing the collaboration toward sustainability….(Full Case Study)”.

The Role of Crowdsourcing in the Healthcare Industry


Chapter by Kabir C. Sen: “The twenty-first century has seen the advent of technical advances in storage, transmission and analysis of information. This has had a profound impact on the field of medicine. However, notwithstanding these advances, various obstacles remain in the world regarding the improvement of human lives through the provision of better health care. The obstacles emanate from the demand (i.e., the problem) as well as the supply (i.e., the solution) side. In some cases, the nature of the problems might not have been correctly identified. In others, a solution to a problem could be known only to a small niche of the global population. Thus, from the demand perspective, the variety of health care issues can range from the quest for a cure for a rare illness to the inability to successfully implement verifiable preventive measures for a disease that affects pockets of the global population. Alternatively, from the supply perspective, the approach to a host of health issues might vary because of fundamental differences in both medical philosophies and organizational policies.

In many instances, effective solutions to health care problems are lacking because of inadequate global knowledge about the particular disease. Alternatively, in other cases, a solution might exist but the relevant knowledge about it might only be available to selected pockets of the global medical community. Sometimes, the barriers to the transfer of knowledge might have their root causes in ignorance or prejudice about the initiator of the cure or solution. However, the advent of information technology has now provided an opportunity for individuals located at different geographical locations to collaborate on solutions to various problems. These crowdsourcing projects now have the potential to extract the “wisdom of crowds” for tackling problems which previously could not be solved by a group of experts (Surowiecki, 2014). Anecdotal evidence suggests that crowdsourcing has achieved some success in providing solutions for a rare medical disease (Arnold, 2014). This chapter discusses crowdsourcing’s potential to solve medical problems by designing a framework to evaluate its promises and suggest recommended paths for future action….(More)”.

Engaging citizens in determining the appropriate conditions and purposes for re-using Health Data


Beth Noveck at The GovLab: “…The term, big health data, refers to the ability to gather and analyze vast quantities of online information about health, wellness and lifestyle. It includes not only our medical records but data from apps that track what we buy, how often we exercise and how well we sleep, among many other things. It provides an ocean of information about how healthy or ill we are, and unsurprisingly, doctors, medical researchers, healthcare organizations, insurance companies and governments are keen to get access to it. Should they be allowed to?

It’s a huge question, and AARP is partnering with GovLab to learn what older Americans think about it. AARP is a non-profit organization — the largest in the nation and the world — dedicated to empowering Americans to choose how they live as they age. In 2018 it had more than 38 million members. It is a key voice in policymaking in the United States, because it represents the views of people aged over 50 in this country.

From today, AARP and the GovLab are using the Internet to capture what AARP members feel are the most urgent issues confronting them to try to discover what worries people most: the use of big health data or the failure to use it.

The answers are not simple. On the one hand, increasing the use and sharing of data could enable doctors to make better diagnoses and interventions to prevent disease and make us healthier. It could lead medical researchers to find cures faster, while the creation of health data businesses could strengthen the economy.

On the other hand, the collection, sharing, and use of big health data could reveal sensitive personal information over which we have little control. This data could be sold without our consent, and be used by entities for surveillance or discrimination, rather than to promote well-being….(More)”.

How Data Can Help in the Fight Against the Opioid Epidemic in the United States


Report by Joshua New: “The United States is in the midst of an opioid epidemic 20 years in the making….

One of the most pernicious obstacles in the fight against the opioid epidemic is that, until relatively recently, it was difficult to measure the epidemic in any comprehensive capacity beyond such high-level statistics. A lack of granular data and authorities’ inability to use data to inform response efforts allowed the epidemic to grow to devastating proportions. The maxim “you can’t manage what you can’t measure” has never been so relevant, and this failure to effectively leverage data has undoubtedly cost many lives and caused severe social and economic damage to communities ravaged by opioid addiction, with authorities limited in their ability to fight back.

Many factors contributed to the opioid epidemic, including healthcare providers not fully understanding the potential ramifications of prescribing opioids, socioeconomic conditions that make addiction more likely, and drug distributors turning a blind eye to likely criminal behavior, such as pharmacy workers illegally selling opioids on the black market. Data will not be able to solve these problems, but it can make public health officials and other stakeholders more effective at responding to them. Fortunately, recent efforts to better leverage data in the fight against the opioid epidemic have demonstrated the potential for data to be an invaluable and effective tool to inform decision-making and guide response efforts. Policymakers should aggressively pursue more data-driven strategies to combat the opioid epidemic while learning from past mistakes that helped contribute to the epidemic to prevent similar situations in the future.

The scope of this paper is limited to opportunities to better leverage data to help address problems primarily related to the abuse of prescription opioids, rather than the abuse of illicitly manufactured opioids such as heroin and fentanyl. While these issues may overlap, such as when a person develops an opioid use disorder from prescribed opioids and then seeks heroin when they are unable to obtain more from their doctor, the opportunities to address the abuse of prescription opioids are more clear-cut….(More)”.

Unregulated Health Research Using Mobile Devices: Ethical Considerations and Policy Recommendations


Paper by Mark A. Rothstein et al: “Mobile devices with health apps, direct-to-consumer genetic testing, crowd-sourced information, and other data sources have enabled research by new classes of researchers. Independent researchers, citizen scientists, patient-directed researchers, self-experimenters, and others are not covered by federal research regulations because they are not recipients of federal financial assistance or conducting research in anticipation of a submission to the FDA for approval of a new drug or medical device. This article addresses the difficult policy challenge of promoting the welfare and interests of research participants, as well as the public, in the absence of regulatory requirements and without discouraging independent, innovative scientific inquiry. The article recommends a series of measures, including education, consultation, transparency, self-governance, and regulation to strike the appropriate balance….(More)”.

Google’s ‘Project Nightingale’ Gathers Personal Health Data on Millions of Americans


Rob Copeland at Wall Street Journal: “Google is engaged with one of the U.S.’s largest health-care systems on a project to collect and crunch the detailed personal-health information of millions of people across 21 states.

The initiative, code-named “Project Nightingale,” appears to be the biggest effort yet by a Silicon Valley giant to gain a toehold in the health-care industry through the handling of patients’ medical data. Amazon.com Inc., Apple Inc. and Microsoft Corp. are also aggressively pushing into health care, though they haven’t yet struck deals of this scope.

Google began Project Nightingale in secret last year with St. Louis-based Ascension, a Catholic chain of 2,600 hospitals, doctors’ offices and other facilities, with the data sharing accelerating since summer, according to internal documents.

The data involved in the initiative encompasses lab results, doctor diagnoses and hospitalization records, among other categories, and amounts to a complete health history, including patient names and dates of birth….

Neither patients nor doctors have been notified. At least 150 Google employees already have access to much of the data on tens of millions of patients, according to a person familiar with the matter and the documents.

In a news release issued after The Wall Street Journal reported on Project Nightingale on Monday, the companies said the initiative is compliant with federal health law and includes robust protections for patient data….(More)”.

Big Data, Algorithms and Health Data


Paper by Julia M. Puaschunder: “The most recent decade featured a data revolution in the healthcare sector in screening, monitoring and coordination of aid. Big data analytics have revolutionized the medical profession. The health sector relies on Artificial Intelligence (AI) and robotics as never before. The opportunities of unprecedented access to healthcare, rational precision and human resemblance but also targeted aid in decentralized aid grids are obvious innovations that will lead to most sophisticated neutral healthcare in the future. Yet big data driven medical care also bears risks of privacy infringements and ethical concerns of social stratification and discrimination. Today’s genetic human screening, constant big data information amalgamation as well as social credit scores pegged to access to healthcare also create the most pressing legal and ethical challenges of our time.

The call for developing a legal, policy and ethical framework for using AI, big data, robotics and algorithms in healthcare has therefore reached unprecedented momentum. Compatibility glitches in the AI-human interaction appear problematic, as does a natural AI preponderance outperforming humans. Only if the benefits of AI are reaped in a master-slave-like legal frame can the risks associated with these novel superior technologies be curbed. Liability control but also big data privacy protection appear important to secure the rights of vulnerable patient populations. Big data mapping and social credit scoring must be met with clear anti-discrimination and anti-social stratification ethics. Lastly, the value of genuine human care must be stressed and precious humanness in the artificial age conserved alongside coupling the benefits of AI, robotics and big data with global common goals of sustainability and inclusive growth.

The report aims at helping a broad spectrum of stakeholders understand the impact of AI, big data, algorithms and health data based on information about key opportunities and risks but also future market challenges and policy developments for orchestrating the concerted pursuit of improving healthcare excellence. Statespeople and diplomats are invited to consider three trends in the wake of the AI (r)evolution:

Artificial Intelligence recently gained citizenship in robots becoming citizens: With attributing quasi-human rights to AI, ethical questions arise of a stratified citizenship. Robots and algorithms may only be citizens for their protection and upholding social norms towards human-like creatures that should be considered slave-like for economic and liability purposes without gaining civil privileges such as voting, property rights and holding public offices.

Big data and computational power imply unprecedented opportunities for: crowd understanding, trends prediction and healthcare control. Risks include data breaches, privacy infringements, stigmatization and discrimination. Big data protection should be enacted through technological advancement, self-determined privacy attention fostered by e-education as well as discrimination alleviation by only releasing targeted information and regulated individual data mining capacities.

The European Union should consider establishing a fifth trade freedom of data by law and economic incentives: in order to bundle AI and big data gains large scale. Europe holds the unique potential of offering data supremacy in state-controlled universal healthcare big data wealth that is less fractionate than the US health landscape and more Western-focused than Asian healthcare. Europe could therefore lead the world on big data derived healthcare insights but should also step up to imbuing humane societal imperatives on these most cutting-edge innovations of our time….(More)”.

Algorithmic futures: The life and death of Google Flu Trends


Vincent Duclos in Medicine Anthropology Theory: “In the last few years, tracking systems that harvest web data to identify trends, calculate predictions, and warn about potential epidemic outbreaks have proliferated. These systems integrate crowdsourced data and digital traces, collecting information from a variety of online sources, and they promise to change the way governments, institutions, and individuals understand and respond to health concerns. This article examines some of the conceptual and practical challenges raised by the online algorithmic tracking of disease by focusing on the case of Google Flu Trends (GFT). Launched in 2008, GFT was Google’s flagship syndromic surveillance system, specializing in ‘real-time’ tracking of outbreaks of influenza. GFT mined massive amounts of data about online search behavior to extract patterns and anticipate the future of viral activity. But it did a poor job, and Google shut the system down in 2015. This paper focuses on GFT’s shortcomings, which were particularly severe during flu epidemics, when GFT struggled to make sense of the unexpected surges in the number of search queries. I suggest two reasons for GFT’s difficulties. First, it failed to keep track of the dynamics of contagion, at once biological and digital, as it affected what I call here the ‘googling crowds’. Search behavior during epidemics in part stems from a sort of viral anxiety not easily amenable to algorithmic anticipation, to the extent that the algorithm’s predictive capacity remains dependent on past data and patterns. Second, I suggest that GFT’s troubles were the result of how it collected data and performed what I call ‘epidemic reality’. GFT’s data became severed from the processes Google aimed to track, and the data took on a life of their own: a trackable life, in which there was little flu left. The story of GFT, I suggest, offers insight into contemporary tensions between the indomitable intensity of collective life and stubborn attempts at its algorithmic formalization….(More)”.
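The failure mode Duclos describes, a model trained on past correlations between query volume and flu incidence that is then broken by an anxiety-driven surge in searches, can be sketched with made-up numbers. This is an illustration of that general dynamic only; the figures and the simple linear fit below are not GFT’s actual data or method:

```python
# Hypothetical training data: weekly search-query volume and observed flu
# incidence from "normal" seasons, where queries track illness closely.
queries = [100, 120, 150, 180, 200, 240]
incidence = [10, 12, 15, 18, 20, 24]

# Ordinary least-squares fit of incidence on query volume.
n = len(queries)
mean_q = sum(queries) / n
mean_i = sum(incidence) / n
slope = (sum((q - mean_q) * (i - mean_i) for q, i in zip(queries, incidence))
         / sum((q - mean_q) ** 2 for q in queries))
intercept = mean_i - slope * mean_q

# During a media-fueled epidemic scare, query volume surges far beyond what
# actual illness would produce: the "viral anxiety" of the googling crowds.
panic_queries = 600
predicted = intercept + slope * panic_queries
actual = 30  # assumed true incidence, far below the model's extrapolation

print(f"predicted incidence: {predicted:.1f}, actual: {actual}")
# → predicted incidence: 60.0, actual: 30
```

Because the fitted relationship encodes only past seasons, any shift in why people search (fear rather than flu) inflates the prediction, which is the kind of overestimate GFT produced during epidemics.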