International Development Doesn’t Care About Patient Privacy


Yogesh Rajkotia at the Stanford Social Innovation Review: “In 2013, in southern Mozambique, foreign NGO workers searched for a man whom the local health facility reported as diagnosed with HIV. The workers aimed to verify that the health facility did indeed diagnose and treat him. When they could not find him, they asked the village chief for help. Together with an ever-growing crowd of onlookers, the chief led them to the man’s home. After hesitating and denying, he eventually admitted, in front of the crowd, that he had tested positive and received treatment. With his status made public, he now risked facing stigma, discrimination, and social marginalization. The incident undermined both his health and his ability to live a dignified life.

Similar privacy violations were documented in Burkina Faso in 2016, where community workers asked partners, in the presence of each other, to disclose what individual health services they had obtained.

Why was there such a disregard for the privacy and dignity of these citizens?

As it turns out, unbeknownst to these Mozambican and Burkinabé patients, their local health centers were participating in performance-based financing (PBF) programs financed by foreign assistance agencies. Implemented in more than 35 countries, PBF programs offer health workers financial bonuses for delivering priority health interventions. To ensure that providers do not cheat the system, PBF programs often send verifiers to visit patients’ homes to confirm that they have received specific health services. These verifiers are frequently community members (the World Bank callously notes in its “Performance-Based Financing Toolkit” that even “a local soccer club” can play this role), and this practice, known as “patient tracing,” is common among PBF programs. Among World Bank-funded PBF programs alone, 19 out of 25 implement patient tracing. Yet the World Bank’s toolkit never mentions patient privacy or confidentiality. In patient tracing, patients’ rights and dignity are secondary to donor objectives.

Patient tracing within PBF programs is just one example of a bigger problem: Privacy violations are pervasive in global health. Some researchers and policymakers have raised privacy concerns about tuberculosis (TB), human immunodeficiency virus (HIV), family planning, post-abortion care, and disease surveillance programs. A study conducted by the Asia-Pacific Network of People Living with HIV/AIDS found that 34 percent of people living with HIV in India, Indonesia, the Philippines, and Thailand reported that health workers had breached confidentiality. In many programs, sensitive information about people’s sexual and reproductive health, disease status, and other intimate health details is collected to improve health system effectiveness and efficiency. Usually, households have no way to opt out, nor any control over how health care programs use, store, and disseminate this data. At the same time, most programs do not have systems to enforce health workers’ non-disclosure of private information.

In societies with strong stigma around certain health topics—especially sexual and reproductive health—the disclosure of confidential patient information can destroy lives. In contexts where HIV is highly stigmatized, people living with HIV are 2.4 times more likely to delay seeking care until they are seriously ill. In addition to stigma’s harmful effects on people’s health, it can limit individuals’ economic opportunities, cause them to be socially marginalized, and erode their psychological wellbeing….(More)”.

Is Distributed Ledger Technology Built for Personal Data?


Paper by Henry Chang: “Some of the appealing characteristics of distributed ledger technology (DLT), of which blockchain is one type, include guaranteed integrity, disintermediation and distributed resilience. These characteristics give rise to the possible consequences of immutability, unclear ownership, universal accessibility and trans-border storage. These consequences have the potential to contravene the data protection principles of Purpose Specification, Use Limitation, Data Quality, Individual Participation and Trans-Border Data Flow. This paper endeavors to clarify the various types of DLTs, how they work, why they exhibit the depicted characteristics, and the consequences. Using the universal privacy principles developed by the Organisation for Economic Co-operation and Development (OECD), this paper then describes how each of these consequences raises concerns for privacy protection and how attempts are being made to address them in the design and implementation of various applications of blockchain and DLT, and indicates where further research and best-practice developments lie….(More)”.

Making Better Use of Health Care Data


Benson S. Hsu, MD and Emily Griese in Harvard Business Review: “At Sanford Health, a $4.5 billion rural integrated health care system, we deliver care to over 2.5 million people in 300 communities across 250,000 square miles. In the process, we collect and store vast quantities of patient data – everything from admission, diagnostic, treatment and discharge data to online interactions between patients and providers, as well as data on providers themselves. All this data clearly represents a rich resource with the potential to improve care, but until recently was underutilized. The question was, how best to leverage it.

While we have a mature data infrastructure including a centralized data and analytics team, a standalone virtual data warehouse linking all data silos, and strict enterprise-wide data governance, we reasoned that the best way forward would be to collaborate with other institutions that had additional and complementary data capabilities and expertise.

We reached out to potential academic partners who were leading the way in data science, from university departments of math, science, and computer informatics to business and medical schools and invited them to collaborate with us on projects that could improve health care quality and lower costs. In exchange, Sanford created contracts that gave these partners access to data whose use had previously been constrained by concerns about data privacy and competitive-use agreements. With this access, academic partners are advancing their own research while providing real-world insights into care delivery.

The resulting Sanford Data Collaborative, now in its second year, has attracted regional and national partners and is already beginning to deliver data-driven innovations that are improving care delivery, patient engagement, and care access. Here we describe three that hold particular promise.

  • Developing Prescriptive Algorithms…
  • Augmenting Patient Engagement…
  • Improving Access to Care…(More)”.

Infection forecasts powered by big data


Michael Eisenstein at Nature: “…The good news is that the present era of widespread access to the Internet and digital health has created a rich reservoir of valuable data for researchers to dive into….By harvesting and combining these streams of big data with conventional ways of monitoring infectious diseases, the public-health community could gain fresh powers to catch and curb emerging outbreaks before they rage out of control.

Going viral

Data scientists at Google were the first to make a major splash using data gathered online to track infectious diseases. The Google Flu Trends algorithm, launched in November 2008, combed through hundreds of billions of users’ queries on the popular search engine to look for small increases in flu-related terms such as symptoms or vaccine availability. Initial data suggested that Google Flu Trends could accurately map the incidence of flu with a lag of roughly one day. “It was a very exciting use of these data for the purpose of public health,” says Brownstein. “It really did start a whole revolution and new field of work in query data.”
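The mechanism behind query-based flu tracking can be illustrated with a toy sketch: calibrate a linear model mapping the weekly share of flu-related searches to officially reported incidence, then "nowcast" incidence from search activity alone. All numbers below are synthetic and purely illustrative; the actual Google Flu Trends system drew on billions of queries and many candidate search terms.

```python
import numpy as np

# Toy sketch of query-based flu nowcasting (all data synthetic).
rng = np.random.default_rng(0)
weeks = 52
# Pretend seasonal incidence, in cases per 100k people.
true_incidence = 50 + 40 * np.sin(np.linspace(0, 2 * np.pi, weeks))
# Pretend the share of flu-related searches tracks incidence, plus noise.
query_fraction = 0.001 * true_incidence + rng.normal(0, 0.002, weeks)

# Calibrate a linear model on the first 40 weeks of surveillance data...
slope, intercept = np.polyfit(query_fraction[:40], true_incidence[:40], 1)

# ...then estimate the remaining weeks from search activity alone,
# with no lag from clinical reporting.
nowcast = slope * query_fraction[40:] + intercept
mean_abs_error = np.abs(nowcast - true_incidence[40:]).mean()
print(f"mean absolute nowcast error: {mean_abs_error:.1f} cases/100k")
```

The same structure also exposes the approach's key fragility: the calibrated slope only holds while the relationship between searching and sickness stays stable.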

Unfortunately, Google Flu Trends faltered when it mattered the most, completely missing the onset in April 2009 of the H1N1 pandemic. The algorithm also ran into trouble later on in the pandemic. It had been trained against seasonal fluctuations of flu, says Viboud, but people’s behaviour changed in the wake of panic fuelled by media reports — and that threw off Google’s data. …

Nevertheless, its work with Internet usage data was inspirational for infectious-disease researchers. A subsequent study from a team led by Cecilia Marques-Toledo at the Federal University of Minas Gerais in Belo Horizonte, Brazil, used Twitter to get high-resolution data on the spread of dengue fever in the country. The researchers could quickly map new cases to specific cities and even predict where the disease might spread to next (C. A. Marques-Toledo et al. PLoS Negl. Trop. Dis. 11, e0005729; 2017). Similarly, Brownstein and his colleagues were able to use search data from Google and Twitter to project the spread of Zika virus in Latin America several weeks before formal outbreak declarations were made by public-health officials. Both Internet services are used widely, which makes them data-rich resources. But they are also proprietary systems for which access to data is controlled by a third party; for that reason, Generous and his colleagues have opted instead to make use of search data from Wikipedia, which is open source. “You can get the access logs, and how many people are viewing articles, which serves as a pretty good proxy for search interest,” he says.

However, the problems that sank Google Flu Trends still exist….Additionally, online activity differs for infectious conditions with a social stigma such as syphilis or AIDS, because people who are or might be affected are more likely to be concerned about privacy. Appropriate search-term selection is essential: Generous notes that initial attempts to track flu on Twitter were confounded by irrelevant tweets about ‘Bieber fever’ — a decidedly non-fatal condition affecting fans of Canadian pop star Justin Bieber.

Alternatively, researchers can go straight to the source — by using smartphone apps to ask people directly about their health. Brownstein’s team has partnered with the Skoll Global Threats Fund to develop an app called Flu Near You, through which users can voluntarily report symptoms of infection and other information. “You get more detailed demographics about age and gender and vaccination status — things that you can’t get from other sources,” says Brownstein. Ten European Union member states are involved in a similar surveillance programme known as Influenzanet, which has generally maintained 30,000–40,000 active users for seven consecutive flu seasons. These voluntary reporting systems are particularly useful for diseases such as flu, for which many people do not bother going to the doctor — although it can be hard to persuade people to participate for no immediate benefit, says Brownstein. “But we still get a good signal from the people that are willing to be a part of this.”…(More)”.

No One Owns Data


Paper by Lothar Determann: “Businesses, policy makers, and scholars are calling for property rights in data. They currently focus particularly on the vast amounts of data generated by connected cars, industrial machines, artificial intelligence, toys and other devices on the Internet of Things (IoT). This data is personal to numerous parties who are associated with a connected device, for example, the driver of a connected car, its owner and passengers, as well as other traffic participants. Manufacturers, dealers, independent providers of auto parts and services, insurance companies, law enforcement agencies and many others are also interested in this data. Various parties are actively staking their claims to data on the Internet of Things, as they are mining data, the fuel of the digital economy.

Stakeholders in digital markets often frame claims, negotiations and controversies regarding data access as one of ownership. Businesses regularly assert and demand that they own data. Individual data subjects also assume that they own data about themselves. Policy makers and scholars focus on how to redistribute ownership rights to data. Yet, upon closer review, it is very questionable whether data is—or should be—subject to any property rights. This article unambiguously answers the question in the negative, both with respect to existing law and future lawmaking, in the United States as in the European Union, jurisdictions with notably divergent attitudes to privacy, property and individual freedoms….

The article begins with a brief review of the current landscape of the Internet of Things, noting the explosive growth of data pools generated by connected devices, artificial intelligence, big data analytics tools and other information technologies. Part 1 lays the foundation for examining concrete current legal and policy challenges in the remainder of the article. Part 2 supplies conceptual differentiation and definitions with respect to “data” and “information” as the subject of rights and interests. Distinctions and definitional clarity serve as the basis for examining the purposes and reach of existing property laws in Part 3, including real property, personal property and intellectual property laws. Part 4 analyzes the effect of data-related laws that do not grant property rights. Part 5 examines how the interests of the various stakeholders are protected or impaired by the current framework of data-related laws to identify potential gaps that could warrant additional property rights. Part 6 examines policy considerations for and against property rights in data. Part 7 concludes that no one owns data and no one should own data….(More)”.

Online Political Microtargeting: Promises and Threats for Democracy


Frederik Zuiderveen Borgesius et al in Utrecht Law Review: “Online political microtargeting involves monitoring people’s online behaviour, and using the collected data, sometimes enriched with other data, to show people targeted political advertisements. Online political microtargeting is widely used in the US; Europe may not be far behind.

This paper maps microtargeting’s promises and threats to democracy. For example, microtargeting promises to optimise the match between the electorate’s concerns and political campaigns, and to boost campaign engagement and political participation. But online microtargeting could also threaten democracy. For instance, a political party could, misleadingly, present itself as a different one-issue party to different individuals. And data collection for microtargeting raises privacy concerns. We sketch possibilities for policymakers if they seek to regulate online political microtargeting. We discuss which measures would be possible, while complying with the right to freedom of expression under the European Convention on Human Rights….(More)”.

Data Collaboratives can transform the way civil society organisations find solutions


Stefaan G. Verhulst at Disrupt & Innovate: “The need for innovation is clear: The twenty-first century is shaping up to be one of the most challenging in recent history. From climate change to income inequality to geopolitical upheaval and terrorism: the difficulties confronting International Civil Society Organisations (ICSOs) are unprecedented not only in their variety but also in their complexity. At the same time, today’s practices and tools used by ICSOs seem stale and outdated. Increasingly, it is clear, we need not only new solutions but new methods for arriving at solutions.

Data will likely become more central to meeting these challenges. We live in a quantified era. It is estimated that 90% of the world’s data was generated in just the last two years. We know that this data can help us understand the world in new ways and help us meet the challenges mentioned above. However, we need new data collaboration methods to help us extract the insights from that data.

UNTAPPED DATA POTENTIAL

For all of data’s potential to address public challenges, the truth remains that most data generated today is in fact collected by the private sector – including ICSOs, who often collect vast amounts of data – such as, for instance, the International Committee of the Red Cross, which generates various (often sensitive) data related to humanitarian activities. This data, typically ensconced in tightly held databases to maintain competitive advantage or to protect against harmful intrusion, contains tremendous possible insights and avenues for innovation in how we solve public problems. But because of access restrictions and often limited data science capacity, its vast potential often goes untapped.

DATA COLLABORATIVES AS A SOLUTION

Data Collaboratives offer a way around this limitation. They represent an emerging public-private partnership model, in which participants from different areas — including the private sector, government, and civil society — come together to exchange data and pool analytical expertise.

While still an emerging practice, examples of such partnerships now exist around the world, across sectors and public policy domains. Importantly several ICSOs have started to collaborate with others around their own data and that of the private and public sector. For example:

  • Several civil society organisations, academics, and donor agencies are partnering in the Health Data Collaborative to improve the global data infrastructure necessary to make smarter global and local health decisions and to track progress against the Sustainable Development Goals (SDGs).
  • Additionally, the UN Office for the Coordination of Humanitarian Affairs (UNOCHA) built the Humanitarian Data Exchange (HDX), a platform for sharing humanitarian data from and for ICSOs – including Caritas, InterAction and others – donor agencies, national and international bodies, and other humanitarian organisations.

These are a few examples of Data Collaboratives that ICSOs are participating in. Yet, the potential for collaboration goes beyond these examples. Likewise, so do the concerns regarding data protection and privacy….(More)”.

The future of statistics and data science


Paper by Sofia C. Olhede and Patrick J. Wolfe in Statistics & Probability Letters: “The Danish physicist Niels Bohr is said to have remarked: “Prediction is very difficult, especially about the future”. Predicting the future of statistics in the era of big data is not so very different from prediction about anything else. Ever since we started to collect data to predict cycles of the moon, seasons, and hence future agriculture yields, humankind has worked to infer information from indirect observations for the purpose of making predictions.

Even while acknowledging the momentous difficulty in making predictions about the future, a few topics stand out clearly as lying at the current and future intersection of statistics and data science. Not all of these topics are of a strictly technical nature, but all have technical repercussions for our field. How might these repercussions shape the still relatively young field of statistics? And what can sound statistical theory and methods bring to our understanding of the foundations of data science? In this article we discuss these issues and explore how new open questions motivated by data science may in turn necessitate new statistical theory and methods now and in the future.

Together, the ubiquity of sensing devices, the low cost of data storage, and the commoditization of computing have led to a volume and variety of modern data sets that would have been unthinkable even a decade ago. We see four important implications for statistics.

First, many modern data sets are related in some way to human behavior. Data might have been collected by interacting with human beings, or personal or private information traceable back to a given set of individuals might have been handled at some stage. Mathematical or theoretical statistics traditionally does not concern itself with the finer points of human behavior, and indeed many of us have only had limited training in the rules and regulations that pertain to data derived from human subjects. Yet inevitably in a data-rich world, our technical developments cannot be divorced from the types of data sets we can collect and analyze, and how we can handle and store them.

Second, the importance of data to our economies and civil societies means that the future of regulation will look not only to protect our privacy, and how we store information about ourselves, but also to include what we are allowed to do with that data. For example, as we collect high-dimensional vectors about many family units across time and space in a given region or country, privacy will be limited by that high-dimensional space, but our wish to control what we do with data will go beyond that….

Third, the growing complexity of algorithms is matched by an increasing variety and complexity of data. Data sets now come in a variety of forms that can be highly unstructured, including images, text, sound, and various other new forms. These different types of observations have to be understood together, resulting in multimodal data, in which a single phenomenon or event is observed through different types of measurement devices. Rather than having one phenomenon corresponding to single scalar values, a much more complex object is typically recorded. This could be a three-dimensional shape, for example in medical imaging, or multiple types of recordings such as functional magnetic resonance imaging and simultaneous electroencephalography in neuroscience. Data science therefore challenges us to describe these more complex structures, modeling them in terms of their intrinsic patterns.

Finally, the types of data sets we now face are far from satisfying the classical statistical assumptions of identically distributed and independent observations. Observations are often “found” or repurposed from other sampling mechanisms, rather than necessarily resulting from designed experiments….

 Our field will either meet these challenges and become increasingly ubiquitous, or risk rapidly becoming irrelevant to the future of data science and artificial intelligence….(More)”.

Spanning Today’s Chasms: Seven Steps to Building Trusted Data Intermediaries


James Shulman at the Mellon Foundation: “In 2001, when hundreds of individual colleges and universities were scrambling to scan their slide libraries, The Andrew W. Mellon Foundation created a new organization, Artstor, to assemble a massive library of digital images from disparate sources to support teaching and research in the arts and humanities.

Rather than encouraging—or paying for—each school to scan its own slide of the Mona Lisa, the Mellon Foundation created an intermediary organization that would balance the interests of those who created, photographed and cared for art works, such as artists and museums, and those who wanted to use such images for the admirable calling of teaching and studying history and culture.  This organization would reach across the gap that separated these two communities and would respect and balance the interests of both sides, while helping each accomplish their missions.  At the same time that Napster was using technology to facilitate the un-balanced transfer of digital content from creators to users, the Mellon Foundation set up a new institution aimed at respecting the interests of one side of the market and supporting the socially desirable work of the other.

As the internet has enabled the sharing of data across the world, new intermediaries have emerged as entire platforms. A networked world needs such bridges—think Etsy or eBay sitting between sellers and buyers, or Facebook sitting between advertisers and users. While intermediaries that match sellers and buyers of things provide a marketplace bridging one side to the other, aggregators of data work in admittedly more shadowy territories.

In the many realms that market forces won’t support, however, a great deal of public good can be done by aggregating and managing access to datasets that might otherwise continue to live in isolation. Whether due to institutional sociology that favors local solutions, the technical challenges associated with merging heterogeneous databases built with different data models, intellectual property limitations, or privacy concerns, datasets are built and maintained by independent groups that—if networked—could be used to further each other’s work.

Think of those studying coral reefs, or those studying labor practices in developing markets, or child welfare offices seeking to call upon court records in different states, or medical researchers working in different sub-disciplines but on essentially the same disease.  What intermediary invests in joining these datasets?  Many people assume that computers can simply “talk” to each other and share data intuitively, but without targeted investment in connecting them, they can’t.  Unlike modern databases that are now often designed with the cloud in mind, decades of locally created databases churn away in isolation, at great opportunity cost to us all.
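The "targeted investment" needed to connect such databases is often mundane schema work: someone has to decide that one dataset's column and another's refer to the same thing. A minimal sketch of that intermediary role, using invented reef-survey data and column names purely for illustration:

```python
import csv
import io

# Two independently built datasets describe the same reefs
# under different schemas (both invented for this sketch).
survey_a = io.StringIO("site_id,coral_cover_pct\nR1,42\nR2,17\n")
survey_b = io.StringIO("reef,sst_c\nR1,29.1\nR2,30.4\n")

# The crosswalk is the human-made "investment in connecting them":
# it maps each source's column names onto one shared schema.
crosswalk = {
    "site_id": "reef_id",
    "reef": "reef_id",
    "coral_cover_pct": "coral_cover_pct",
    "sst_c": "sst_c",
}

merged = {}
for source in (survey_a, survey_b):
    for row in csv.DictReader(source):
        record = {crosswalk[k]: v for k, v in row.items()}
        merged.setdefault(record["reef_id"], {}).update(record)

print(merged["R1"])  # both variables now attached to one record
```

Trivial at this scale, but multiply it across decades of locally designed databases, incompatible data models, and legal constraints, and the need for a dedicated intermediary becomes clear.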

Art history research is an unusually vivid example. Most people can understand that if you want to study Caravaggio, you don’t want to hunt and peck across hundreds of museums, books, photo archives, libraries, churches, and private collections.  You want all that content in one place—exactly what Mellon sought to achieve by creating Artstor.

What did we learn in creating Artstor that might be distilled as lessons for others taking on an aggregation project to serve the public good?….(More)”.

Facebook’s next project: American inequality


Nancy Scola at Politico: “Facebook CEO Mark Zuckerberg is quietly cracking open his company’s vast trove of user data for a study on economic inequality in the U.S. — the latest sign of his efforts to reckon with divisions in American society that the social network is accused of making worse.

The study, which hasn’t previously been reported, is mining the social connections among Facebook’s American users to shed light on the growing income disparity in the U.S., where the top 1 percent of households is said to control 40 percent of the country’s wealth. Facebook is an incomparably rich source of information for that kind of research: By one estimate, about three of five American adults use the social network….

Facebook confirmed the broad contours of its partnership with Chetty but declined to elaborate on the substance of the study. Chetty, in a brief interview following a January speech in Washington, said he and his collaborators — who include researchers from Stanford and New York University — have been working on the inequality study for at least six months.

“We’re using social networks, and measuring interactions there, to understand the role of social capital much better than we’ve been able to,” he said.

Researchers say they see Facebook’s enormous cache of data as a remarkable resource, offering an unprecedentedly detailed and sweeping look at American society. That store of information contains both details that a user might tell Facebook — their age, hometown, schooling, family relationships — and insights that the company has picked up along the way, such as the interest groups they’ve joined and geographic distribution of who they call a “friend.”

It’s all the more significant, researchers say, when you consider that Facebook’s user base — about 239 million monthly users in the U.S. and Canada at last count — cuts across just about every demographic group.

And all that information, say researchers, lets them take guesses about users’ wealth. Facebook itself recently patented a way of figuring out someone’s socioeconomic status using factors ranging from their stated hobbies to how many internet-connected devices they own.

A Facebook spokesman addressed the potential privacy implications of the study’s access to user data, saying, “We conduct research at Facebook responsibly, which includes making sure we protect people’s information.” The spokesman added that Facebook follows an “enhanced” review process for research projects, adopted in 2014 after a controversy over a study that manipulated some people’s news feeds to see if it made them happier or sadder.

According to a Stanford University source familiar with Chetty’s study, the Facebook account data used in the research has been stripped of any details that could be used to identify users. The source added that academics involved in the study have gone through security screenings that include background checks, and can access the Facebook data only in secure facilities….(More)”.
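One common way such stripping of identifying details works is sketched below. The field names, the keyed-hash pseudonym, and the coarsening of age are illustrative assumptions about standard de-identification practice, not a description of the actual pipeline used in the Facebook study.

```python
import hashlib
import hmac

# Assumed: a secret key held by the data custodian and never
# shared with researchers, so pseudonyms cannot be reversed.
SECRET_KEY = b"held-by-data-custodian-only"

def deidentify(record):
    """Drop direct identifiers, replace the account ID with a keyed
    pseudonym (stable, so records still link), and coarsen
    quasi-identifiers like exact age."""
    pseudonym = hmac.new(
        SECRET_KEY, record["user_id"].encode(), hashlib.sha256
    ).hexdigest()[:16]
    return {
        "pid": pseudonym,
        "age_band": record["age"] // 10 * 10,  # e.g. 34 -> 30s band
        "region": record["state"],
    }

raw = {"user_id": "u123", "name": "Jane Doe", "age": 34, "state": "SD"}
safe = deidentify(raw)
print(safe)  # name dropped, ID replaced, age coarsened
```

The design choice matters: a keyed hash preserves the social-graph linkage the researchers need (the same user always maps to the same pseudonym) while removing the details that would identify who that user is.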