Google and the University of Chicago Are Sued Over Data Sharing


Daisuke Wakabayashi in The New York Times: “When the University of Chicago Medical Center announced a partnership to share patient data with Google in 2017, the alliance was promoted as a way to unlock information trapped in electronic health records and improve predictive analysis in medicine.

On Wednesday, the University of Chicago, the medical center and Google were sued in a potential class-action lawsuit accusing the hospital of sharing hundreds of thousands of patients’ records with the technology giant without stripping identifiable date stamps or doctor’s notes.

The suit, filed in United States District Court for the Northern District of Illinois, demonstrates the difficulties technology companies face in handling health data as they forge ahead into one of the most promising — and potentially lucrative — areas of artificial intelligence: diagnosing medical problems.

Google is at the forefront of an effort to build technology that can read electronic health records and help physicians identify medical conditions. But the effort requires machines to learn this skill by analyzing a vast array of old health records collected by hospitals and other medical institutions.

That raises privacy concerns, especially when the data is used by a company like Google, which already knows what you search for, where you are and what interests you hold.

In 2016, DeepMind, a London-based A.I. lab owned by Google’s parent company, Alphabet, was accused of violating patient privacy after it struck a deal with Britain’s National Health Service to process medical data for research….(More)”.

Postsecondary Data Infrastructure: What is Possible Today


Report by Amy O’Hara: “Data sharing across government agencies allows consumers, policymakers, practitioners, and researchers to answer pressing questions. Creating a data infrastructure to enable this data sharing for higher education data is challenging, however, due to legal, privacy, technical, and perception issues. To overcome these challenges, postsecondary education can learn from other domains to permit secure, responsible data access and use. Working models from both the public sector and academia show how sensitive data from multiple sources can be linked and accessed for authorized uses.

This brief describes best practices in use today and the emerging technology that could further protect future data systems and creates a new framework, the “Five Safes”, for controlling data access and use. To support decisions facing students, administrators, evaluators, and policymakers, a postsecondary infrastructure must support cycles of data discovery, request, access, analysis, review, and release. It must be cost-effective, secure, and efficient and, ideally, it will be highly automated, transparent, and adaptable. Other industries have successfully developed such infrastructures, and postsecondary education can learn from their experiences.

A functional data infrastructure relies on trust and control between the data providers, intermediaries, and users. The system should support equitable access for approved users and offer the ability to conduct independent analyses with scientific integrity for reasonable financial costs. Policymakers and developers should ensure the creation of expedient, convenient data access modes that allow for policy analyses. …

The “Five Safes” framework describes an approach for controlling data access and use. The five safes are: safe projects, safe people, safe settings, safe data, and safe outputs….(More)”.
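
As a rough illustration only (not drawn from the brief), the “Five Safes” can be read as five independent checks that must all hold before data access is granted. The field names and decision rule below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    """A hypothetical access request scored against the Five Safes."""
    safe_project: bool   # is the proposed use appropriate and in the public interest?
    safe_people: bool    # are the researchers trained and authorized?
    safe_settings: bool  # will the data be used in a controlled environment?
    safe_data: bool      # has the data been de-identified to the required level?
    safe_outputs: bool   # will released results be checked for disclosure risk?

    def approved(self) -> bool:
        # Access is granted only when every one of the five dimensions is satisfied.
        return all([self.safe_project, self.safe_people, self.safe_settings,
                    self.safe_data, self.safe_outputs])

# A request that fails output checking is rejected, however safe the rest may be.
print(AccessRequest(True, True, True, True, False).approved())  # False
```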

Measuring and Protecting Privacy in the Always-On Era


Paper by Dan Feldman and Eldar Haber: “Data-mining practices have become greatly enhanced in the interconnected era. What began with the internet now continues through the Internet of Things (IoT), whereby users can constantly be connected to the internet through various means like televisions, smartphones, wearables and computerized personal assistants, among other “things.” As many of these devices operate in a so-called “always-on” mode, constantly receiving and transmitting data, the increased use of IoT devices might lead society into an “always-on” era, where individuals are constantly datafied. As the current regulatory approach to privacy is sectoral in nature, i.e., protects privacy only within a specific context of information gathering or use, and directed only to specific pre-defined industries or a specific cohort, the individual’s privacy is at great risk. On the other hand, strict privacy regulation might negatively impact data utility which serves many purposes, and, perhaps mainly, is crucial for technological development and innovation. The tradeoff between data utility and privacy protection is most unlikely to be resolved under the sectoral approach to privacy, but a technological solution that relies mostly on a method called differential privacy might be of great help. It essentially suggests adding “noise” to data deemed sensitive ex-ante, depending on various parameters further suggested in this Article. In other words, using computational solutions combined with formulas that measure the probability of data sensitivity, privacy could be better protected in the always-on era.

This Article introduces legal and computational methods that could be used by IoT service providers and will optimally balance the tradeoff between data utility and privacy. It comprises several stages. The first Part discusses the protection of privacy under the sectoral approach, and estimates what values are embedded in it. The second Part discusses privacy protection in the “always-on” era. First, it assesses how technological changes have shaped the sectoral regulation, then discusses why privacy is negatively impacted by IoT devices and the potential applicability of new regulatory mechanisms to meet the challenges of the “always-on” era. After concluding that the current regulatory framework is severely limited in protecting individuals’ privacy, the third Part discusses technology as a panacea, while offering a new computational model that relies on differential privacy and a modern technique called private coreset. The proposed model seeks to introduce “noise” to data on the user’s side to preserve individuals’ privacy — depending on the probability of data sensitivity of the IoT device — while enabling service providers to utilize the data….(More)”.
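
As a hedged sketch of the underlying technique the paper draws on (not the authors' own model or code), the standard Laplace mechanism from differential privacy adds calibrated random noise to a sensitive reading before it leaves the user's device. The sensitivity and epsilon values here are illustrative assumptions:

```python
import numpy as np

def laplace_perturb(value, sensitivity=1.0, epsilon=0.5, rng=None):
    """Return a noisy version of a numeric reading via the Laplace mechanism.

    Noise is drawn with scale sensitivity / epsilon; a smaller epsilon means
    more noise and stronger privacy. Both parameters are illustrative here.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: perturb a heart-rate reading from a wearable on the user's side,
# so the service provider only ever receives the noisy value.
print(laplace_perturb(72.0, sensitivity=1.0, epsilon=0.5))
```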

The language we use to describe data can also help us fix its problems


Luke Stark & Anna Lauren Hoffmann at Quartz: “Data is, apparently, everything.

It’s the “new oil” that fuels online business. It comes in floods or tsunamis. We access it via “streams” or “fire hoses.” We scrape it, mine it, bank it, and clean it. (Or, if you prefer your buzzphrases with a dash of ageism and implicit misogyny, big data is like “teenage sex,” while working with it is “the sexiest job” of the century.)

These data metaphors can seem like empty cliches, but at their core they’re efforts to come to grips with the continuing onslaught of connected devices and the huge amounts of data they generate.

In a recent article, we—an algorithmic-fairness researcher at Microsoft and a data-ethics scholar at the University of Washington—push this connection one step further. More than simply helping us wrap our collective heads around data-fueled technological change, we set out to learn what these metaphors can teach us about the real-life ethics of collecting and handling data today.

Instead of only drawing from the norms and commitments of computer science, information science, and statistics, what if we looked at the ethics of the professions evoked by our data metaphors instead?…(More)”.

The Ethics of Big Data Applications in the Consumer Sector


Paper by Markus Christen et al.: “Business applications relying on processing of large amounts of heterogeneous data (Big Data) are considered to be key drivers of innovation in the digital economy. However, these applications also pose ethical issues that may undermine the credibility of data-driven businesses. In our contribution, we discuss ethical problems that are associated with Big Data, such as: How are core values like autonomy, privacy, and solidarity affected in a Big Data world? Are some data a public good? Or: Are we obliged to divulge personal data to a certain degree in order to make the society more secure or more efficient?

We answer those questions by first outlining the ethical topics that are discussed in the scientific literature and the lay media using a bibliometric approach. Second, referring to the results of expert interviews and workshops with practitioners, we identify core norms and values affected by Big Data applications—autonomy, equality, fairness, freedom, privacy, property-rights, solidarity, and transparency—and outline how they are exemplified in examples of Big Data consumer applications, for example, in terms of informational self-determination, non-discrimination, or free opinion formation. Based on use cases such as personalized advertising, individual pricing, or credit risk management we discuss the process of balancing such values in order to identify legitimate, questionable, and unacceptable Big Data applications from an ethics point of view. We close with recommendations on how practitioners working in applied data science can deal with ethical issues of Big Data….(More)”.

Return on Data


Paper by Noam Kolt: “Consumers routinely supply personal data to technology companies in exchange for services. Yet, the relationship between the utility (U) consumers gain and the data (D) they supply — “return on data” (ROD) — remains largely unexplored. Expressed as a ratio, ROD = U / D. While lawmakers strongly advocate protecting consumer privacy, they tend to overlook ROD. Are the benefits of the services enjoyed by consumers, such as social networking and predictive search, commensurate with the value of the data extracted from them? How can consumers compare competing data-for-services deals?

Currently, the legal frameworks regulating these transactions, including privacy law, aim primarily to protect personal data. They treat data protection as a standalone issue, distinct from the benefits which consumers receive. This article suggests that privacy concerns should not be viewed in isolation, but as part of ROD. Just as companies can quantify return on investment (ROI) to optimize investment decisions, consumers should be able to assess ROD in order to better spend and invest personal data. Making data-for-services transactions more transparent will enable consumers to evaluate the merits of these deals, negotiate their terms and make more informed decisions. Pivoting from the privacy paradigm to ROD will both incentivize data-driven service providers to offer consumers higher ROD, as well as create opportunities for new market entrants….(More)”.
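
Taking the paper's ratio at face value, ROD = U / D is a simple quotient, so comparing two data-for-services deals amounts to comparing two numbers. The monetary figures in this sketch are purely hypothetical:

```python
def return_on_data(utility: float, data_value: float) -> float:
    """Compute ROD = U / D.

    utility:    the consumer's valuation of the service received (U)
    data_value: the estimated value of the personal data supplied (D)
    """
    if data_value <= 0:
        raise ValueError("data_value must be positive")
    return utility / data_value

# Hypothetical comparison: deal A yields $30 of utility for $10 worth of data,
# deal B yields $40 of utility for $25 worth of data.
print(return_on_data(30, 10))  # 3.0 -> the better return on data
print(return_on_data(40, 25))  # 1.6
```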

We Read 150 Privacy Policies. They Were an Incomprehensible Disaster.


Kevin Litman-Navarro at the New York Times: “….I analyzed the length and readability of privacy policies from nearly 150 popular websites and apps. Facebook’s privacy policy, for example, takes around 18 minutes to read in its entirety – slightly above average for the policies I tested….

Despite efforts like the General Data Protection Regulation to make policies more accessible, there seems to be an intractable tradeoff between a policy’s readability and length. Even policies that are shorter and easier to read can be impenetrable, given the amount of background knowledge required to understand how things like cookies and IP addresses play a role in data collection….

So what might a useful privacy policy look like?

Consumers don’t need a technical understanding of data collection processes in order to protect their personal information. Instead of explaining the excruciatingly complicated inner workings of the data marketplace, privacy policies should help people decide how they want to present themselves online. We tend to go on the internet privately – on our phones or at home – which gives the impression that our activities are also private. But, often, we’re more visible than ever.

A good privacy policy would help users understand how exposed they are: Something as simple as a list of companies that might purchase and use your personal information could go a long way towards setting a new bar for privacy-conscious behavior. For example, if you know that your weather app is constantly tracking your whereabouts and selling your location data as marketing research, you might want to turn off your location services entirely, or find a new app.

Until we reshape privacy policies to meet our needs — or we find a suitable replacement — it’s probably best to act with one rule in mind. To be clear and concise: Someone’s always watching….(More)”.
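
For a rough sense of where figures like the 18-minute estimate come from, reading time is essentially word count divided by reading speed. The 250-words-per-minute rate assumed below is a common average, not necessarily the rate used in the Times analysis:

```python
import re

ASSUMED_WPM = 250  # assumed average adult reading speed, in words per minute

def reading_time_minutes(text: str, wpm: int = ASSUMED_WPM) -> float:
    """Estimate how long a privacy policy takes to read from its word count."""
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / wpm

# A policy of roughly 4,500 words works out to about 18 minutes at 250 wpm.
print(round(4500 / ASSUMED_WPM))  # 18
```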

Data & Policy: A new venue to study and explore policy–data interaction


Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made in the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet, data about the use and evidence of the impact of data remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis to show that data can be helpful and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017) and thus directly questioning the evidence base to utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis, biases and uncertainties present in large historical datasets that cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe that there should be a sound theoretical under-pinning for a new theory of what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management, privacy, and fairness have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction, which intends to reduce the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to the consideration of systems of policy and data, and to how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism—as CP Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship; and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and push at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto, it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.

Privacy Enhancing Technologies


The Royal Society: “How can technologies help organisations and individuals protect data in practice and, at the same time, unlock opportunities for data access and use?

The Royal Society’s Privacy Enhancing Technologies project has been investigating this question and has launched a report (PDF) setting out the current use, development and limits of privacy enhancing technologies (PETs) in data analysis. 

The data we generate every day holds a lot of value and potentially also contains sensitive information that individuals or organisations might not wish to share with everyone. The protection of personal or sensitive data featured prominently in the social and ethical tensions identified in our British Academy and Royal Society report Data management and use: Governance in the 21st century. For example, how can organisations best use data for public good whilst protecting sensitive information about individuals? Under other circumstances, how can they share data with groups with competing interests whilst protecting commercially or otherwise sensitive information?

Realising the full potential of large-scale data analysis may be constrained by important legal, reputational, political, business and competition concerns.  Certain risks can potentially be mitigated and managed with a set of emerging technologies and approaches often collectively referred to as ‘Privacy Enhancing Technologies’ (PETs). 

This disruptive set of technologies, combined with changes in wider policy and business frameworks, could enable the sharing and use of data in a privacy-preserving manner. They also have the potential to reshape the data economy and to change the trust relationships between citizens, governments and companies.

This report provides a high-level overview of five current and promising PETs of a diverse nature, with their respective readiness levels and illustrative case studies from a range of sectors, with a view to informing, in particular, applied data science research and the digital strategies of government departments and businesses. This report also includes recommendations on how the UK could fully realise the potential of PETs and allow their use on a greater scale.

The project was informed by a series of conversations and evidence gathering events, involving a range of stakeholders across academia, government and the private sector (also see the project terms of reference and Working Group)….(More)”.

The Tricky Ethics of Using YouTube Videos for Academic Research


Jane C. Hu in P/S Magazine: “…But just because something is legal doesn’t mean it’s ethical. That doesn’t mean it’s necessarily unethical, either, but it’s worth asking questions about how and why researchers use social media posts, and whether those uses could be harmful. I was once a researcher who had to obtain human-subjects approval from a university institutional review board, and I know it can be a painstaking application process with long wait times. Collecting data from individuals takes a long time too. If you could just sub in YouTube videos in place of collecting your own data, that saves time, money, and effort. But that could be at the expense of the people whose data you’re scraping.

But, you might say, if people don’t want to be studied online, then they shouldn’t post anything. But most people don’t fully understand what “publicly available” really means or its ramifications. “You might know intellectually that technically anyone can see a tweet, but you still conceptualize your audience as being your 200 Twitter followers,” Fiesler says. In her research, she’s found that the majority of people she’s polled have no clue that researchers study public tweets.

Some may disagree that it’s researchers’ responsibility to work around social media users’ ignorance, but Fiesler and others are calling for their colleagues to be more mindful about any work that uses publicly available data. For instance, Ashley Patterson, an assistant professor of language and literacy at Penn State University, ultimately decided to use YouTube videos in her dissertation work on biracial individuals’ educational experiences. That’s a decision she arrived at after carefully considering her options each step of the way. “I had to set my own levels of ethical standards and hold myself to it, because I knew no one else would,” she says. One of Patterson’s first steps was to ask herself what YouTube videos would add to her work, and whether there were any other ways to collect her data. “It’s not a matter of whether it makes my life easier, or whether it’s ‘just data out there’ that would otherwise go to waste. The nature of my question and the response I was looking for made this an appropriate piece [of my work],” she says.

Researchers may also want to consider qualitative, hard-to-quantify contextual cues when weighing ethical decisions. What kind of data is being used? Fiesler points out that tweets about, say, a television show are way less personal than ones about a sensitive medical condition. Anonymized written materials, like Facebook posts, could be less invasive than using someone’s face and voice from a YouTube video. And the potential consequences of the research project are worth considering too. For instance, Fiesler and other critics have pointed out that researchers who used YouTube videos of people documenting their experience undergoing hormone replacement therapy to train an artificial intelligence to identify trans people could be putting their unwitting participants in danger. It’s not obvious how the results of Speech2Face will be used, and, when asked for comment, the paper’s researchers said they’d prefer to quote from their paper, which pointed to a helpful purpose: providing a “representative face” based on the speaker’s voice on a phone call. But one can also imagine dangerous applications, like doxing anonymous YouTubers.

One way to get ahead of this, perhaps, is to take steps to explicitly inform participants their data is being used. Fiesler says that, when her team asked people how they’d feel after learning their tweets had been used for research, “not everyone was necessarily super upset, but most people were surprised.” They also seemed curious; 85 percent of participants said that, if their tweet were included in research, they’d want to read the resulting paper. “In human-subjects research, the ethical standard is informed consent, but inform and consent can be pulled apart; you could potentially inform people without getting their consent,” Fiesler suggests….(More)”.