Paper by Kush R. Varshney: “This paper presents a viewpoint on an emerging dichotomy in data science: applications in which predictions of data-driven algorithms are used to support people in making consequential decisions that can have a profound effect on other people’s lives and applications in which data-driven algorithms act autonomously in settings of low consequence and large scale. An example of the first type of application is prison sentencing and of the second type is selecting news stories to appear on a person’s web portal home page. It is argued that the two types of applications require data, algorithms and models with vastly different properties along several dimensions, including privacy, equitability, robustness, interpretability, causality, and openness. Furthermore, it is argued that the second type of application cannot always be used as a surrogate to develop methods for the first type of application. To contribute to the development of methods for the first type of application, one must really be working on the first type of application….(More)”
Web design plays a role in how much we reveal online
European Commission: “A JRC study, “Nudges to Privacy Behaviour: Exploring an Alternative Approach to Privacy Notices”, used behavioural sciences to look at how individuals react to different types of privacy notices. Specifically, the authors analysed users’ reactions to modified choice architecture (i.e. the environment in which decisions take place) of web interfaces.
Two types of privacy behaviour were measured: passive disclosure, when people unwittingly disclose personal information, and direct disclosure, when people make an active choice to reveal personal information. After testing different designs with over 3 000 users from the UK, Italy, Germany and Poland, results show web interface affects decisions on disclosing personal information. The study also explored differences related to country of origin, gender, education level and age.
A depiction of a person’s face on the website led people to reveal more personal information. Also, this design choice and the visualisation of the user’s IP or browsing history had an impact on people’s awareness of a privacy notice. If confirmed, these features are particularly relevant for habitual and instinctive online behaviour.
With regard to education, users who had attended (though not necessarily graduated from) college felt significantly less observed or monitored and more comfortable answering questions than those who never went to college. This result challenges the assumption that the better educated are more aware of information tracking practices. Further investigation, perhaps of a qualitative nature, could help dig deeper into this issue. On the other hand, people with a lower level of education were more likely to reveal personal information unwittingly. This behaviour appeared to be due to the fact that non-college attendees were simply less aware that some online behaviour revealed personal information about themselves.
Strong differences between countries were noticed, indicating a relation between cultures and information disclosure. Even though participants in Italy revealed the most personal information in passive disclosure, in direct disclosure they revealed less than in other countries. Approximately 75% of participants in Italy chose to answer positively to at least one stigmatised question, compared to 81% in Poland, 83% in Germany and 92% in the UK.
Approximately 73% of women answered ‘never’ to the questions asking whether they had ever engaged in socially stigmatised behaviour, compared to 27% of males. This large difference could be due to the nature of the questions (e.g. about alcohol consumption, which might be more acceptable for males). It could also suggest women feel under greater social scrutiny or are simply more cautious when disclosing personal information.
These results could offer valuable insights to inform European policy decisions, despite the fact that the study has targeted a sample of users in four countries in an experimental setting. Major web service providers are likely to have extensive amounts of data on how slight changes to their services’ privacy controls affect users’ privacy behaviour. The authors of the study suggest that collaboration between web providers and policy-makers can lead to recommendations for web interface design that allow for conscientious disclosure of privacy information….(More)”
Nudge 2.0
Philipp Hacker: “This essay is both a review of the excellent book “Nudge and the Law. A European Perspective”, edited by Alberto Alemanno and Anne-Lise Sibony, and an assessment of the major themes and challenges that the behavioural analysis of law will and should face in the immediate future.
The book makes important and novel contributions in a range of topics, both on a theoretical and a substantial level. Regarding theoretical issues, four themes stand out: First, it highlights the differences between the EU and the US nudging environments. Second, it questions the reliance on expertise in rulemaking. Third, it unveils behavioural trade-offs that have too long gone unnoticed in behavioural law and economics. And fourth, it discusses the requirement of the transparency of nudges and the related concept of autonomy. Furthermore, the different authors discuss the impact of behavioural regulation on a number of substantial fields of law: health and lifestyle regulation, privacy law, and the disclosure paradigm in private law.
This paper aims to take some of the book’s insights one step further in order to point at crucial challenges – and opportunities – for the future of the behavioural analysis of law. In the last years, the movement has gained tremendously in breadth and depth. It is now time to make it scientifically even more rigorous, e.g. by openly embracing empirical uncertainty and by moving beyond the neo-classical/behavioural dichotomy. Simultaneously, the field ought to discursively readjust its normative compass. Finally and perhaps most strikingly, however, the power of big data holds the promise of taking behavioural interventions to an entirely new level. If these challenges can be overcome, this paper argues, the intersection between law and behavioural sciences will remain one of the most fruitful approaches to legal analysis in Europe and beyond….(More)”
Data-Driven Innovation: Big Data for Growth and Well-Being
“A new OECD report on data-driven innovation finds that countries could be getting much more out of data analytics in terms of economic and social gains if governments did more to encourage investment in “Big Data” and promote data sharing and reuse.
The migration of economic and social activities to the Internet and the advent of The Internet of Things – along with dramatically lower costs of data collection, storage and processing and rising computing power – means that data-analytics is increasingly driving innovation and is potentially an important new source of growth.
The report suggests countries act to seize these benefits by training more and better data scientists, reducing barriers to cross-border data flows, and encouraging investment in business processes to incorporate data analytics.
Few companies outside of the ICT sector are changing internal procedures to take advantage of data. For example, data gathered by companies’ marketing departments is not always used by other departments to drive decisions and innovation. And in particular, small and medium-sized companies face barriers to the adoption of data-related technologies such as cloud computing, partly because they have difficulty implementing organisational change due to limited resources, including the shortage of skilled personnel.
At the same time, governments will need to anticipate and address the disruptive effects of big data on the economy and overall well-being, as issues as broad as privacy, jobs, intellectual property rights, competition and taxation will be impacted. Read the Policy Brief…
- Preface
- Foreword
- Executive summary
- The phenomenon of data-driven innovation
- Mapping the global data ecosystem and its points of control
- How data now drive innovation
- Drawing value from data as an infrastructure
- Building trust for data-driven innovation
- Skills and employment in a data-driven economy
- Promoting data-driven scientific research
- The evolution of health care in a data-rich environment
- Cities as hubs for data-driven innovation
- Governments leading by example with public sector data
Health Data Governance: Privacy, Monitoring and Research
OECD publishing: “All countries are investing in health data; however, there are significant cross-country differences in data availability and use. Some countries stand out for their innovative practices enabling privacy-protective respectful data use, while others are falling behind with insufficient data and restrictions that limit access to and use of data, even by government itself. Countries that develop a data governance framework that enables privacy-protective data use will not only have the information needed to promote quality, efficiency and performance in their health systems, they will become a more attractive centre for medical research. After examining the current situation in OECD countries, a multi-disciplinary advisory panel of experts identified eight key data governance mechanisms to maximise benefits to patients and to societies from the collection, linkage and analysis of health data and to, at the same time, minimise risks to the privacy of patients and to the security of health data. These mechanisms include coordinated development of high-value, privacy-protective health information systems; legislation that permits privacy-protective data use; open and transparent public communication; accreditation or certification of health data processors; transparent and fair project approval processes; data de-identification and data security practices that meet legal requirements and public expectations without compromising data utility; and a process to continually assess and renew the data governance framework as new data and new risks emerge…”
Big Data Privacy Scenarios
E. Bruce, K. Sollins, M. Vernon, and D. Weitzner at D-Space@MIT: “This paper is the first in a series on privacy in Big Data. As an outgrowth of a series of workshops on the topic, the Big Data Privacy Working Group undertook a study of a series of use scenarios to highlight the challenges to privacy that arise in the Big Data arena. This is a report on those scenarios. The deeper question explored by this exercise is what is distinctive about privacy in the context of Big Data. In addition, we discuss an initial list of issues for privacy that derive specifically from the nature of Big Data. These derive from observations across the real world scenarios and use cases explored in this project as well as wider reading and discussions:
* Scale: The sheer size of the datasets leads to challenges in creating, managing and applying privacy policies.
* Diversity: The increased likelihood of more and more diverse participants in Big Data collection, management, and use, leads to differing agendas and objectives. By nature, this is likely to lead to contradictory agendas and objectives.
* Integration: With increased data management technologies (e.g. cloud services, data lakes, and so forth), integration across datasets, with new and often surprising opportunities for cross-product inferences, will also come new information about individuals and their behaviors.
* Impact on secondary participants: Because many pieces of information are reflective of not only the targeted subject, but secondary, often unattended, participants, the inferences and resulting information will increasingly be reflective of other people, not originally considered as the subject of privacy concerns and approaches.
* Need for emergent policies for emergent information: As inferences over merged data sets occur, emergent information or understanding will occur. Although each unique data set may have existing privacy policies and enforcement mechanisms, it is not clear that it is possible to develop the requisite and appropriate emerged privacy policies and appropriate enforcement of them automatically…(More)”
What we can learn from the failure of Google Flu Trends
David Lazer and Ryan Kennedy at Wired: “….The issue of using big data for the common good is far more general than Google—which deserves credit, after all, for offering the occasional peek at their data. These records exist because of a compact between individual consumers and the corporation. The legalese of that compact is typically obscure (how many people carefully read terms and conditions?), but the essential bargain is that the individual gets some service, and the corporation gets some data.
What is left out of that bargain is the public interest. Corporations and consumers are part of a broader society, and many of these big data archives offer insights that could benefit us all. As Eric Schmidt, CEO of Google, has said, “We must remember that technology remains a tool of humanity.” How can we, and corporate giants, then use these big data archives as a tool to serve humanity?
Google’s sequel to GFT, done right, could serve as a model for collaboration around big data for the public good. Google is making flu-related search data available to the CDC as well as select research groups. A key question going forward will be whether Google works with these groups to improve the methodology underlying GFT. Future versions should, for example, continually update the fit of the data to flu prevalence—otherwise, the value of the data stream will rapidly decay.
This is just an example, however, of the general challenge of how to build models of collaboration amongst industry, government, academics, and general do-gooders to use big data archives to produce insights for the public good. This came to the fore with the struggle (and delay) for finding a way to appropriately share mobile phone data in west Africa during the Ebola epidemic (mobile phone data are likely the best tool for understanding human—and thus Ebola—movement). Companies need to develop efforts to share data for the public good in a fashion that respects individual privacy.
There is not going to be a single solution to this issue, but for starters, we are pushing for a “big data” repository in Boston to allow holders of sensitive big data to share those collections with researchers while keeping them totally secure. The UN has its Global Pulse initiative, setting up collaborative data repositories around the world. Flowminder, based in Sweden, is a nonprofit dedicated to gathering mobile phone data that could help in response to disasters. But these are still small, incipient, and fragile efforts.
The question going forward now is how to build on and strengthen these efforts, while still guarding the privacy of individuals and the proprietary interests of the holders of big data….(More)”
Researchers wrestle with a privacy problem
Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?
They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.
The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.
But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.
This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….
But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.
Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.
“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.
In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to disprove the now-debunked link between measles vaccination and autism.
Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.
In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.
In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.
Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…
Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.
….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes. ….In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential, which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies, Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.
Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
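The two techniques described in this excerpt can be made concrete with a small sketch. It is illustrative only: the article does not say which mechanisms the researchers or the Census Bureau actually use, so the Laplace mechanism (one common way to realise differential privacy for counting queries) and additive secret sharing (one simple building block of secure multiparty computation) are assumptions chosen for clarity. The function names, the modulus `PRIME`, and the sample figures are all hypothetical.

```python
import random

# Illustrative modulus for additive secret sharing (an assumption, not from the article).
PRIME = 2**61 - 1


def noisy_count(records, predicate, epsilon, rng):
    """Answer a counting query with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes a count by at most 1, so the
    query has sensitivity 1. Smaller epsilon means more noise.
    """
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two independent Exponential(rate=epsilon) draws
    # follows a Laplace distribution centred on 0 with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def make_shares(secret, n_parties, rng):
    """Split a non-negative integer into n shares that sum to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values, rng):
    """Compute the total of the parties' values from shares only.

    Assumes every value is a non-negative integer and the total is below PRIME.
    """
    n = len(private_values)
    # Party i splits its value into n shares and hands share j to party j.
    all_shares = [make_shares(value, n, rng) for value in private_values]
    # Party j adds the shares it received; on its own this partial sum
    # looks like a random number and does not expose any single input.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME
                    for j in range(n)]
    # Only the partial sums are combined into the final result.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    # Seeded for repeatability; random.Random is NOT cryptographically secure.
    rng = random.Random(0)
    commuters = [{"km": 3}, {"km": 12}, {"km": 25}, {"km": 40}]
    print(noisy_count(commuters, lambda r: r["km"] > 10, epsilon=0.5, rng=rng))
    print(secure_sum([52000, 61000, 47500], rng))  # prints 160500
```

The contrast matches the excerpt's point: `noisy_count` returns an answer that is only roughly accurate, whereas `secure_sum` returns the exact total while each party sees only random-looking shares. A real deployment would draw shares from a cryptographically secure source such as Python's `secrets` module rather than `random`.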
Ethical, Safe, and Effective Digital Data Use in Civil Society
Blog by Lucy Bernholz, Rob Reich, Emma Saunders-Hastings, and Emma Leeds Armstrong: “How do we use digital data ethically, safely, and effectively in civil society? We have developed three early principles for consideration:
- Default to person-centered consent.
- Prioritize privacy and minimum viable data collection.
- Plan from the beginning to open (share) your work.
This post provides a synthesis from a one-day workshop that informed these principles. It concludes with links to draft guidelines you can use to inform partnerships between data consultants/volunteers and nonprofit organizations….(More)
These three values — consent, minimum viable data collection, and open sharing — comprise a basic framework for ethical, safe, and effective use of digital data by civil society organizations. They should be integrated into partnerships with data intermediaries and, perhaps, into general data practices in civil society.
We developed two tools to guide conversations between data volunteers and/or consultants and nonprofits. These are downloadable below. Please use them, share them, improve them, and share them again….
- Checklist for NGOs and external data consultants
- Guidelines for NGOs and external data consultants (More)”
Research on digital identity ecosystems
Francesca Bria et al at NESTA/D-CENT: “This report presents a concrete analysis of the latest evolution of the identity ecosystem in the big data context, focusing on the economic and social value of data and identity within the current digital economy. This report also outlines economic, policy, and technical alternatives to develop an identity ecosystem and management of data for the common good that respects citizens’ rights, privacy and data protection.
Key findings
- This study presents a review of the concept of identity and a map of the key players in the identity industry (such as data brokers and data aggregators), including empirical case studies of identity management in key sectors.
- …. The “datafication” of individuals’ social lives, thoughts and moves is a valuable commodity and constitutes the backbone of the “identity market” within which “data brokers” (collectors, purchasers or sellers) play key different roles in creating the market by offering various services such as fraud, customer relation, predictive analytics, marketing and advertising.
- Economic, political and technical alternatives for identity to preserve trust, privacy and data ownership in today’s big data environments are formulated. The report looks into access to data, economic strategies to manage data as commons, consent and licensing, tools to control data, and terms of services. It also looks into policy strategies such as privacy and data protection by design and trust and ethical frameworks. Finally, it assesses technical implementations looking at identity and anonymity, cryptographic tools; security; decentralisation and blockchains. It also analyses the future steps needed in order to move into the suggested technical strategies….(More)”