The big questions for research using personal data


At the Royal Society’s “Verba” blog: “We live in an era of data. The world is generating 1.7 million billion bytes of data every minute, and the total amount of global data is expected to grow 40% year on year for the next decade. In 2003 scientists declared the mapping of the human genome complete. It took over 10 years and cost $1 billion – today it takes mere days and can be done at a fraction of the cost.

Making the most of the data revolution will be key to future scientific and economic progress. Unlocking the value of data by improving the way that we collect, analyse and use data has the potential to improve lives across a multitude of areas, ranging from business to health, and from tackling climate change to aiding civic engagement. However, its potential for public benefit must be balanced against the need for data to be used intelligently and with respect for individuals’ privacy.

Getting regulation right

The 1995 European Data Protection Directive was transposed into UK law as the Data Protection Act 1998, at a time before the widespread use of the internet and smartphones. In 2012, recognising the pace of technological change, the European Commission proposed a comprehensive reform of EU data protection rules, including a new Data Protection Regulation that would update and harmonise these rules across the EU.

The draft Regulation is currently going through the EU legislative process, during which the European Parliament has proposed changes to the Commission’s text. These changes have raised concerns among researchers across Europe that the Regulation could restrict the use of personal data for research and thereby prevent much vital health research. For example, researchers currently use these data to better understand how to prevent and treat conditions such as cancer, diabetes and dementia. The final details of the Regulation are now being negotiated, and the research community has come together to highlight the importance of data in research and articulate its concerns in a joint statement, which the Society supports.

The Society considers that datasets should be managed according to a system of proportionate governance. Personal data should only be shared where necessary for research with the potential for high public value, and the sharing should be proportionate to the particular needs of the research project. Such a system should also draw, as appropriate, on consent, authorisation and safe havens – secure sites for databases containing sensitive personal data that can only be accessed by authorised researchers…

However, many challenges remain that are unlikely to be resolved in the current European negotiations. The new legislation covers personal data but not anonymised data – data from which information that can identify a person has been removed or replaced with a code. The assumption is that anonymisation is a foolproof way to protect personal identity. However, there have been examples of re-identification from anonymised data, and computer scientists have long pointed out the flaws of relying on anonymisation to protect an individual’s privacy…. There is also a risk of leaving the public behind through a lack of information and failed efforts to earn trust; and it is clear that a better understanding of the role of consent and ethical governance is needed to ensure the continuation of cutting-edge research that respects the principles of privacy.
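The flaw computer scientists point to can be seen in a toy linkage attack, sketched below with entirely invented names and records: even after direct identifiers are stripped, surviving quasi-identifiers (postcode, birth year, sex) can be joined against a public register to re-attach identities.

```python
# Illustrative linkage attack on "anonymised" records (all data invented).
# Names were removed from the health data, but the remaining fields
# still form a near-unique fingerprint for each person.

anonymised_health = [
    {"postcode": "BS1 4ND", "birth_year": 1975, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "SW1A 1AA", "birth_year": 1982, "sex": "M", "diagnosis": "asthma"},
]

public_register = [
    {"name": "Jane Doe", "postcode": "BS1 4ND", "birth_year": 1975, "sex": "F"},
    {"name": "John Roe", "postcode": "SW1A 1AA", "birth_year": 1982, "sex": "M"},
]

def reidentify(health_rows, register):
    """Join on quasi-identifiers; a unique match re-attaches a name."""
    hits = []
    for row in health_rows:
        matches = [p for p in register
                   if (p["postcode"], p["birth_year"], p["sex"]) ==
                      (row["postcode"], row["birth_year"], row["sex"])]
        if len(matches) == 1:  # a unique combination recovers the identity
            hits.append((matches[0]["name"], row["diagnosis"]))
    return hits

print(reidentify(anonymised_health, public_register))
```

Here every “anonymous” diagnosis is re-linked to a named individual, which is exactly why anonymisation alone is a weak privacy guarantee.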

These are problems that will require attention, and questions that the Society will continue to explore. …(More)”

The Internet of Things: Frequently Asked Questions


Eric A. Fischer at the Congressional Research Service: “Internet of Things” (IoT) refers to networks of objects that communicate with other objects and with computers through the Internet. “Things” may include virtually any object for which remote communication, data collection, or control might be useful, such as vehicles, appliances, medical devices, electric grids, transportation infrastructure, manufacturing equipment, or building systems. In other words, the IoT potentially includes huge numbers and kinds of interconnected objects. It is often considered the next major stage in the evolution of cyberspace. Some observers believe it might even lead to a world where cyberspace and human space would seem to effectively merge, with unpredictable but potentially momentous societal and cultural impacts.

Two features make objects part of the IoT – a unique identifier and Internet connectivity. Such “smart” objects each have a unique Internet Protocol (IP) address to identify the object sending and receiving information. Smart objects can form systems that communicate among themselves, usually in concert with computers, allowing automated and remote control of many independent processes and potentially transforming them into integrated systems. Those systems can potentially affect homes and communities, factories and cities, and every sector of the economy, both domestically and globally. Although the full extent and nature of the IoT’s impacts remain uncertain, economic analyses predict that it will contribute trillions of dollars to economic growth over the next decade. Sectors that may be particularly affected include agriculture, energy, government, health care, manufacturing, and transportation.
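The two defining features – unique identifier plus connectivity – can be sketched with a hypothetical smart meter that stamps each reading with its device ID before sending it upstream. The class, field names, and ID below are invented for illustration; in a real deployment the identifier might be an IPv6 address or a hardware serial number.

```python
import json
import time

class SmartMeter:
    """Toy model of an IoT object: a uniquely identified sensor."""

    def __init__(self, device_id):
        self.device_id = device_id  # the object's unique identifier

    def reading(self, kwh):
        """Package a reading as the JSON message the object would transmit."""
        return json.dumps({
            "device_id": self.device_id,    # identifies sender on the network
            "timestamp": int(time.time()),  # when the reading was taken
            "kwh": kwh,                     # the sensed value
        })

meter = SmartMeter("meter-0042")
msg = meter.reading(1.5)
print(msg)
```

Because every message carries the device’s identifier, a back-end system can aggregate readings per object and send control commands back to exactly the right one – which is what lets independent devices be composed into the integrated systems described above.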

The IoT can contribute to more integrated and functional infrastructure, especially in “smart cities,” with projected improvements in transportation, utilities, and other municipal services. The Obama Administration announced a smart-cities initiative in September 2015. There is no single federal agency that has overall responsibility for the IoT. Agencies may find IoT applications useful in helping them fulfill their missions. Each is responsible for the functioning and security of its own IoT, although some technologies, such as drones, may fall under the jurisdiction of other agencies as well. Various agencies also have relevant regulatory, sector-specific, and other mission-related responsibilities, such as the Departments of Commerce, Energy, and Transportation, the Federal Communications Commission, and the Federal Trade Commission.

Security and privacy are often cited as major issues for the IoT, given the perceived difficulties of providing adequate cybersecurity for it, the increasing role of smart objects in controlling components of infrastructure, and the enormous increase in potential points of attack posed by the proliferation of such objects. The IoT may also pose increased risks to privacy, with cyberattacks potentially resulting in exfiltration of identifying or other sensitive information about an individual. With an increasing number of IoT objects in use, privacy concerns also include questions about the ownership, processing, and use of the data they generate….(More)”

Testing governance: the laboratory lives and methods of policy innovation labs


Ben Williamson at Code Acts in Education: “Digital technologies are increasingly playing a significant role in techniques of governance in sectors such as education as well as healthcare, urban management, and in government innovation and citizen engagement in government services. But these technologies need to be sponsored and advocated by particular individuals and groups before they are embedded in these settings.


I have produced a working paper entitled Testing governance: the laboratory lives and methods of policy innovation labs, which examines the role of innovation labs as sponsors of new digital technologies of governance. By combining resources and practices from politics, data analysis, media, design, and digital innovation, labs act as experimental R&D labs and practical ideas organizations for solving social and public problems, located in the borderlands between sectors, fields and disciplinary methodologies. Labs are making methods such as data analytics, design thinking and experimentation into a powerful set of governing resources. They are, in other words, making digital methods into key techniques for understanding social and public issues, and for creating and circulating solutions to the problems of contemporary governance – in education and elsewhere.

The working paper analyses the key methods and messages of the labs field, in particular by investigating the documentary history of Futurelab, a prototypical lab for education research and innovation that operated in Bristol, UK, between 2002 and 2010, and tracing methodological continuities through the current wave of lab development. Centrally, the working paper explores Futurelab’s contribution to the production and stabilization of a ‘sociotechnical imaginary’ of the future of education specifically, and to the future of public services more generally. It offers some preliminary analysis of how such an imaginary was embedded in the ‘laboratory life’ of Futurelab, established through its organizational networks, and operationalized in its digital methods of research and development as well as its modes of communication….(More)”

Weak States, Poor Countries


Angus Deaton in Project Syndicate: “Europeans tend to feel more positively about their governments than do Americans, for whom the failures and unpopularity of their federal, state, and local politicians are a commonplace. Yet Americans’ various governments collect taxes and, in return, provide services without which they could not easily live their lives.

Americans, like many citizens of rich countries, take for granted the legal and regulatory system, the public schools, health care and social security for the elderly, roads, defense and diplomacy, and heavy investments by the state in research, particularly in medicine. Certainly, not all of these services are as good as they might be, nor held in equal regard by everyone; but people mostly pay their taxes, and if the way that money is spent offends some, a lively public debate ensues, and regular elections allow people to change priorities.

All of this is so obvious that it hardly needs saying – at least for those who live in rich countries with effective governments. But most of the world’s population does not.

In much of Africa and Asia, states lack the capacity to raise taxes or deliver services. The contract between government and governed – imperfect in rich countries – is often altogether absent in poor countries. The New York cop was little more than impolite (and busy providing a service); in much of the world, police prey on the people they are supposed to protect, shaking them down for money or persecuting them on behalf of powerful patrons.

Even in a middle-income country like India, public schools and public clinics face mass (unpunished) absenteeism. Private doctors give people what (they think) they want – injections, intravenous drips, and antibiotics – but the state does not regulate them, and many practitioners are entirely unqualified.

Throughout the developing world, children die because they are born in the wrong place – not of exotic, incurable diseases, but of the commonplace childhood illnesses that we have known how to treat for almost a century. Without a state that is capable of delivering routine maternal and child health care, these children will continue to die.

Likewise, without government capacity, regulation and enforcement do not work properly, so businesses find it difficult to operate. Without properly functioning civil courts, there is no guarantee that innovative entrepreneurs can claim the rewards of their ideas.

The absence of state capacity – that is, of the services and protections that people in rich countries take for granted – is one of the major causes of poverty and deprivation around the world. Without effective states working with active and involved citizens, there is little chance for the growth that is needed to abolish global poverty.

Unfortunately, the world’s rich countries currently are making things worse. Foreign aid – transfers from rich countries to poor countries – has much to its credit, particularly in terms of health care, with many people alive today who would otherwise be dead. But foreign aid also undermines the development of local state capacity….

One thing that we can do is to agitate for our own governments to stop doing those things that make it harder for poor countries to stop being poor. Reducing aid is one, but so is limiting the arms trade, improving rich-country trade and subsidy policies, providing technical advice that is not tied to aid, and developing better drugs for diseases that do not affect rich people. We cannot help the poor by making their already-weak governments even weaker….(More)”

Nudge 2.0


Philipp Hacker: “This essay is both a review of the excellent book “Nudge and the Law. A European Perspective”, edited by Alberto Alemanno and Anne-Lise Sibony, and an assessment of the major themes and challenges that the behavioural analysis of law will and should face in the immediate future.

The book makes important and novel contributions on a range of topics, at both a theoretical and a substantive level. Regarding theoretical issues, four themes stand out: First, it highlights the differences between the EU and the US nudging environments. Second, it questions the reliance on expertise in rulemaking. Third, it unveils behavioural trade-offs that have too long gone unnoticed in behavioural law and economics. And fourth, it discusses the requirement of the transparency of nudges and the related concept of autonomy. Furthermore, the different authors discuss the impact of behavioural regulation on a number of substantive fields of law: health and lifestyle regulation, privacy law, and the disclosure paradigm in private law.

This paper aims to take some of the book’s insights one step further in order to point at crucial challenges – and opportunities – for the future of the behavioural analysis of law. In recent years, the movement has gained tremendously in breadth and depth. It is now time to make it scientifically even more rigorous, e.g. by openly embracing empirical uncertainty and by moving beyond the neo-classical/behavioural dichotomy. Simultaneously, the field ought to discursively readjust its normative compass. Finally, and perhaps most strikingly, the power of big data holds the promise of taking behavioural interventions to an entirely new level. If these challenges can be overcome, this paper argues, the intersection between law and behavioural sciences will remain one of the most fruitful approaches to legal analysis in Europe and beyond….(More)”

Five principles for applying data science for social good


Jake Porway at O’Reilly: “….Every week, a data or technology company declares that it wants to “do good” and there are countless workshops hosted by major foundations musing on what “big data can do for society.” Add to that a growing number of data-for-good programs, from Data Science for Social Good’s fantastic summer program to Bayes Impact’s data science fellowships to DrivenData’s data-science-for-good competitions, and you can see how quickly this idea of “data for good” is growing.

Yes, it’s an exciting time to be exploring the ways new datasets, new techniques, and new scientists could be deployed to “make the world a better place.” We’ve already seen deep learning applied to ocean health, satellite imagery used to estimate poverty levels, and cellphone data used to elucidate Nairobi’s hidden public transportation routes. And yet, for all this excitement about the potential of this “data for good” movement, we are still desperately far from creating lasting impact. Many efforts will not only fall short of lasting impact — they will make no change at all….

So how can these well-intentioned efforts reach their full potential for real impact? Embracing the following five principles can drastically accelerate a world in which we truly use data to serve humanity.

1. “Statistics” is so much more than “percentages”

We must convey what constitutes data, what it can be used for, and why it’s valuable.

There was a packed house for the March 2015 release of the No Ceilings Full Participation Report. Hillary Clinton, Melinda Gates, and Chelsea Clinton stood on stage and lauded the report, the culmination of a year-long effort to aggregate and analyze new and existing global data, as the biggest, most comprehensive data collection effort about women and gender ever attempted. One of the most trumpeted parts of the effort was the release of the data in an open and easily accessible way.

I ran home and excitedly pulled up the data from the No Ceilings GitHub, giddy to use it for our DataKind projects. As I downloaded each file, my heart sank. The 6MB size of the entire global dataset told me what I would find inside before I even opened the first file. Like a familiar ache, the first row of the spreadsheet said it all: “USA, 2009, 84.4%.”

What I’d encountered was a common situation when it comes to data in the social sector: the prevalence of inert, aggregate data….

2. Finding problems can be harder than finding solutions

We must scale the process of problem discovery through deeper collaboration between the problem holders, the data holders, and the skills holders.

In the immortal words of Henry Ford, “If I’d asked people what they wanted, they would have said a faster horse.” Right now, the field of data science is in a similar position. Framing data solutions for organizations that don’t realize how much is now possible can be a frustrating search for faster horses. If data cleaning is 80% of the hard work in data science, then problem discovery makes up nearly the remaining 20% when doing data science for good.

The plague here is one of education. …

3. Communication is more important than technology

We must foster environments in which people can speak openly, honestly, and without judgment. We must be constantly curious about each other.

At the conclusion of one of our recent DataKind events, one of our partner nonprofit organizations lined up to hear the results from their volunteer team of data scientists. Everyone was all smiles — the nonprofit leaders had loved the project experience, the data scientists were excited with their results. The presentations began. “We used Amazon Redshift to store the data, which allowed us to quickly build a multinomial regression. The p-value of 0.002 shows …” Eyes glazed over. The nonprofit leaders furrowed their brows in telegraphed concentration. The jargon was standing in the way of understanding the true utility of the project’s findings. It was clear that, like so many other well-intentioned efforts, the project was at risk of gathering dust on a shelf if the team of volunteers couldn’t help the organization understand what they had learned and how it could be integrated into the organization’s ongoing work….

4. We need diverse viewpoints

To tackle sector-wide challenges, we need a range of voices involved.

One of the most challenging aspects to making change at the sector level is the range of diverse viewpoints necessary to understand a problem in its entirety. In the business world, profit, revenue, or output can be valid metrics of success. Rarely, if ever, are metrics for social change so cleanly defined….

Challenging this paradigm requires diverse, or “collective impact,” approaches to problem solving. The idea has been around for a while (h/t Chris Diehl), but has not yet been widely implemented due to the challenges of achieving successful collective impact. Moreover, while there are many diverse collectives committed to social change, few have the voice of expert data scientists involved. DataKind is piloting a collective impact model called DataKind Labs that seeks to bring together diverse problem holders, data holders, and data science experts to co-create solutions that can be applied across an entire sector-wide challenge. We just launched our first project with Microsoft to increase traffic safety and are hopeful that this effort will demonstrate how vital a role data science can play in a collective impact approach.

5. We must design for people

Data is not truth, and tech is not an answer in-and-of-itself. Without designing for the humans on the other end, our work is in vain.

So many of the data projects making headlines — a new app for finding public services, a new probabilistic model for predicting weather patterns for subsistence farmers, a visualization of government spending — are great and interesting accomplishments, but don’t seem to have an end user in mind. The current approach appears to be “get the tech geeks to hack on this problem, and we’ll have cool new solutions!” I’ve opined that, though there are many benefits to hackathons, you can’t just hack your way to social change….(More)”

Data-Driven Innovation: Big Data for Growth and Well-Being


“A new OECD report on data-driven innovation finds that countries could be getting much more out of data analytics in terms of economic and social gains if governments did more to encourage investment in “Big Data” and promote data sharing and reuse.

The migration of economic and social activities to the Internet and the advent of the Internet of Things – along with dramatically lower costs of data collection, storage and processing and rising computing power – mean that data analytics is increasingly driving innovation and is potentially an important new source of growth.

The report suggests that countries act to seize these benefits by training more and better data scientists, reducing barriers to cross-border data flows, and encouraging investment in business processes to incorporate data analytics.

Few companies outside of the ICT sector are changing internal procedures to take advantage of data. For example, data gathered by companies’ marketing departments is not always used by other departments to drive decisions and innovation. And in particular, small and medium-sized companies face barriers to the adoption of data-related technologies such as cloud computing, partly because they have difficulty implementing organisational change due to limited resources, including the shortage of skilled personnel.

At the same time, governments will need to anticipate and address the disruptive effects of big data on the economy and overall well-being, as issues as broad as privacy, jobs, intellectual property rights, competition and taxation will be impacted. Read the Policy Brief

TABLE OF CONTENTS
Preface
Foreword
Executive summary
The phenomenon of data-driven innovation
Mapping the global data ecosystem and its points of control
How data now drive innovation
Drawing value from data as an infrastructure
Building trust for data-driven innovation
Skills and employment in a data-driven economy
Promoting data-driven scientific research
The evolution of health care in a data-rich environment
Cities as hubs for data-driven innovation
Governments leading by example with public sector data


Health Data Governance: Privacy, Monitoring and Research


OECD Publishing: “All countries are investing in health data; however, there are significant cross-country differences in data availability and use. Some countries stand out for their innovative practices enabling privacy-protective, respectful data use, while others are falling behind, with insufficient data and restrictions that limit access to and use of data, even by government itself. Countries that develop a data governance framework that enables privacy-protective data use will not only have the information needed to promote quality, efficiency and performance in their health systems; they will also become a more attractive centre for medical research. After examining the current situation in OECD countries, a multi-disciplinary advisory panel of experts identified eight key data governance mechanisms to maximise the benefits to patients and to societies from the collection, linkage and analysis of health data and, at the same time, to minimise the risks to the privacy of patients and to the security of health data. These mechanisms include: coordinated development of high-value, privacy-protective health information systems; legislation that permits privacy-protective data use; open and transparent public communication; accreditation or certification of health data processors; transparent and fair project approval processes; data de-identification practices that meet legal requirements and public expectations without compromising data utility; data security practices; and a process to continually assess and renew the data governance framework as new data and new risks emerge…”


Harnessing the Data Revolution for Sustainable Development


US State Department Fact Sheet on “U.S. Government Commitments and Collaboration with the Global Partnership for Sustainable Development Data”: “On September 27, 2015, the member states of the United Nations agreed to a set of Sustainable Development Goals (Global Goals) that define a common agenda to achieve inclusive growth, end poverty, and protect the environment by 2030. The Global Goals build on tremendous development gains made over the past decade, particularly in low- and middle-income countries, and set actionable steps with measurable indicators to drive progress. The availability and use of high quality data is essential to measuring and achieving the Global Goals. By harnessing the power of technology, mobilizing new and open data sources, and partnering across sectors, we will achieve these goals faster and make their progress more transparent.

Harnessing the data revolution is a critical enabler of the Global Goals – not only to monitor progress, but also to inclusively engage stakeholders at all levels (local, regional, national, and global) to advance evidence-based policies and programs that reach those who need them most. Data can show us where girls are at greatest risk of violence so we can better prevent it; where forests are being destroyed in real time so we can protect them; and where HIV/AIDS is enduring so we can focus our efforts and finish the fight. Data can catalyze private investment; build modern and inclusive economies; and support transparent and effective investment of resources for social good….

The Global Partnership for Sustainable Development Data (Global Data Partnership), launched on the sidelines of the 70th United Nations General Assembly, is mobilizing a range of data producers and users – including governments, companies, civil society, data scientists, and international organizations – to harness the data revolution to achieve and measure the Global Goals. Working together, signatories to the Global Data Partnership will address the barriers to accessing and using development data, delivering outcomes that no single stakeholder can achieve working alone…. The United States, through the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR), is joining a consortium of funders to seed this initiative. The U.S. Government has many initiatives that are harnessing the data revolution for impact domestically and internationally. Highlights of our international efforts are found below:

Health and Gender

Country Data Collaboratives for Local Impact – PEPFAR and the Millennium Challenge Corporation (MCC) are partnering to invest $21.8 million in Country Data Collaboratives for Local Impact in sub-Saharan Africa that will use data on HIV/AIDS, global health, gender equality, and economic growth to improve programs and policies. Initially, the Country Data Collaboratives will align with and support the objectives of DREAMS, a PEPFAR, Bill & Melinda Gates Foundation, and Girl Effect partnership to reduce new HIV infections among adolescent girls and young women in high-burden areas.

Measurement and Accountability for Results in Health (MA4Health) Collaborative – USAID is partnering with the World Health Organization, the World Bank, and over 20 other agencies, countries, and civil society organizations to establish the MA4Health Collaborative, a multi-stakeholder partnership focused on reducing fragmentation and better aligning support to country health-system performance and accountability. The Collaborative will provide a vehicle to strengthen country-led health information platforms and accountability systems by improving data and increasing capacity for better decision-making; facilitating greater technical collaboration and joint investments; and developing international standards and tools for better information and accountability. In September 2015, partners agreed to a set of common strategic and operational principles, including a strong focus on 3–4 pathfinder countries where all partners will initially come together to support country-led monitoring and accountability platforms. Global actions will focus on promoting open data, establishing common norms and standards, and monitoring progress on data and accountability for the Global Goals. A more detailed operational plan will be developed through the end of the year, and implementation will start on January 1, 2016.

Data2X: Closing the Gender Gap – Data2X is a platform for partners to work together to identify innovative sources of data, including “big data,” that can provide an evidence base to guide development policy and investment on gender. As part of its commitment to Data2X – an initiative of the United Nations Foundation, Hewlett Foundation, Clinton Foundation, and Bill & Melinda Gates Foundation – PEPFAR and the Millennium Challenge Corporation (MCC) are working with partners to sponsor an open data challenge to incentivize the use of gender data to improve gender policy and practice….(More)”

See also: Data matters: the Global Partnership for Sustainable Development Data. Speech by UK International Development Secretary Justine Greening at the launch of the Global Partnership for Sustainable Development Data.

Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities…. In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to disprove the now-debunked link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.

….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes.

….In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential [3], which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies [5], Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.
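The differential-privacy idea described above can be sketched in a few lines. This is a toy illustration only, not the Census Bureau’s production mechanism: it answers a counting query by adding Laplace noise scaled to 1/ε, so that adding or removing any one person changes the answer’s distribution only slightly. The function names and the example data are hypothetical.

```python
import random

def dp_count(records, predicate, epsilon=1.0):
    """Answer a counting query with Laplace noise of scale 1/epsilon.

    A count has sensitivity 1 (one person changes it by at most 1),
    so Laplace(0, 1/epsilon) noise gives epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Hypothetical commute records: how many people commute over 30 minutes?
commute_minutes = [random.randrange(5, 90) for _ in range(1000)]
noisy_answer = dp_count(commute_minutes, lambda m: m > 30)
```

Each query returns a slightly different, roughly accurate answer; smaller values of ε mean more noise and stronger privacy, which is the trade-off analysts must negotiate.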

Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
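One simple building block behind secure multiparty computation is additive secret sharing, sketched below under illustrative assumptions (the modulus, function names, and the two data holders are hypothetical): each holder splits its value into random shares that sum to it modulo a large prime, the parties add shares locally, and only the final sum is ever reconstructed — no party sees another’s underlying value.

```python
import random

Q = 2**61 - 1  # a large prime modulus; all arithmetic is done mod Q

def share(secret, n_parties=3):
    """Split a value into n additive shares that sum to it mod Q.

    Any n-1 shares together are uniformly random and reveal nothing.
    """
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Recombine shares into the original value."""
    return sum(shares) % Q

# Two data holders compute the sum of their private values (120 and 80)
# without revealing them: each distributes shares, the parties add the
# shares they hold locally, and only the combined result is reconstructed.
a_shares = share(120)
b_shares = share(80)
sum_shares = [(x + y) % Q for x, y in zip(a_shares, b_shares)]
result = reconstruct(sum_shares)  # 200
```

Real protocols extend this idea to multiplication and comparisons, which is what lets analysts run full statistical queries across data sets that their holders never pool in one place.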