Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to disprove the now-debunked link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.


….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes….In any case, although synthetic data (artificial records generated to mimic the statistical properties of the real data) potentially solve the privacy problem, some research applications cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential [3], which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children spend their early lives correlate with whether they earn more or less than their parents. In subsequent studies [5], Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, demonstrating a causal link.
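As a rough sketch of the differential-privacy idea described above (and not the Census Bureau's actual OnTheMap methodology, which releases synthetic data), the Python snippet below answers a counting query with Laplace noise; the epsilon value and the toy commute figures are assumptions made purely for illustration.

```python
import numpy as np

def noisy_count(records, predicate, epsilon=0.5):
    """Return an epsilon-differentially-private count of matching records.

    A counting query has sensitivity 1 (adding or removing one person changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Toy example: how many commutes in this made-up sample exceed 30 km?
commutes_km = [5, 12, 31, 48, 7, 22, 35, 40, 3, 55]
print(round(noisy_count(commutes_km, lambda km: km > 30), 1))  # roughly 5, plus noise
```

A smaller epsilon adds more noise and gives stronger privacy: the answer stays useful in aggregate, but it cannot pin down whether any one individual appears in the data.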

Secure multiparty computation is a technique that attempts to address this need for exact answers by allowing multiple data holders to analyse parts of the total data set without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
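To illustrate the flavour of the approach, here is a minimal sketch of additive secret sharing, one simple building block used in secure multiparty computation; the three parties, their private values and the field size are assumptions for the example, not a production protocol.

```python
import random

PRIME = 2**61 - 1  # shares live in a finite field, so any single share reveals nothing

def make_shares(value, n_parties):
    """Split `value` into n_parties additive shares that sum to value mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three data holders each hold one private number (e.g. a local case count).
private_values = [120, 345, 78]
n = len(private_values)

# Each holder splits its value and distributes one share to every party.
all_shares = [make_shares(v, n) for v in private_values]

# Each party sums the shares it received; only these partial sums are exchanged.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]

# Combining the partial sums reveals only the total, not any individual input.
print(sum(partial_sums) % PRIME)  # 543
```

In a real deployment each party would run on its own infrastructure and never see another's raw value, yet the shared statistic comes out exact, without the noise that differential privacy introduces.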

Personalising data for development


Wolfgang Fengler and Homi Kharas in the Financial Times: “When world leaders meet this week for the UN’s general assembly to adopt the Sustainable Development Goals (SDGs), they will also call for a “data revolution”. In a world where almost everyone will soon have access to a mobile phone, where satellites will take high-definition pictures of the whole planet every three days, and where inputs from sensors and social media make up two thirds of the world’s new data, the opportunities to leverage this power for poverty reduction and sustainable development are enormous. We are also on the verge of major improvements in government administrative data and data gleaned from the activities of private companies and citizens, in big and small data sets.

But these opportunities are yet to materialise at any scale. In fact, despite the exponential growth in connectivity and the emergence of big data, policy making is rarely based on good data. Almost every report from development institutions starts with a disclaimer highlighting “severe data limitations”. Like castaways on an island, surrounded by water they cannot drink unless the salt is removed, today’s policy makers are adrift in a sea of data that need to be refined and treated (simplified and aggregated) to make them “consumable”.

To make sense of big data, we used to depend on data scientists, computer engineers and mathematicians who would process requests one by one. But today, new programs and analytical solutions are putting big data at anyone’s fingertips. Tomorrow, it won’t be technical experts driving the data revolution but anyone operating a smartphone. Big data will become personal. We will be able to monitor and model social and economic developments faster, more reliably, more cheaply and on a far more granular scale. The data revolution will affect both the harvesting of data through new collection methods, and the processing of data through new aggregation and communication tools.

In practice, this means that data will become more actionable by becoming more personal, more timely and more understandable. Today, producing a poverty assessment and poverty map takes at least a year: it involves hundreds of enumerators, lengthy interviews and laborious data entry. In the future, thanks to hand-held connected devices, data collection and aggregation will happen in just a few weeks. Many more instances come to mind where new and higher-frequency data could generate development breakthroughs: monitoring teacher attendance, stocks and quality of pharmaceuticals, or environmental damage, for example…..

Despite vast opportunities, there are very few examples that have generated sufficient traction and scale to change policy and behaviour and create the feedback loops to further improve data quality. Two tools have personalised the abstract subjects of environmental degradation and demography:

  • Monitoring forest fires. The World Resources Institute has launched Global Forest Watch, which enables users to monitor forest fires in near real time and to overlay relevant spatial information, such as property boundaries and ownership data, which can be developed into a model to anticipate the impact on air quality in affected areas of Indonesia, Singapore and Malaysia.
  • Predicting your own life expectancy. The World Population Program developed a predictive tool – www.population.io – showing each person’s place in the distribution of world population and corresponding statistical life expectancy. In just a few months, this prototype attracted some 2m users who shared their results more than 25,000 times on social media. The traction of the tool resulted from making demography personal and converting an abstract subject matter into a question of individual ranking and life expectancy.

A new Global Partnership for Sustainable Development Data will be launched at the time of the UN General Assembly….(More)”

Open Science Revolution – New Ways of Publishing Research in The Digital Age


Scicasts: “A massive increase in the power of digital technology over the past decade allows us today to publish any article, blog post or tweet in a matter of seconds.

Much of the information on the web is also free – newspapers are embracing open access to their articles and many websites license their content under Creative Commons licenses, most of which allow the re-use and sharing of the original work at no cost.

In contrast to this openness, science publishing is still lagging behind. Most of the scientific knowledge generated in the past two centuries is hidden behind paywalls, requiring an average reader to pay tens to hundreds of euros to access an original study report written by scientists.

Can we not do things differently?

An answer to this question led to the creation of a number of new concepts that emerged over the past few years. A range of innovative open online science platforms are now trying “to do things differently”, offering researchers alternative ways of publishing their discoveries, making the publishing process faster and more transparent.

Here is a handful of examples, implemented by three companies – a recently launched open access journal Research Ideas and Outcomes (RIO), an open publishing platform F1000Research from The Faculty of 1000 and a research and publishing network ScienceOpen. Each has something different to offer, yet all of them seem to agree that science research should be open and accessible to everyone.

New concept – publish all research outputs

While the two-centuries-old tradition of science publishing lives and dies on exposing only the final outcomes of a research project, the RIO journal suggests a different approach. If we can follow new stories online step by step as they unfold (something that journalists have figured out and use in live reporting), they say, why not apply similar principles to research projects?

“RIO is the first journal that aims at publishing the whole research cycle and definitely the first one, to my knowledge, that tries to do that across all science branches – all of humanities, social sciences, engineering and so on,” says a co-founder of the RIO journal, Prof. Lyubomir Penev, in an interview to Scicasts.

From the original project outline, to datasets, software and methodology, each part of the project can be published separately. “The writing platform ARPHA, which underpins RIO, handles the whole workflow – from the stage when you write the first letter, to the end,” explains Prof. Penev.

At an early stage, the writing process is closed from public view and researchers may invite their collaborators and peers to view their project, add data and contribute to its development. Scientists can choose to publish any part of their project as it progresses – they can submit to the open platform their research idea, hypothesis or a newly developed experimental protocol, alongside future datasets and whole final manuscripts.

Some intermediate research stages and preliminary results can also be submitted to the platform F1000Research, which developed their own online authoring tool F1000Workspace, similar to ARPHA….(More)”

The Curious Politics of the ‘Nudge’


How do we really feel about policy “nudges”?

Earlier this month, President Obama signed an executive order directing federal agencies to collaborate with the White House’s new Social and Behavioral Sciences Team to use insights from behavioral science research to better serve the American people. For instance, studies show that people are more likely to save for retirement when they are automatically enrolled into a 401(k) retirement saving plan that they can opt out of than when they must actively opt in. The idea behind Mr. Obama’s initiative is that such soft-touch interventions, or “nudges,” can facilitate better decisions without resorting to heavier-handed strategies like mandates, taxes and bans.

The response to the executive order has been generally positive, but some conservatives have been critical, characterizing it as an instance of government overreach. (“President Obama Orders Behavioral Experiments on American Public” ran a headline on the website The Daily Caller.) However, it is worth noting that when a similar “behavioral insights team” was founded by the conservative government of the British prime minister, David Cameron, it met resistance from the political left. (“Brits’ Minds Will Be Controlled Without Us Knowing It” ran a headline in The Guardian.)

Is it possible that partisans from both ends of the political spectrum conflate their feelings about a general-purpose policy method (such as nudges) with their feelings about a specific policy goal (or about those who endorse that goal)? We think so. In a series of recent experiments that we conducted with Todd Rogers of the Harvard Kennedy School, we found evidence for a “partisan nudge bias.”…

we also found that when behavioral policy tools were described without mention of a specific policy application or sponsor, the bias disappeared. In this “blind taste test,” liberals and conservatives were roughly equally accepting of the use of policy nudges.

This last finding is good news, because scientifically grounded, empirically validated behavioral innovations can help policy makers improve government initiatives for the benefit of all Americans, regardless of their political inclinations….(More)”

Can Open Data Drive Innovative Healthcare?


Will Greene at Huffington Post: “As healthcare systems worldwide become increasingly digitized, medical scientists and health researchers have more data than ever. Yet much valuable health information remains locked in proprietary or hidden databases. A growing number of open data initiatives aim to change this, but it won’t be easy….

To overcome these challenges, a growing array of stakeholders — including healthcare and tech companies, research institutions, NGOs, universities, governments, patient groups, and individuals — are banding together to develop new regulations and guidelines, and generally promote open data in healthcare.

Some of these initiatives focus on improving transparency in clinical trials. Among those pushing for researchers to share more clinical trials data are groups like AllTrials and the Yale Open Data Access (YODA) Project, donor organizations like the Gates Foundation, and biomedical journals like The BMJ. Private healthcare companies, including some that resisted data sharing in the past, are increasingly seeing value in open collaboration as well.

Other initiatives focus on empowering patients to share their own health data. Consumer genomics companies, personal health records providers, disease management apps, online patient communities and other healthcare services give patients greater access to personal health data than ever before. Some also allow consumers to share it with researchers, enroll in clinical trials, or find other ways to leverage it for the benefit of others.

Another group of initiatives seeks to improve the quality and availability of public health data, such as data on epidemiological trends, health financing, and human behavior.

Governments often play a key role in collecting this kind of data, but some are more open and effective than others. “Open government is about more than a mere commitment to share data,” says Peter Speyer, Chief Data and Technology Officer at the Institute for Health Metrics and Evaluation (IHME), a health research center at the University of Washington. “It’s also about supporting a whole ecosystem for using these data and tapping into creativity and resources that are not available within any single organization.”

Open data may be particularly important in managing infectious disease outbreaks and other public health emergencies. Following the recent Ebola crisis, the World Health Organization issued a statement on the need for rapid data sharing in emergency situations. It laid out guidelines that could help save lives when the next pandemic strikes.

But on its own, open data does not lead to healthcare innovation. “Simply making large amounts of data accessible is good for transparency and trust,” says Craig Lipset, Head of Clinical Innovation at Pfizer, “but it does not inherently improve R&D or health research. We still need important collaborations and partnerships that make full use of these vast data stores.”

Many such collaborations and partnerships are already underway. They may help drive a new era of healthcare innovation….(More)”

Democracy


New graphic novel by Alecos Papadatos and Annie DiDonna: “Democracy opens in 490 B.C., with Athens at war. The hero of the story, Leander, is trying to rouse his comrades for the morrow’s battle against a far mightier enemy, and begins to recount his own life, having borne direct witness to the evils of the old tyrannical regimes and to the emergence of a new political system. The tale that emerges is one of daring, danger, and big ideas, of the death of the gods and the tortuous birth of democracy. We see that democracy originated through a combination of chance and historical contingency, but also through the cunning, courage, and willful action of a group of remarkably talented and driven individuals….[The book] also offers fresh insight into how this greatest of civic inventions came to be. (More)”

DIY ‘Public Service Design’ manual


The Spider Project: “Service design is a method for inventing or improving services. It is an interdisciplinary approach that makes use of ‘design thinking’. Service design helps with designing services from the perspective of the user.

Not by guessing what these users might want, but by truly co-creating relevant, effective and efficient services in collaboration with them. The basic principles of service design are that the designed service should be user-friendly and desired, and must respond to the needs and motivations of customers and citizens.

This manual guides civil servants in tendering, evaluating and managing, and shows the added value of design professionals when bringing their skills, knowledge and experience to the table.

This practical guide is filled with examples and case studies that will enable public organisations to obtain enough insights and confidence in service design in order to start working with it themselves.

Download a copy of Public Service Design

Opening City Hall’s Wallets to Innovation


Tina Rosenberg at the New York Times: “Six years ago, the city of San Francisco decided to upgrade its streetlights. This is its story: O.K., stop. This is a parody, right? Government procurement is surely too nerdy even for Fixes. Procurement is a clerical task that cities do on autopilot: Decide what you need. Write a mind-numbing couple of dozen pages of specifications. Collect a few bids from the usual suspects. Yep, that’s procurement. But it doesn’t have to be. Instead of a rote purchasing exercise, what if procurement could be a way for cities to find new approaches to their problems?….

“Instead of saying to the marketplace ‘here’s the solution we want,’ we said ‘here’s the challenge, here’s the problem we’re having’,” said Barbara Hale, assistant general manager of the city’s Public Utilities Commission. “That opened us up to what other people thought the solution to the problem was, rather than us in our own little world deciding we knew the answer.”

The city got 59 different ideas from businesses in numerous countries. A Swiss company called Paradox won an agreement to do a 12-streetlight pilot test.

So — a happy ending for the scrappy and innovative Paradox? No. Paradox’s system worked, but the city could not award a contract for 18,500 streetlights that way. It held another competition for just the control systems, and tried out three of them. Last year the city issued a traditional R.F.P., using what it learned from the pilots. The contract has not yet been awarded.

Dozens of cities around the world are using problem-based procurement. Barcelona has posed six challenges that it will spend a million euros on, and Moscow announced last year that five percent of city spending would be set aside for innovative procurement. But in the vast majority of cities, as in San Francisco, problem-based procurement is still just for small pilot projects — a novelty.

It will grow, however. This is largely because of the efforts of CityMart, a company based in New York and Barcelona that has almost single-handedly taken the concept from a neat idea to something cities all over want to figure out how to do.

The concept is new enough that there’s not yet a lot of evidence about its effects. There’s plenty of proof, however, of the deficiencies of business-as-usual.

With the typical R.F.P., a city uses a consultant, working with local officials, to design what to ask for. Then city engineers and lawyers write the specifications, and the R.F.P. goes out for bids.

“If it’s a road safety issue it’s likely it will be the traffic engineers who will be asked to tell you what you can do, what you should invest in,” said Sascha Haselmayer, CityMart’s chief executive. “They tend to come up with things like traffic lights. They do not know there’s a world of entrepreneurs who work on educating drivers better, or that have a different design approach to public space — things that may not fit into the professional profile of the consultant.”

Such a process is guaranteed to be innovation-free. Innovation is far more likely when expertise from one discipline is applied to another. If you want the most creative solution to a traffic problem, ask people who aren’t traffic engineers.

The R.F.P. process itself was designed to give anyone a shot at a contract, but in reality, the winners almost always come from a small group of businesses with the required financial stability, legal know-how to negotiate the bureaucracy, and connections. Put those together, and cities get to consider only a tiny spectrum of the possible solutions to their problems.

Problem-based procurement can provide them with a whole rainbow. But to do that, the process needs clearinghouses — eBays or Craigslists for urban ideas….(More)”

Ethical, Safe, and Effective Digital Data Use in Civil Society


Blog by Lucy Bernholz, Rob Reich, Emma Saunders-Hastings, and Emma Leeds Armstrong: “How do we use digital data ethically, safely, and effectively in civil society? We have developed three early principles for consideration:

  • Default to person-centered consent.
  • Prioritize privacy and minimum viable data collection.
  • Plan from the beginning to open (share) your work.

This post provides a synthesis from a one day workshop that informed these principles. It concludes with links to draft guidelines you can use to inform partnerships between data consultants/volunteers and nonprofit organizations….(More)

These three values — consent, minimum viable data collection, and open sharing — comprise a basic framework for ethical, safe, and effective use of digital data by civil society organizations. They should be integrated into partnerships with data intermediaries and, perhaps, into general data practices in civil society.

We developed two tools to guide conversations between data volunteers and/or consultants and nonprofits. These are downloadable below. Please use them, share them, improve them, and share them again….

  1. Checklist for NGOs and external data consultants
  2. Guidelines for NGOs and external data consultants (More)”

Smoke Signals: Open data & analytics for preventing fire deaths


Enigma: “Today we are launching Smoke Signals, an open source civic analytics tool that helps local communities determine which city blocks are at the highest risk of not having a smoke alarm.

Each year, 25,000 people are killed or injured in 1 million fires across the United States. Of the country’s more than 130 million housing units, 4.5 million do not have smoke detectors, placing their inhabitants at substantial risk. Driving this number down is the single most important factor for saving lives put at risk by fire.

Organizations like the Red Cross are investing a lot of resources to buy and install smoke alarms in people’s homes. But a big challenge remains: in a city of millions, what doors should you knock on first when conducting an outreach effort?

We began working on the problem of targeting the blocks at highest risk of not having a smoke alarm with the City of New Orleans last spring. (You can read about this work here.) Over the past few months, with collaboration from the Red Cross and DataKind, we’ve built out a generalized model and a set of tools to offer the same analytics potential to 178 American cities, all in a way that is simple to use and sensitive to how on-the-ground operations are organized.

We believe that Smoke Signals is more a collection of tools and collaborations than it is a slick piece of software that can somehow act as a panacea to the problem of fire fatalities. Core to its purpose and mission are a set of commitments:

  • an ongoing collaboration with the Red Cross wherein our smoke alarm work informs their on-the-ground outreach
  • a collaboration with DataKind to continue applying volunteer work to the improvement of the underlying models and data that drive the risk analysis
  • a working relationship with major American cities to help integrate our prediction models into their outreach programs

and tools:

  • downloadable CSVs for 178 American municipalities that associate city streets with risk scores (see the sketch after this list)
  • an interactive map for an immediate bird’s eye assessment of at-risk city blocks
  • an API endpoint to which users can upload a CSV of local fire incidents in order to improve scores for their area
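
As a hypothetical illustration of how an outreach team might use such a risk-score CSV to decide which doors to knock on first, the sketch below assumes a file name and column names (`street`, `risk_score`) that may differ from the actual Smoke Signals download.

```python
import pandas as pd

# Hypothetical file and column names; the real CSV layout may differ.
risk = pd.read_csv("smoke_alarm_risk_new_orleans.csv")

# Rank blocks by modelled risk of lacking a smoke alarm and pick the top 50
# as the first stops for a door-to-door alarm-installation campaign.
top_blocks = (
    risk.sort_values("risk_score", ascending=False)
        .loc[:, ["street", "risk_score"]]
        .head(50)
)
print(top_blocks.to_string(index=False))
```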

We believe this is an important contribution to public safety and the better delivery of government services. However, we also consider it a work in progress, a demonstration of how civic analytic solutions can be shared and generalized across the country. We are open sourcing all of the components that went into it and invite anyone with an interest in making it better to get involved….(More)”