Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to disprove the now-debunked link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.

 

….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes. ….In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential, which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies, Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.

Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
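To make the two techniques named in the excerpt above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used by the Census Bureau or by any researcher quoted: the function names (`noisy_count`, `secure_sum`), the `epsilon` parameter, the modulus and the sample figures are all hypothetical. The first part adds Laplace noise to a counting query, which is one common way to achieve differential privacy. The second part uses additive secret sharing, one simple building block of secure multiparty computation, so that several data holders can obtain a total without showing each other their individual values.

```python
import random

# Illustrative sketch only: all names, parameters and data are hypothetical.


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    """Return the true count plus Laplace noise of scale 1 / epsilon.

    Adding or removing one person changes a count by at most 1, so the
    query has sensitivity 1. A smaller epsilon means more noise.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


PRIME = 2_147_483_647  # modulus for the shares (2**31 - 1)


def make_shares(value, n_parties, rng):
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values, rng):
    """Total the holders' values while pooling only per-party partial sums."""
    n = len(private_values)
    all_shares = [make_shares(value, n, rng) for value in private_values]
    # Party j receives the j-th share from every holder and adds them up.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    rng = random.Random(0)
    incomes = [31_000, 52_000, 47_500, 88_000, 23_000]  # made-up records
    print(noisy_count(incomes, lambda x: x > 40_000, epsilon=0.5, rng=rng))
    print(secure_sum([120, 75, 310], rng))  # totals to 505
```

The noisy count differs from run to run, which is the point: the analyst sees an approximately correct answer rather than the exact one. The shared-sum result is exact, which matches the excerpt's remark that some applications cannot tolerate noise. A real deployment would need a cryptographically secure random source and a protocol for exchanging shares between parties.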

Can Open Data Drive Innovative Healthcare?


Will Greene at Huffington Post: “As healthcare systems worldwide become increasingly digitized, medical scientists and health researchers have more data than ever. Yet much valuable health information remains locked in proprietary or hidden databases. A growing number of open data initiatives aim to change this, but it won’t be easy….

To overcome these challenges, a growing array of stakeholders — including healthcare and tech companies, research institutions, NGOs, universities, governments, patient groups, and individuals — are banding together to develop new regulations and guidelines, and generally promote open data in healthcare.

Some of these initiatives focus on improving transparency in clinical trials. Among those pushing for researchers to share more clinical trials data are groups like AllTrials and the Yale Open Data Access (YODA) Project, donor organizations like the Gates Foundation, and biomedical journals like The BMJ. Private healthcare companies, including some that resisted data sharing in the past, are increasingly seeing value in open collaboration as well.

Other initiatives focus on empowering patients to share their own health data. Consumer genomics companies, personal health records providers, disease management apps, online patient communities and other healthcare services give patients greater access to personal health data than ever before. Some also allow consumers to share it with researchers, enroll in clinical trials, or find other ways to leverage it for the benefit of others.

Another group of initiatives seeks to improve the quality and availability of public health data, such as that pertaining to epidemiological trends, health financing, and human behavior.

Governments often play a key role in collecting this kind of data, but some are more open and effective than others. “Open government is about more than a mere commitment to share data,” says Peter Speyer, Chief Data and Technology Officer at the Institute for Health Metrics and Evaluation (IHME), a health research center at the University of Washington. “It’s also about supporting a whole ecosystem for using these data and tapping into creativity and resources that are not available within any single organization.”

Open data may be particularly important in managing infectious disease outbreaks and other public health emergencies. Following the recent Ebola crisis, the World Health Organization issued a statement on the need for rapid data sharing in emergency situations. It laid out guidelines that could help save lives when the next pandemic strikes.

But on its own, open data does not lead to healthcare innovation. “Simply making large amounts of data accessible is good for transparency and trust,” says Craig Lipset, Head of Clinical Innovation at Pfizer, “but it does not inherently improve R&D or health research. We still need important collaborations and partnerships that make full use of these vast data stores.”

Many such collaborations and partnerships are already underway. They may help drive a new era of healthcare innovation….(More)”

Opening City Hall’s Wallets to Innovation


Tina Rosenberg at the New York Times: “Six years ago, the city of San Francisco decided to upgrade its streetlights. This is its story: O.K., stop. This is a parody, right? Government procurement is surely too nerdy even for Fixes. Procurement is a clerical task that cities do on autopilot: Decide what you need. Write a mind-numbing couple of dozen pages of specifications. Collect a few bids from the usual suspects. Yep, that’s procurement. But it doesn’t have to be. Instead of a rote purchasing exercise, what if procurement could be a way for cities to find new approaches to their problems?….

“Instead of saying to the marketplace ‘here’s the solution we want,’ we said ‘here’s the challenge, here’s the problem we’re having’,” said Barbara Hale, assistant general manager of the city’s Public Utilities Commission. “That opened us up to what other people thought the solution to the problem was, rather than us in our own little world deciding we knew the answer.”

The city got 59 different ideas from businesses in numerous countries. A Swiss company called Paradox won an agreement to do a 12-streetlight pilot test.

So — a happy ending for the scrappy and innovative Paradox? No. Paradox’s system worked, but the city could not award a contract for 18,500 streetlights that way. It held another competition for just the control systems, and tried out three of them. Last year the city issued a traditional R.F.P., using what it learned from the pilots. The contract has not yet been awarded.

Dozens of cities around the world are using problem-based procurement. Barcelona has posed six challenges that it will spend a million euros on, and Moscow announced last year that five percent of city spending would be set aside for innovative procurement. But in the vast majority of cities, as in San Francisco, problem-based procurement is still just for small pilot projects — a novelty.

It will grow, however. This is largely because of the efforts of CityMart, a company based in New York and Barcelona that has almost single-handedly taken the concept from a neat idea to something cities all over want to figure out how to do.

The concept is new enough that there’s not yet a lot of evidence about its effects. There’s plenty of proof, however, of the deficiencies of business-as-usual.

With the typical R.F.P., a city uses a consultant, working with local officials, to design what to ask for. Then city engineers and lawyers write the specifications, and the R.F.P. goes out for bids.

“If it’s a road safety issue it’s likely it will be the traffic engineers who will be asked to tell you what you can do, what you should invest in,” said Sascha Haselmayer, CityMart’s chief executive. “They tend to come up with things like traffic lights. They do not know there’s a world of entrepreneurs who work on educating drivers better, or that have a different design approach to public space — things that may not fit into the professional profile of the consultant.”

Such a process is guaranteed to be innovation-free. Innovation is far more likely when expertise from one discipline is applied to another. If you want the most creative solution to a traffic problem, ask people who aren’t traffic engineers.

The R.F.P. process itself was designed to give anyone a shot at a contract, but in reality, the winners almost always come from a small group of businesses with the required financial stability, legal know-how to negotiate the bureaucracy, and connections. Put those together, and cities get to consider only a tiny spectrum of the possible solutions to their problems.

Problem-based procurement can provide them with a whole rainbow. But to do that, the process needs clearinghouses — eBays or Craigslists for urban ideas….(More)”

Smoke Signals: Open data & analytics for preventing fire deaths


Enigma: “Today we are launching Smoke Signals, an open source civic analytics tool that helps local communities determine which city blocks are at the highest risk of not having a smoke alarm.

25,000 people are killed or injured in 1 million fires across the United States each year. With over 130 million housing units across the country, 4.5 million of them do not have smoke detectors, placing their inhabitants at substantial risk. Driving this number down is the single most important factor for saving lives put at risk by fire.

Organizations like the Red Cross are investing a lot of resources to buy and install smoke alarms in people’s homes. But a big challenge remains: in a city of millions, what doors should you knock on first when conducting an outreach effort?

We began working on the problem of targeting the blocks at highest risk of not having a smoke alarm with the City of New Orleans last spring. (You can read about this work here.) Over the past few months, with collaboration from the Red Cross and DataKind, we’ve built out a generalized model and a set of tools to offer the same analytics potential to 178 American cities, all in a way that is simple to use and sensitive to how on-the-ground operations are organized.

We believe that Smoke Signals is more a collection of tools and collaborations than it is a slick piece of software that can somehow act as a panacea to the problem of fire fatalities. Core to its purpose and mission are a set of commitments:

  • an ongoing collaboration with the Red Cross wherein our smoke alarm work informs their on-the-ground outreach
  • a collaboration with DataKind to continue applying volunteer work to the improvement of the underlying models and data that drive the risk analysis
  • a working relationship with major American cities to help integrate our prediction models into their outreach programs

and tools:

  • a downloadable CSV for 178 American municipalities that associates city streets with risk scores
  • an interactive map for an immediate bird’s eye assessment of at-risk city blocks
  • an API endpoint to which users can upload a CSV of local fire incidents in order to improve scores for their area

We believe this is an important contribution to public safety and the better delivery of government services. However, we also consider it a work in progress, a demonstration of how civic analytic solutions can be shared and generalized across the country. We are open sourcing all of the components that went into it and invite anyone with an interest in making it better to get involved….(More)”
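As a rough illustration of how an outreach team might use a street-level risk CSV like the one described above, the following Python sketch ranks blocks by score so the highest-risk doors come first. The announcement does not give the file layout, so the column names (`block_id`, `street`, `risk_score`), the sample rows and the helper `top_risk_blocks` are all assumptions made for this example.

```python
import csv
import io

# Hypothetical layout and invented rows: the real Smoke Signals CSV
# columns are not specified in the announcement.
SAMPLE_CSV = """block_id,street,risk_score
B001,Example St,0.82
B002,Sample Ave,0.35
B003,Placeholder Blvd,0.91
B004,Demo Rd,0.58
"""


def top_risk_blocks(csv_text, limit):
    """Return up to `limit` rows with the highest risk score, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row["risk_score"]), reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for row in top_risk_blocks(SAMPLE_CSV, 2):
        print(row["block_id"], row["street"], row["risk_score"])
```

With a real download, the same function could be applied to the file's contents once the column names are adjusted to match it.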

Data Collaboratives: Sharing Public Data in Private Hands for Social Good


Beth Simone Noveck (The GovLab) in Forbes: “Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets—if they are shared wisely.

Think about a couple of recent examples: Sharing data held by businesses and corporations (i.e. public data in private hands) can help to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training….

Despite the promise of data sharing, these kinds of data collaboratives remain relatively new. There is a need to accelerate their use by giving companies strong tax incentives for sharing data for public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta research;” it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and as a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rest in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public, data….(More)”

Openness an Essential Building Block for Inclusive Societies


 (Mexico) in the Huffington Post: “The international community faces a complex environment that requires transforming the way we govern. In that sense, 2015 marks a historic milestone, as 193 Member States of the United Nations will come together to agree on the adoption of the 2030 Agenda. With the definition of the 17 Sustainable Development Goals (SDGs), we will set an ambitious course toward a better and more inclusive world for the next 15 years.

The SDGs will be established just when governments deal with new and more defiant challenges, which require increased collaboration with multiple stakeholders to deliver innovative solutions. For that reason, cutting-edge technologies, fueled by vast amounts of data, provide an efficient platform to foster a global transformation and consolidate more responsive, collaborative and open governments.

Goal 16 seeks to promote just, peaceful and inclusive societies by ensuring access to public information, strengthening the rule of law, as well as building stronger and more accountable institutions. By doing so, we will contribute to successfully achieving the rest of the 2030 Agenda objectives.

During the 70th United Nations General Assembly, the 11 countries of the Steering Committee of the Open Government Partnership (OGP), along with civil-society leaders, will gather to acknowledge Goal 16 as a common target through a Joint Declaration: Open Government for the Implementation of the 2030 Agenda for Sustainable Development. As the Global Summit of OGP convenes this year in Mexico City, on October 28th and 29th, my government will call on all 65 members to subscribe to this fundamental declaration.

The SDGs will be reached only through trustworthy, effective and inclusive institutions. This is why Mexico, as current chair of the OGP, has committed to promote citizen participation, innovative policies, transparency and accountability.

Furthermore, we have worked with a global community of key players to develop the international Open Data Charter (ODC), which sets the founding principles for a greater coherence and increased use of open data across the world. We seek to recognize the value of having timely, comprehensive, accessible, and comparable data to improve governance and citizen engagement, as well as to foster inclusive development and innovation….(More)”

The Data Revolution for Sustainable Development


Jeffrey D. Sachs at Project Syndicate: “There is growing recognition that the success of the Sustainable Development Goals (SDGs), which will be adopted on September 25 at a special United Nations summit, will depend on the ability of governments, businesses, and civil society to harness data for decision-making…

One way to improve data collection and use for sustainable development is to create an active link between the provision of services and the collection and processing of data for decision-making. Take health-care services. Every day, in remote villages of developing countries, community health workers help patients fight diseases (such as malaria), get to clinics for checkups, receive vital immunizations, obtain diagnoses (through telemedicine), and access emergency aid for their infants and young children (such as for chronic under-nutrition). But the information from such visits is usually not collected, and even if it is put on paper, it is never used again.

We now have a much smarter way to proceed. Community health workers are increasingly supported by smart-phone applications, which they can use to log patient information at each visit. That information can go directly onto public-health dashboards, which health managers can use to spot disease outbreaks, failures in supply chains, or the need to bolster technical staff. Such systems can provide a real-time log of vital events, including births and deaths, and even use so-called verbal autopsies to help identify causes of death. And, as part of electronic medical records, the information can be used at future visits to the doctor or to remind patients of the need for follow-up visits or medical interventions….

Fortunately, the information and communications technology revolution and the spread of broadband coverage nearly everywhere can quickly make such time lags a thing of the past. As indicated in the report A World that Counts: Mobilizing the Data Revolution for Sustainable Development, we must modernize the practices used by statistical offices and other public agencies, while tapping into new sources of data in a thoughtful and creative way that complements traditional approaches.

Through more effective use of smart data – collected during service delivery, economic transactions, and remote sensing – the fight against extreme poverty will be bolstered; the global energy system will be made much more efficient and less polluting; and vital services such as health and education will be made far more effective and accessible.

With this breakthrough in sight, several governments, including that of the United States, as well as businesses and other partners, have announced plans to launch a new “Global Partnership for Sustainable Development Data” at the UN this month. The new partnership aims to strengthen data collection and monitoring efforts by raising more funds, encouraging knowledge-sharing, addressing key barriers to access and use of data, and identifying new big-data strategies to upgrade the world’s statistical systems.

The UN Sustainable Development Solutions Network will support the new Global Partnership by creating a new Thematic Network on Data for Sustainable Development, which will bring together leading data scientists, thinkers, and academics from across multiple sectors and disciplines to form a center of data excellence….(More)”

Can Yelp Help Government Win Back the Public’s Trust?


Tod Newcombe at Governing: “Look out, DMV, IRS and TSA. Yelp, the popular review website that’s best known for its rants or cheers regarding restaurants and retailers, is about to make it easier to review and rank government services.

Last month, Yelp and the General Services Administration (GSA), which manages the basic functions of the federal government, announced that government workers will soon be able to read and respond to their agencies’ Yelp reviews — and, hopefully, incorporate the feedback into service improvements.

At first glance, the news might not seem so special. There already are Yelp pages for government agencies like Departments of Motor Vehicles, which have been particularly popular. San Francisco’s DMV office, for example, has received more than 450 reviews and has a three-star rating. But federal agencies and workers haven’t been allowed to respond to the reviewers nor could they collect data from the pages because Yelp hadn’t been approved by the GSA. The agreement changes that situation, also making it possible for agencies to set up new Yelp pages….

Yelp has been posting online reviews about restaurants, bars, nail salons and other retailers since 2004. Despite its reputation as a place to vent about bad service, more than two-thirds of the 82 million reviews posted since Yelp started have been positive with most rated at either four or five stars, according to the company’s website. And when businesses boost their Yelp rating by one star, revenues have increased by as much as 9 percent, according to a 2011 study by Harvard Business School Professor Michael Luca.

Now the public sector is about to start paying more attention to those rankings. More importantly, they will find out if engaging the public in a timely fashion changes their perception of government.

While all levels of government have become active with social media, direct interaction between an agency and citizens is still the exception rather than the rule. Agencies typically use Facebook and Twitter to inform followers about services or to provide information updates, not as a feedback mechanism. That’s why having a more direct connection between the comments on a Yelp page and a government agency represents a shift in engagement….(More)”

Using Big Data to Understand the Human Condition: The Kavli HUMAN Project


Azmak Okan et al. in the Journal “Big Data”: “Until now, most large-scale studies of humans have either focused on very specific domains of inquiry or have relied on between-subjects approaches. While these previous studies have been invaluable for revealing important biological factors in cardiac health or social factors in retirement choices, no single repository contains anything like a complete record of the health, education, genetics, environmental, and lifestyle profiles of a large group of individuals at the within-subject level. This seems critical today because emerging evidence about the dynamic interplay between biology, behavior, and the environment points to a pressing need for just the kind of large-scale, long-term synoptic dataset that does not yet exist at the within-subject level. At the same time that the need for such a dataset is becoming clear, there is also growing evidence that just such a synoptic dataset may now be obtainable—at least at moderate scale—using contemporary big data approaches. To this end, we introduce the Kavli HUMAN Project (KHP), an effort to aggregate data from 2,500 New York City households in all five boroughs (roughly 10,000 individuals) whose biology and behavior will be measured using an unprecedented array of modalities over 20 years. It will also richly measure environmental conditions and events that KHP members experience using a geographic information system database of unparalleled scale, currently under construction in New York. In this manner, KHP will offer both synoptic and granular views of how human health and behavior coevolve over the life cycle and why they evolve differently for different people. In turn, we argue that this will allow for new discovery-based scientific approaches, rooted in big data analytics, to improving the health and quality of human life, particularly in urban contexts….(More)”

How Morocco Formed a Citizen Powered Constitution and Now Everyone Can Too


Jocelyn Fong at FeedbackLabs: “What if citizens could write the constitution for the society in which they live?

Legislation Lab — a new product of GovRight launched this spring — asks just this question. Dedicated to increasing public awareness and discussion of upcoming legislation, the platform offers citizens easy access to legislation and provides a participatory model to collect their feedback. Citizens can read through drafted legislation, compare it internationally, and then vote, comment, and propose changes to the very language itself — citizens can re-write the fundamental systems and laws that govern their lives.

The world of feedback sees new tools emerging all the time, with only some built to address an actual need. The makers of Legislation Lab are building on years of experience and know that the demand for such radical, open governance not only exists, it thrives.

In the wake of mass demonstrations calling for political reform in Morocco, Tarik Nesh-Nash (Ashoka Fellow and GovRight co-founder/CEO) launched Reforme.ma to collect the opinions of average Moroccan citizens on proposed changes to the constitution. Little did he know that he would be tapping into a groundswell of citizens eager and determined to share their voices. Within two months, Reforme.ma had over 200,000 visitors from diverse backgrounds, representing all regions of the country. Those 200,000 visitors made over 10,000 comments and proposals to the constitution — 40% of which were included in the new, official draft. In July 2011, Moroccan citizens voted in a referendum and overwhelmingly approved the new constitution.

But Legislation Lab is only GovRight’s latest of many efforts to create channels for better e-governance. Previous endeavors have focused on open legal text, open budgeting, corruption reporting, and citizen-government direct communication — all of which have primarily focused on improving governance in North Africa.

In regions that do not have a history of vibrant democracy, Tarik believes these platforms all work together to create a more informed, engaged, and empowered citizenry–one that is able to participate fully in its government. “Including voice in our laws takes three steps. First, there’s access to information. Then, citizens have the capacity to monitor their government. The last tier is citizen participation in government.” It’s a step-by-step process of building transparency, and then accountability, such that citizens can be involved in the very decision-making that structures their day-to-day lives.

But Legislation Lab is not only relevant for countries transitioning to more democratic styles of governance. Though still in beta, the platform has been asked to replicate its model in Chile for an open consultation on the constitution; New York City has recently approached the organization to help include public opinion in the city’s upcoming housing policy changes. Especially with the platform’s real-time, automated data analysis broken down by demographics, both governments and civil society organizations are yearning to see what the platform can enable.

While global clients may be clamoring to use the platform, Legislation Lab is finding that it’s more difficult to get other local citizens as engaged. “In Kurdistan, people are just excited this platform exists. In a more mature democracy, people don’t care,” Tarik explains. When citizens feel political fatigue from false promises and continued negligence, an online platform isn’t going to be a comprehensive fix….(More)”