Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to disprove the now-debunked link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.

 

….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes. ….In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential [3], which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies [5], Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.

Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
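
As a rough illustration of the two techniques named in the excerpt above, the sketch below shows one textbook form of each: the Laplace mechanism for a differentially private count, and additive secret sharing for a sum computed across several data holders. It is not taken from the Nature article and does not describe how the Census Bureau or any other agency implements these ideas. Every name in it (`private_count`, `make_shares`, `secure_sum`, the `epsilon` parameter, the modulus `PRIME`) is an assumption chosen for the example.

```python
import random

PRIME = 2**61 - 1  # hypothetical modulus; all share arithmetic is done mod PRIME


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching predicate.

    Adding or removing one person changes a count by at most 1, so the
    Laplace mechanism uses noise with scale 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def make_shares(secret, n_parties, rng):
    """Split secret into n_parties random shares that sum to it mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def secure_sum(secrets, rng):
    """Sum each party's private value while exchanging only random shares."""
    n = len(secrets)
    # Row i holds the shares that party i hands out, one per party.
    distributed = [make_shares(value, n, rng) for value in secrets]
    # Party j adds up the shares it received and publishes only that subtotal.
    subtotals = [sum(distributed[i][j] for i in range(n)) % PRIME
                 for j in range(n)]
    return sum(subtotals) % PRIME


if __name__ == "__main__":
    # random.Random keeps the demo repeatable; a real system would need a
    # cryptographically secure source of randomness instead.
    rng = random.Random(2015)
    incomes = [18_000, 42_000, 55_000, 61_000, 250_000]
    print(private_count(incomes, lambda x: x > 50_000, epsilon=0.5, rng=rng))
    print(secure_sum([120, 45, 300], rng))
```

A smaller `epsilon` means more noise and stronger privacy, which is the trade-off the excerpt alludes to when it says some analyses cannot tolerate noise. The secret-sharing sum is exact, but each party only ever sees random-looking shares and the final total.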

Personalising data for development


Wolfgang Fengler and Homi Kharas in the Financial Times: “When world leaders meet this week for the UN’s general assembly to adopt the Sustainable Development Goals (SDGs), they will also call for a “data revolution”. In a world where almost everyone will soon have access to a mobile phone, where satellites will take high-definition pictures of the whole planet every three days, and where inputs from sensors and social media make up two thirds of the world’s new data, the opportunities to leverage this power for poverty reduction and sustainable development are enormous. We are also on the verge of major improvements in government administrative data and data gleaned from the activities of private companies and citizens, in big and small data sets.

But these opportunities are yet to materialize at any scale. In fact, despite the exponential growth in connectivity and the emergence of big data, policy making is rarely based on good data. Almost every report from development institutions starts with a disclaimer highlighting “severe data limitations”. Like castaways on an island, surrounded by water they cannot drink unless the salt is removed, today’s policy makers are in a sea of data that need to be refined and treated (simplified and aggregated) to make them “consumable”.

To make sense of big data, we used to depend on data scientists, computer engineers and mathematicians who would process requests one by one. But today, new programs and analytical solutions are putting big data at anyone’s fingertips. Tomorrow, it won’t be technical experts driving the data revolution but anyone operating a smartphone. Big data will become personal. We will be able to monitor and model social and economic developments faster, more reliably, more cheaply and on a far more granular scale. The data revolution will affect both the harvesting of data through new collection methods, and the processing of data through new aggregation and communication tools.

In practice, this means that data will become more actionable by becoming more personal, more timely and more understandable. Today, producing a poverty assessment and poverty map takes at least a year: it involves hundreds of enumerators, lengthy interviews and laborious data entry. In the future, thanks to hand-held connected devices, data collection and aggregation will happen in just a few weeks. Many more instances come to mind where new and higher-frequency data could generate development breakthroughs: monitoring teacher attendance, stocks and quality of pharmaceuticals, or environmental damage, for example…..

Despite vast opportunities, there are very few examples that have generated sufficient traction and scale to change policy and behaviour and create the feedback loops to further improve data quality. Two tools have personalised the abstract subjects of environmental degradation and demography (see table):

  • Monitoring forest fires. The World Resources Institute has launched Global Forest Watch, which enables users to monitor forest fires in near real time and to overlay relevant spatial information such as property boundaries and ownership data. This information can be developed into a model to anticipate the impact on air quality in affected areas in Indonesia, Singapore and Malaysia.
  • Predicting your own life expectancy. The World Population Program developed a predictive tool – www.population.io – showing each person’s place in the distribution of world population and corresponding statistical life expectancy. In just a few months, this prototype attracted some 2m users who shared their results more than 25,000 times on social media. The traction of the tool resulted from making demography personal and converting an abstract subject matter into a question of individual ranking and life expectancy.

A new Global Partnership for Sustainable Development Data will be launched at the time of the UN General Assembly….(More)”

Data Collaboratives: Sharing Public Data in Private Hands for Social Good


Beth Simone Noveck (The GovLab) in Forbes: “Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets—if they are shared wisely.

Think about a couple of recent examples: Sharing data held by businesses and corporations (i.e. public data in private hands) can help to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training….

Despite the promise of data sharing, these kinds of data collaboratives remain relatively new. There is a need to accelerate their use by giving companies strong tax incentives for sharing data for public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta research;” it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and as a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rest in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public data….(More)”

Addressing Inequality and the ‘Data Divide’


Daniel Castro at the US Chamber of Commerce Foundation: “In the coming years, communities across the nation will increasingly rely on data to improve quality of life for their residents, such as by improving educational outcomes, reducing healthcare costs, and increasing access to financial services. However, these opportunities require that individuals have access to high-quality data about themselves and their communities. Should certain individuals or communities not routinely have data about them collected, distributed, or used, they may suffer social and economic consequences. Just as the digital divide has held back many communities from reaping the benefits of the modern digital era, a looming “data divide” threatens to stall the benefits of data-driven innovation for a wide swathe of America. Given this risk, policymakers should make a concerted effort to combat data poverty.

Data already plays a crucial role in guiding decision making, and it will only become more important over time. In the private sector, businesses use data for everything from predicting inventory demand to responding to customer feedback to determining where to open new stores. For example, an emerging group of financial service providers use non-traditional data sources, such as an individual’s social network, to assess credit risk and make lending decisions. And health insurers and pharmacies are offering discounts to customers who use fitness trackers to monitor and share data about their health. In the public sector, data is at the heart of important efforts like improving patient safety, cutting government waste, and helping children succeed in school. For example, public health officials in states like Indiana and Maryland have turned to data science in an effort to reduce infant mortality rates.

Many of these exciting advancements are made possible by a new generation of technologies that make it easier to collect, share, and disseminate data. In particular, the Internet of Everything is creating a plethora of always-on devices that record and transmit a wealth of information about our world and the people and objects in it. Individuals are using social media to create a rich tapestry of interactions tied to particular times and places. In addition, government investments in critical data systems, such as statewide databases to track healthcare spending and student performance over time, are integral to efforts to harness data for social good….(More)”

Can Yelp Help Government Win Back the Public’s Trust?


Tod Newcombe at Governing: “Look out, DMV, IRS and TSA. Yelp, the popular review website that’s best known for its rants or cheers regarding restaurants and retailers, is about to make it easier to review and rank government services.

Last month, Yelp and the General Services Administration (GSA), which manages the basic functions of the federal government, announced that government workers will soon be able to read and respond to their agencies’ Yelp reviews — and, hopefully, incorporate the feedback into service improvements.

At first glance, the news might not seem so special. There already are Yelp pages for government agencies like Departments of Motor Vehicles, which have been particularly popular. San Francisco’s DMV office, for example, has received more than 450 reviews and has a three-star rating. But federal agencies and workers haven’t been allowed to respond to the reviewers nor could they collect data from the pages because Yelp hadn’t been approved by the GSA. The agreement changes that situation, also making it possible for agencies to set up new Yelp pages….

Yelp has been posting online reviews about restaurants, bars, nail salons and other retailers since 2004. Despite its reputation as a place to vent about bad service, more than two-thirds of the 82 million reviews posted since Yelp started have been positive with most rated at either four or five stars, according to the company’s website. And when businesses boost their Yelp rating by one star, revenues have increased by as much as 9 percent, according to a 2011 study by Harvard Business School Professor Michael Luca.

Now the public sector is about to start paying more attention to those rankings. More importantly, they will find out if engaging the public in a timely fashion changes their perception of government.

While all levels of government have become active with social media, direct interaction between an agency and citizens is still the exception rather than the rule. Agencies typically use Facebook and Twitter to inform followers about services or to provide information updates, not as a feedback mechanism. That’s why having a more direct connection between the comments on a Yelp page and a government agency represents a shift in engagement….(More)”

Flutrack.org: Open-source and linked data for epidemiology


Chorianopoulos K, and Talvis K at Health Informatics Journal: “Epidemiology has made advances, thanks to the availability of real-time surveillance data and by leveraging the geographic analysis of incidents. There are many health information systems that visualize the symptoms of influenza-like illness on a digital map, which is suitable for end-users, but it does not afford further processing and analysis. Existing systems have emphasized the collection, analysis, and visualization of surveillance data, but they have neglected a modular and interoperable design that integrates high-resolution geo-location with real-time data. As a remedy, we have built an open-source project and we have been operating an open service that detects flu-related symptoms and shares the data in real-time with anyone who wants to build upon this system. An analysis of a small number of precisely geo-located status updates (e.g. Twitter) correlates closely with the Google Flu Trends and the Centers for Disease Control and Prevention flu-positive reports. We suggest that public health information systems should embrace an open-source approach and offer linked data, in order to facilitate the development of an ecosystem of applications and services, and in order to be transparent to the general public interest…(More)” See also http://www.flutrack.org/

 

Civic Jazz in the New Maker Cities


 at Techonomy: “Our civic innovation movement is about 6 years old.  It began when cities started opening up data to citizens, journalists, public-sector companies, non-profits, and government agencies.  Open data is an invitation: it’s something to go to work on— both to innovate and to create a more transparent environment about what works and what doesn’t.  I remember when we first opened data in SF and began holding conferences and hackathons. In short order we saw a community emerge with remarkable capacity to contribute to, tinker with, hack, explore and improve the city.

Early on this took the form of visualizing data, like crime patterns in Oakland. This was followed by engagement: “Look, the police are skating by and not enforcing prostitution laws. Let’s call them on it!” Civic hackathons brought together journalists, software developers, hardware people, and urbanists. I recall when artists teamed with the Arup engineering firm to build noise sensors and deployed them in the Tenderloin neighborhood (with absolutely no permission from anybody). Noise was an issue. How could you understand the problem unless you measured it?

Something as wonky as an API invited people in, at which point a sense of civic possibility and wonder set in. Suddenly whole swaths of the city were working on the city.  During the SF elections four years ago Gray Area Foundation for the Arts (which I chair) led a project with candidates, bureaucrats, and hundreds of volunteers for a summer-long set of hackathons and projects. We were stunned so many people would come together and collaborate so broadly. It was a movement, fueled by a sense of agency and informed by social media. Today cities are competing on innovation. It has become a movement.

All this has been accelerated by startups, incubators, and the economy’s whole open innovation conversation. Remarkably, we now see capital flowing in to support urban and social ventures where we saw none just a few years ago. The accelerator Tumml in SF is a premier example, but there are similar efforts in many cities.

This initial civic innovation movement was focused on apps and data, a relatively easy place to start. With such an approach you’re not contending for real estate or creating something that might gentrify neighborhoods. Today this movement is at work on how we design the city itself.  As millennials pour in and cities are where most of us live, enormous experimentation is at play. Ours is a highly interdisciplinary age, mixing new forms of software code and various physical materials, using all sorts of new manufacturing techniques.

Brooklyn is a great example. A few weeks ago I met with Bob Bland, CEO of Manufacture New York. This ambitious 160,000 square foot public/private partnership is reimagining the New York fashion business. In one place it co-locates contract manufacturers, emerging fashion brands and advanced fashion research. Think wearables, sensors, smart fabrics, and the application of advanced manufacturing to fashion. By bringing all these elements under one roof, the supply chain can be compressed and sped up, and products made more innovative.

New York City’s Economic Development office envisions a local urban supply chain that can offer a scalable alternative to the giant extended global one. In fashion it makes more and more sense for brands to be located near their suppliers. Social media speeds up fashion cycles, so we’re moving beyond predictable seasons and looks specified ahead of time. Manufacturers want to place smaller orders more frequently, so they can take less inventory risk and keep current with trends.

When you put so much talent in one space, creativity flourishes. In fashion, unlike tech, there isn’t a lot of IP protection. So designers can riff off each other’s ideas and incorporate influences as artists do. What might be called stealing ideas in the software business is seen in fashion as jazz and a way to create a more interesting work environment.

A few blocks away is the Brooklyn Navy Yard, a mammoth facility at the center of New York’s emerging maker economy. …In San Francisco this urban innovation movement is working on the form of the city itself. Our main boulevard, Market Street, is to be reimagined, repaved, and made greener with far fewer private vehicles over the next two years. Our planning department, in concert with art organizations here, has made citizen-led urban prototyping the centerpiece of the planning process….(More)”

US Citizenship and Immigration Services to host Twitter ‘office hours’


NextGov: “U.S. Citizenship and Immigration Services wants to get more social with prospective immigrants.

USCIS will host its first-ever Twitter office hours Tuesday from 3-4 p.m. using the #AskUSCIS hashtag. Agency officials hope to provide another avenue for customers to ask questions and receive real-time feedback, according to a USCIS blog post.

To participate, customers just have to follow @USCIS on Twitter, use the hashtag and ask away, although the blog post makes clear staff won’t answer case-specific questions or provide case status updates.

The post also warns Twitter users not to post Social Security numbers, receipt numbers or any other personally identifiable information.

“With Twitter office hours, we want to help you – either as you’re preparing forms or after you’ve filed,” the blog post states.

USCIS will post a transcript of the questions and answers to its blog following the Twitter office hours, and if the concept is successful, the agency plans to host the sessions on a regular basis.

The agency’s social outreach plan is part of a broader effort among federal agencies to improve customer experience.

This particular variant of digital engagement mirrors an effort championed by the Office of Federal Student Aid. FAFSA answers questions on the last Wednesday of every month using the hashtag #AskFAFSA, an effort that’s helped build its digital engagement significantly in recent years….(More)”

Dissecting the Spirit of Gezi: Influence vs. Selection in the Occupy Gezi Movement


New study by Ceren Budak and Duncan J. Watts in Sociological Science: “Do social movements actively shape the opinions and attitudes of participants by bringing together diverse groups that subsequently influence one another? Ethnographic studies of the 2013 Gezi uprising seem to answer “yes,” pointing to solidarity among groups that were traditionally indifferent, or even hostile, to one another. We argue that two mechanisms with differing implications may generate this observed outcome: “influence” (change in attitude caused by interacting with other participants); and “selection” (individuals who participated in the movement were generally more supportive of other groups beforehand).

We tease out the relative importance of these mechanisms by constructing a panel of over 30,000 Twitter users and analyzing their support for the main Turkish opposition parties before, during, and after the movement. We find that although individuals changed in significant ways, becoming in general more supportive of the other opposition parties, those who participated in the movement were also significantly more supportive of the other parties all along. These findings suggest that both mechanisms were important, but that selection dominated. In addition to our substantive findings, our paper also makes a methodological contribution that we believe could be useful to studies of social movements and mass opinion change more generally. In contrast with traditional panel studies, which must be designed and implemented prior to the event of interest, our method relies on ex post panel construction, and hence can be used to study unanticipated or otherwise inaccessible events. We conclude that despite the well known limitations of social media, their “always on” nature and their widespread availability offer an important source of public opinion data….(More)”

Customer-Driven Government


Jane Wiseman at DataSmart City Solutions: “Public trust in government is low — of the 43 industries tracked in the American Customer Satisfaction Index, only one ranks lower than the federal government in satisfaction levels.  Local government ranks a bit higher than the federal government, but for most of the public, that makes little difference. It’s time for government to change that perception by listening to its customers and improving service delivery.

What can the cup holder in your car teach government about customer engagement? A cup holder would be hard to live without — it keeps a latte from spilling and has room for keys and a phone. But the cup holder was not always such a multi-tasker. The first ones were shallow indentations in the plastic on the inside of the glove box. Accelerate and the drinks went flying. Did a brilliant automotive engineer decide that was a design flaw and fix it? No. It was only when Chrysler received more complaints about the cup holder than about anything else in their cars that they were forced to innovate. Don Clark, a DaimlerChrysler engineer known as the “Cup Holder King,” designed the first of the modern cup holders, debuting in the company’s 1984 minivans. The engineers thought they knew what their customers wanted (more powerful engines, better fuel economy, safety features) but it wasn’t until they listened to customers’ comments that they put in the cup holder. And sales took off.

Today, we’re awash in customer feedback, seemingly everywhere but government.  Over the past decade, customer feedback ratings for products and services have shown up everywhere — whether in a review on Yelp, a “like” on Facebook, or a Tweet about the virtues or shortcomings of a product or service.  Ratings help draw attention to poor quality and allow companies to address these gaps.  Many companies routinely follow up a customer interaction with a satisfaction survey.  This data drives improvement efforts aimed at keeping customers happy.  Some companies aggressively manage their online reviews, seeking to increase their NPS, or net promoter score.  Many people really like to provide feedback — there are 77 million reviews on Yelp to date, according to the company.  Imagine the power of that many reviews of government service.

If customer input can influence the automotive industry, and can help consumers make better decisions, what if we turned this energy toward government?  After all, the government is run “by the people” and “for the people” — what if citizens gave government real-time guidance on improving services?  And could leaders in government ask customers what they want, instead of presuming to know?  This paper explores these questions and suggests a way forward.

….

If I were a mayor, how would I begin harnessing customer feedback to improve service delivery?  I would build a foundation for improving core city operations (trash pickup, pothole fixing, etc.) by using the same three questions Kansas City uses for follow-up surveys to all who contact 311.  Upon that foundation I would layer additional outreach on a tactical, ad hoc basis.  I would experiment with the growing body of tools for engaging the public in shaping tactical decisions, such as how to allocate capital projects and where to locate bike share hubs.

To get an even deeper insight into the customer experience, I might copy what Somerville, MA has done with its Secret Resident program.  Trained volunteers assess the efficiency, courtesy, and ease of use of selected city departments.  The volunteers transact typical city services by phone or in person, and then document their customer experience.  They rate the agencies, and the 311 call center, and provide assessments that can help improve customer service.

By listening to and leveraging data on constituent calls for service, government can move from a culture of reaction to a proactive culture of listening and learning from the data provided by the public.  Engaging the public, and following through on the suggestions they give, can increase not only the quality of government service, but the faith of the public that government can listen and respond.

Every leader in government should commit to getting feedback from customers — it’s the only way to know how to increase their satisfaction with the services.  There is no more urgent time to improve the customer experience…(More)”