Privacy concerns collide with the public interest in data


Gillian Tett in the Financial Times: “Late last year Statistics Canada — the agency that collects government figures — launched an innovation: it asked the country’s banks to supply ‘individual-level financial transactions data’ for 500,000 customers to allow it to track economic trends. The agency argued this was designed to gather better figures for the public interest. However, it tipped the banks into a legal quandary. Under Canadian law (as in most western countries) companies are required to help StatsCan by supplying operating information. But data privacy laws in Canada also say that individual bank records are confidential. When the StatsCan request leaked out, it sparked an outcry — forcing the agency to freeze its plans. ‘It’s a mess,’ a senior Canadian banker says, adding that the laws ‘seem contradictory’.

Corporate boards around the world should take note. In the past year, executive angst has exploded about the legal and reputational risks created when private customer data leak out, either by accident or in a cyber hack. Last year’s Facebook scandals have been a hot debating topic among chief executives at this week’s World Economic Forum in Davos, as has the EU’s General Data Protection Regulation. However, there is another important side to this Big Data debate: must companies provide private digital data to public bodies for statistical and policy purposes? Or to put it another way, it is time to widen the debate beyond emotive privacy issues to include the public interest and policy needs. The issue has received little public debate thus far, except in Canada. But it is becoming increasingly important.

Companies are sitting on a treasure trove of digital data that offers valuable real-time signals about economic activity. This information could be even more significant than existing statistics, because official statistics struggle to capture how the economy is changing. Take Canada. StatsCan has hitherto tracked household consumption by following retail sales statistics, supplemented by telephone surveys. But consumers are becoming less willing to answer their phones, which undermines the accuracy of surveys, and consumption of digital services cannot be easily tracked. ...

But the biggest data collections sit inside private companies. Big groups know this, and some are trying to respond. Google has created its own measures to track inflation, which it makes publicly available. JPMorgan and other banks crunch customer data and publish reports about general economic and financial trends. Some tech groups are even starting to volunteer data to government bodies. LinkedIn has offered to provide anonymised data on education and employment to municipal and city bodies in America and beyond, to help them track local trends; the group says this is in the public interest for policy purposes, as “it offers a different perspective” than official data sources. But it is one thing for LinkedIn to offer anonymised data when customers have signed consent forms permitting the transfer of data; it is quite another for banks (or other companies) that have operated with strict privacy rules. If nothing else, the StatsCan saga shows there urgently needs to be more public debate, and more clarity, around these rules. Consumer privacy issues matter (a lot). But as corporate data mountains grow, we will need to ask whether we want to live in a world where Amazon and Google — and Mastercard and JPMorgan — know more about economic trends than central banks or finance ministries. Personally, I would say “no”. But sooner or later politicians will need to decide on their priorities in this brave new Big Data world; the issue cannot be simply left to the half-hidden statisticians….(More)”.

Saying yes to State Longitudinal Data Systems: building and maintaining cross-agency relationships


Report by the National Skills Coalition: “In order to provide actionable information to stakeholders, state longitudinal data systems use administrative data that state agencies collect through administering programs. Thus, state longitudinal data systems must maintain strong working relationships with the state agencies collecting necessary administrative data. These state agencies can include K-12 and higher education agencies, workforce agencies, and those administering social service programs such as the Supplemental Nutrition Assistance Program or Temporary Assistance for Needy Families.

When state longitudinal data systems have strong relationships with agencies, agencies willingly and promptly share their data with the system, engage with data governance when needed, approve research requests in a timely manner, and continue to cooperate with the system over the long term. If state agencies do not participate with their state’s longitudinal data system, the work of the system is put into jeopardy. States may find that research and performance reporting can be stalled or stopped outright.

Kentucky and Virginia have been able to build and maintain support for their systems among state agencies. Their examples demonstrate how states can effectively utilize their state longitudinal data systems….(More)”.

Google’s Sidewalk Labs Plans to Package and Sell Location Data on Millions of Cellphones


Ava Kofman at the Intercept: “Most of the data collected by urban planners is messy, complex, and difficult to represent. It looks nothing like the smooth graphs and clean charts of city life in urban simulator games like “SimCity.” A new initiative from Sidewalk Labs, the city-building subsidiary of Google’s parent company Alphabet, has set out to change that.

The program, known as Replica, offers planning agencies the ability to model an entire city’s patterns of movement. Like “SimCity,” Replica’s “user-friendly” tool deploys statistical simulations to give a comprehensive view of how, when, and where people travel in urban areas. It’s an appealing prospect for planners making critical decisions about transportation and land use. In recent months, transportation authorities in Kansas City, Portland, and the Chicago area have signed up to glean its insights. The only catch: They’re not completely sure where the data is coming from.

Typical urban planners rely on processes like surveys and trip counters that are often time-consuming, labor-intensive, and outdated. Replica, instead, uses real-time mobile location data. As Nick Bowden of Sidewalk Labs has explained, “Replica provides a full set of baseline travel measures that are very difficult to gather and maintain today, including the total number of people on a highway or local street network, what mode they’re using (car, transit, bike, or foot), and their trip purpose (commuting to work, going shopping, heading to school).”

To make these measurements, the program gathers and de-identifies the location of cellphone users, which it obtains from unspecified third-party vendors. It then models this anonymized data in simulations — creating a synthetic population that faithfully replicates a city’s real-world patterns but that “obscures the real-world travel habits of individual people,” as Bowden told The Intercept.
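The synthetic-population approach Bowden describes can be illustrated with a minimal sketch: reduce de-identified records to aggregate shares, then sample a fresh population of synthetic travelers that matches those aggregates without reproducing any real individual’s trace. This is a simplified illustration, not Replica’s actual method; all trip patterns and numbers below are invented.

```python
import random
from collections import Counter

# Hypothetical de-identified trip records: (mode, purpose) pairs only,
# with no identifiers or precise locations retained.
observed_trips = [
    ("car", "work"), ("car", "work"), ("transit", "work"),
    ("bike", "school"), ("foot", "shopping"), ("car", "shopping"),
]

def fit_aggregates(trips):
    """Reduce records to aggregate pattern frequencies --
    the only statistics carried forward into the simulation."""
    counts = Counter(trips)
    total = len(trips)
    return {pattern: n / total for pattern, n in counts.items()}

def sample_synthetic_population(aggregates, size, seed=0):
    """Draw a synthetic population whose travel patterns mirror the
    aggregate shares, not any real individual's travel history."""
    rng = random.Random(seed)
    patterns = list(aggregates)
    weights = [aggregates[p] for p in patterns]
    return rng.choices(patterns, weights=weights, k=size)

aggregates = fit_aggregates(observed_trips)
synthetic = sample_synthetic_population(aggregates, size=1000)
```

The key property is that the synthetic agents are statistically faithful in aggregate (a planner sees realistic mode and purpose shares) while individual real-world traces never leave the aggregation step.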

The program comes at a time of growing unease with how tech companies use and share our personal data — and raises new questions about Google’s encroachment on the physical world….(More)”.

Contracts for Data Collaboration


The GovLab: “The road to achieving the Sustainable Development Goals is complex and challenging. Policymakers around the world need both new solutions and new ways to become more innovative. This includes evidence-based policy and program design, as well as improved monitoring of progress made.

Unlocking privately processed data through data collaboratives — a new form of public-private partnership in which private industry, government and civil society work together to release previously siloed data — has become essential to address the challenges of our era.

Yet while research has proven its promise and value, several barriers to scaling data collaboration exist.

Ensuring trust and shared responsibility in how the data will be handled and used proves particularly challenging, because of the high transaction costs involved in drafting data-sharing contracts and agreements.

Ensuring Trust in Data Collaboration

The goal of the Contracts for Data Collaboration (C4DC) initiative is to address the inefficiencies of developing contractual agreements for public-private data collaboration.

The intent is to inform and guide those seeking to establish a data collaborative by developing and making available a shared repository of contractual clauses (taken from existing data sharing agreements) that covers a host of issues, including (non-exclusively):

  • The provenance, quality and purpose of data;
  • Security and privacy concerns;
  • Roles and responsibilities of participants;
  • Access provisions and use limitations;
  • Governance mechanisms;
  • Other contextual mechanisms.

In addition to the searchable library of contractual clauses, the repository will house use cases, guides and other information that analyse common patterns, language and best practices.

Help Us Scale Data Collaboration

Contracts for Data Collaboration builds on efforts from member organizations that have experience in developing and managing data collaboratives and have documented the legal challenges and opportunities of data collaboration.

The initiative is an open collaborative with charter members from the GovLab at NYU, UN SDSN Thematic Research Network on Data and Statistics (TReNDS), University of Washington and the World Economic Forum.

Organizations interested in joining the initiative should contact the individuals noted below, or share any agreements they have used for data sharing activities (without any sensitive or identifiable information): Stefaan Verhulst, GovLab (Stefaan@thegovlab.org) …(More)

Looking after and using data for public benefit


Heather Savory at the Office for National Statistics (UK): “Official Statistics are for the benefit of society and the economy and help Britain to make better decisions. They allow the formulation of better public policy and the effective measurement of those policies. They inform the direction of economic and commercial activities. They provide valuable information for analysts, researchers, public and voluntary bodies. They enable the public to hold organisations that spend public money to account, thus informing democratic debate.

The ability to harness the power of data is critical in enabling official statistics to support the most important decisions facing the country.

Under the new powers in the Digital Economy Act, ONS can now gain access to new and different sources of data including ‘administrative’ data from government departments and commercial data. Alongside the availability of these new data sources ONS is experiencing a strong demand for ad hoc insights alongside our traditional statistics.

We need to deliver more, faster, finer-grained insights into the economy and society. We need to deliver high quality, trustworthy information, on a faster timescale, to help decision-making. We will increasingly develop innovative data analysis methods, for example using images to gain insight, as in the work we’ve recently announced on Urban Forests….

I should explain here that our data is not held in one big linked database; we’re architecting our Data Access Platform so that data can be linked in different ways for different purposes. This is designed to preserve data confidentiality, so only the necessary subset of data is accessible by authorised people, for a certain purpose. To avoid compromising their effectiveness, we do not make public the specific details of the security measures we have in place, but our recently tightened security regime, which is independently assured by trusted external bodies, includes:

  • physical measures to restrict who can access places where data is stored;
  • protective measures for all data-related IT services;
  • measures to restrict who can access systems and data held by ONS;
  • controls to guard against staff or contractors misusing their legitimate access to data, including vetting to a level appropriate to the sensitivity of the data to which they might have access.
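The platform design described above — only the necessary subset of data accessible by authorised people, for a certain purpose — resembles purpose-based access control, where an access grant binds a person, a dataset, and an approved purpose together. A minimal sketch of that idea (the roles, dataset names, and purposes are hypothetical, not ONS’s actual implementation):

```python
# Hypothetical purpose-based access check: a grant ties an analyst to a
# dataset subset *and* an approved purpose; access for any other purpose
# is refused even for otherwise-authorised analysts.
GRANTS = {
    # (analyst, dataset): set of approved purposes
    ("analyst_a", "retail_sales"): {"inflation_measurement"},
    ("analyst_b", "retail_sales"): {"regional_reporting"},
}

def can_access(analyst: str, dataset: str, purpose: str) -> bool:
    """Return True only if this analyst holds a grant for this dataset
    that covers the stated purpose."""
    return purpose in GRANTS.get((analyst, dataset), set())
```

Under this model, `can_access("analyst_a", "retail_sales", "regional_reporting")` fails even though analyst_a is cleared for the dataset, because the stated purpose falls outside the grant.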

One of the things I love about working in the public sector is that our work can be shared openly.

We live in a rapidly changing and developing digital world and we will continue to monitor and assess the data standards and security measures in place to ensure they remain strong and effective. So, as well as sharing this work openly to reassure all our data suppliers that we’re taking good care of their data, we’re also seeking feedback on our revised data policies.

The same data can provide different insights when viewed through different lenses or in different combinations. The more data is shared – with the appropriate safeguards of course – the more it has to give.

If you work with data, you’ll know that collaborating with others in this space is key and that we need to be able to share data more easily when it makes sense to do so. So, the second reason for sharing this work openly is that, if you’re in the technical space, we’d value your feedback on our approach and if you’re in the data space and would like to adopt the same approach, we’d love to support you with that – so that we can all share data more easily in the future….(More)

ONS’s revised policies on the use, management and security of data can be found here.

All of Us Research Program Expands Data Collection Efforts with Fitbit


NIH Press Release: “The All of Us Research Program has launched the Fitbit Bring-Your-Own-Device (BYOD) project. Now, in addition to providing health information through surveys, electronic health records, and biosamples, participants can choose to share data from their Fitbit accounts to help researchers make discoveries. The project is a key step for the program in integrating digital health technologies for data collection.

Digital health technologies, like mobile apps and wearable devices, can gather data outside of a hospital or clinic. This data includes information about physical activity, sleep, weight, heart rate, nutrition, and water intake, which can give researchers a more complete picture of participants’ health. The All of Us Research Program is now gathering this data in addition to surveys, electronic health record information, physical measurements, and blood and urine samples, working to make the All of Us resource one of the largest and most diverse data sets of its kind for health research.

“Collecting real-world, real-time data through digital technologies will become a fundamental part of the program,” said Eric Dishman, director of the All of Us Research Program. “This information, in combination with many other data types, will give us an unprecedented ability to better understand the impact of lifestyle and environment on health outcomes and, ultimately, develop better strategies for keeping people healthy in a very precise, individualized way.”…

All of Us is developing additional plans to incorporate digital health technologies. A second project with Fitbit is expected to launch later in the year. It will include providing devices to a limited number of All of Us participants who will be randomly invited to take part, to enable them to share wearable data with the program. And All of Us will add connections to other devices and apps in the future to further expand data collection efforts and engage participants in new ways….(More)”.

The promises — and challenges — of data collaboratives for the SDGs


Paula Hidalgo-Sanchis and Stefaan G. Verhulst at Devex: “As the road to achieving the Sustainable Development Goals becomes more complex and challenging, policymakers around the world need both new solutions and new ways to become more innovative. This includes better policy and program design based on evidence to solve problems at scale. The use of big data — the vast majority of which is collected, processed, and analyzed by the private sector — is key.

In the past few months, we at UN Global Pulse and The GovLab have sought to understand pathways to make policymaking more evidence-based and data-driven with the use of big data. Working in parallel at both local and global scale, we have conducted extensive desk research, held a series of workshops, and conducted in-depth conversations and interviews with key stakeholders, including government, civil society, and private sector representatives.

Our work is driven by a recognition of the potential of use of privately processed data through data collaboratives — a new form of public-private partnership in which government, private industry, and civil society work together to release previously siloed data, making it available to address the challenges of our era.

Research suggests that data collaboratives offer tremendous potential when implemented strategically under the appropriate policy and ethical frameworks. Nonetheless, this remains a nascent field, and we have summarized some of the barriers that continue to confront data collaboratives, with an eye toward ultimately proposing solutions to make them more effective, scalable, sustainable, and responsible.

Here are seven challenges…(More)”.

Data Policy in the Fourth Industrial Revolution: Insights on personal data


Report by the World Economic Forum: “Development of comprehensive data policy necessarily involves trade-offs. Cross-border data flows are crucial to the digital economy. The use of data is critical to innovation and technology. However, to engender trust, we need to have appropriate levels of protection in place to ensure privacy, security and safety. Over 120 laws in effect across the globe today provide differing levels of protection for data but few anticipated …

Data Policy in the Fourth Industrial Revolution: Insights on personal data, a paper by the World Economic Forum in collaboration with the Ministry of Cabinet Affairs and the Future, United Arab Emirates, examines the relationship between risk and benefit, recognizing the impact of culture, values and social norms. This work is a start toward developing a comprehensive data policy toolkit and knowledge repository of case studies for policy makers and data policy leaders globally….(More)”.

A Research Roadmap to Advance Data Collaboratives Practice as a Novel Research Direction


Iryna Susha, Theresa A. Pardo, Marijn Janssen, Natalia Adler, Stefaan G. Verhulst and Todd Harbour in the International Journal of Electronic Government Research (IJEGR): “An increasing number of initiatives have emerged around the world to help facilitate data sharing and collaborations to leverage different sources of data to address societal problems. They are called “data collaboratives”. Data collaboratives are seen as a novel way to match real life problems with relevant expertise and data from across the sectors. Despite its significance and growing experimentation by practitioners, there has been limited research in this field. In this article, the authors report on the outcomes of a panel discussing critical issues facing data collaboratives and develop a research and development agenda. The panel included government officials, academics, and practitioners and was held in June 2017 during the 18th International Conference on Digital Government Research at City University of New York (Staten Island, New York, USA). The article begins by discussing the concept of data collaboratives. Then the authors formulate research questions and topics for the research roadmap based on the panel discussions. The research roadmap poses questions across nine different topics: conceptualizing data collaboratives, value of data, matching data to problems, impact analysis, incentives, capabilities, governance, data management, and interoperability. Finally, the authors discuss how digital government research can contribute to answering some of the identified research questions….(More)”. See also: http://datacollaboratives.org/

Google Searches Could Predict Heroin Overdoses


Rod McCullom at Scientific American: “About 115 people nationwide die every day from opioid overdoses, according to the U.S. Centers for Disease Control and Prevention. A lack of timely, granular data exacerbates the crisis; one study showed opioid deaths were undercounted by as many as 70,000 between 1999 and 2015, making it difficult for governments to respond. But now Internet searches have emerged as a data source to predict overdose clusters in cities or even specific neighborhoods—information that could aid local interventions that save lives. 

The working hypothesis was that some people searching for information on heroin and other opioids might overdose in the near future. To test this, a researcher at the University of California Institute for Prediction Technology (UCIPT) and his colleagues developed several statistical models to forecast overdoses based on opioid-related keywords, metropolitan income inequality and total number of emergency room visits. They discovered regional differences in where and how people searched for such information and found that more overdoses were associated with a greater number of searches per keyword. The best-fitting model, the researchers say, explained about 72 percent of the relation between the most popular search terms and heroin-related E.R. visits. The authors say their study, published in the September issue of Drug and Alcohol Dependence, is the first report of using Google searches in this way.

To develop their models, the researchers obtained search data for 12 prescription and nonprescription opioids between 2005 and 2011 in nine U.S. metropolitan areas. They compared these with Substance Abuse and Mental Health Services Administration records of heroin-related E.R. admissions during the same period. The models can be modified to predict overdoses of other opioids or narrow searches to specific zip codes, says lead study author Sean D. Young, a behavioral psychologist and UCIPT executive director. That could provide early warnings of overdose clusters and help to decide where to distribute the overdose reversal medication Naloxone….(More)”.
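The core of the kind of model the researchers describe — relating a search-volume signal to heroin-related E.R. visits, and reporting how much of the relation it explains — can be sketched as a simple least-squares regression. The data below is invented for illustration; the actual UCIPT models used many keywords across nine metropolitan areas, plus income inequality and total E.R. visits as covariates.

```python
# Toy illustration: ordinary least squares relating a single keyword
# search-volume index to heroin-related E.R. visits. All numbers are
# invented; the real study fit richer multi-variable models.
searches = [10, 20, 30, 40, 50, 60]      # keyword search-volume index
er_visits = [25, 42, 66, 85, 104, 128]   # heroin-related E.R. visits

def ols_fit(x, y):
    """Return slope, intercept, and R^2 for the fit y ~ slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot          # share of variation explained
    return slope, intercept, r2

slope, intercept, r2 = ols_fit(searches, er_visits)
```

The R² value plays the role of the “about 72 percent of the relation” figure the researchers report: it measures how much of the variation in E.R. visits the search signal accounts for.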