Open Data Privacy Playbook


A data privacy playbook by Ben Green, Gabe Cunningham, Ariel Ekblaw, Paul Kominers, Andrew Linzer, and Susan Crawford: “Cities today collect and store a wide range of data that may contain sensitive or identifiable information about residents. As cities embrace open data initiatives, more of this information is available to the public. While releasing data has many important benefits, sharing data comes with inherent risks to individual privacy: released data can reveal information about individuals that would otherwise not be public knowledge. In recent years, open data such as taxi trips, voter registration files, and police records have revealed information that many believe should not be released.

Effective data governance is a prerequisite for successful open data programs. The goal of this document is to codify responsible privacy-protective approaches and processes that could be adopted by cities and other government organizations that are publicly releasing data. Our report is organized around four recommendations:

  • Conduct risk-benefit analyses to inform the design and implementation of open data programs.
  • Consider privacy at each stage of the data lifecycle: collect, maintain, release, delete.
  • Develop operational structures and processes that codify privacy management widely throughout the City.
  • Emphasize public engagement and public priorities as essential aspects of data management programs.

Each chapter of this report is dedicated to one of these four recommendations, and provides fundamental context along with specific suggestions to carry them out. In particular, we provide case studies of best practices from numerous cities and a set of forms and tactics for cities to implement our recommendations. The Appendix synthesizes key elements of the report into an Open Data Privacy Toolkit that cities can use to manage privacy when releasing data….(More)”

Connecting the dots: Building the case for open data to fight corruption


Web Foundation: “This research, published with Transparency International, measures the progress made by 5 key countries in implementing the G20 Anti-Corruption Open Data Principles.

These principles, adopted by G20 countries in 2015, committed countries to increasing and improving the publication of public information, driving forward open data as a tool in anti-corruption efforts.

However, this research – looking at Brazil, France, Germany, Indonesia and South Africa – finds a disappointing lack of progress. No country studied has released all the datasets identified as being key to anti-corruption, and much of the information is hard to find and hard to use.

Key findings:

  • No country released all anti-corruption datasets
  • Quality issues mean data is often not useful or usable
  • Much of the data is not published in line with open data standards, making comparability difficult
  • In many countries there is a lack of open data skills among officials in charge of anti-corruption initiatives

Download the overview report here (PDF), and access the individual country case studies for Brazil, France, Germany, Indonesia and South Africa… (More)”

Data Disrupts Corruption


Carlos Santiso & Ben Roseth at Stanford Social Innovation Review: “…The Panama Papers scandal demonstrates the power of data analytics to uncover corruption in a world flooded with terabytes needing only the computing capacity to make sense of it all. The Rousseff impeachment illustrates how open data can be used to bring leaders to account. Together, these stories show how data, both “big” and “open,” is driving the fight against corruption with fast-paced, evidence-driven, crowd-sourced efforts. Open data can put vast quantities of information into the hands of countless watchdogs and whistleblowers. Big data can turn that information into insight, making corruption easier to identify, trace, and predict. To realize the movement’s full potential, technologists, activists, officials, and citizens must redouble their efforts to integrate data analytics into policy making and government institutions….

Making big data open cannot, in itself, drive anticorruption efforts. “Without analytics,” a 2014 White House report on big data and individual privacy underscored, “big datasets could be stored, and they could be retrieved, wholly or selectively. But what comes out would be exactly what went in.”

In this context, it is useful to distinguish the four main stages of data analytics to illustrate its potential in the global fight against corruption:

  • Descriptive analytics uses data to describe what has happened in analyzing complex policy issues.
  • Diagnostic analytics goes a step further by mining and triangulating data to explain why a specific policy problem has happened, identify its root causes, and decipher underlying structural trends.
  • Predictive analytics uses data and algorithms, often leveraging machine learning, to predict what is most likely to occur.
  • Prescriptive analytics proposes what should be done to cause or prevent something from happening….
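These stages map naturally onto code. Below is a minimal, hypothetical Python sketch of how the four stages might look against a toy table of procurement contracts; the column names, the single-bid risk signal, and the 0.5 review threshold are all invented for illustration, not drawn from any real anticorruption system.

```python
# Illustrative only: the four analytics stages on an invented contracts table.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical contract records: value (in thousands), number of bidders,
# and whether an audit later flagged the contract as suspicious.
contracts = pd.DataFrame({
    "value": [120, 95, 430, 88, 510, 77],
    "bidders": [4, 5, 1, 6, 1, 3],
    "flagged": [0, 0, 1, 0, 1, 0],
})

# 1. Descriptive: what happened? Summarize the contracts.
print(contracts.describe())

# 2. Diagnostic: why? Compare flagged vs. unflagged contracts for patterns
#    (here, single-bid awards coincide with audit flags).
print(contracts.groupby("flagged")["bidders"].mean())

# 3. Predictive: what is likely to occur? Score a new contract's flag risk.
X = contracts[["value", "bidders"]].to_numpy()
model = LogisticRegression().fit(X, contracts["flagged"])
risk = model.predict_proba([[300, 1]])[0, 1]  # new single-bid contract
print(f"Predicted flag risk: {risk:.2f}")

# 4. Prescriptive: what should be done? Route high-risk contracts to review.
if risk > 0.5:
    print("Recommend pre-award audit for this contract.")
```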

Despite the big data movement’s promise for fighting corruption, many challenges remain. The smart use of open and big data should focus not only on uncovering corruption, but also on better understanding its underlying causes and preventing its recurrence. Anticorruption analytics cannot exist in a vacuum; it must fit in a strategic institutional framework that starts with quality information and leads to reform. Even the most sophisticated technologies and data innovations cannot prevent what French novelist Théophile Gautier described as the “inexplicable attraction of corruption, even amongst the most honest souls.” Unless it is harnessed for improvements in governance and institutions, data analytics will not have the impact that it could, nor be sustainable in the long run…(More)”.

Big and open data are prompting a reform of scientific governance


Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.

What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.

Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.

Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.

The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.

Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.

New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.

Data-centric science is emerging in concert with calls for increased openness in research….(More)”

Data in public health


Jeremy Berg in Science: “In 1854, physician John Snow helped curtail a cholera outbreak in a London neighborhood by mapping cases and identifying a central public water pump as the potential source. This event is considered by many to represent the founding of modern epidemiology. Data and analysis play an increasingly important role in public health today. This can be illustrated by examining the rise in the prevalence of autism spectrum disorders (ASDs), where data from varied sources highlight potential factors while ruling out others, such as childhood vaccines, facilitating wise policy choices…. A collaboration between the research community, a patient advocacy group, and a technology company (www.mss.ng) seeks to sequence the genomes of 10,000 well-phenotyped individuals from families affected by ASD, making the data freely available to researchers. Studies to date have confirmed that the genetics of autism are extremely complicated—a small number of genomic variations are closely associated with ASD, but many other variations have much lower predictive power. More than half of siblings, each of whom has ASD, have different ASD-associated variations. Future studies, facilitated by an open data approach, will no doubt help advance our understanding of this complex disorder….

A new data collection strategy was reported in 2013 to examine contagious diseases across the United States, including the impact of vaccines. Researchers digitized all available city and state notifiable disease data from 1888 to 2011, mostly from hard-copy sources. Information corresponding to nearly 88 million cases has been stored in a database that is open to interested parties without restriction (www.tycho.pitt.edu). Analyses of these data revealed that vaccine development and systematic vaccination programs have led to dramatic reductions in the number of cases. Overall, it is estimated that ∼100 million cases of serious childhood diseases have been prevented through these vaccination programs.
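The scale of that estimate rests on a simple counterfactual: compare observed post-vaccine case counts against the counts expected had pre-vaccine incidence continued. A sketch of that arithmetic in Python, assuming a hypothetical CSV of yearly measles cases and population (a stand-in layout, not Project Tycho's actual schema):

```python
# Hedged sketch: estimate cases prevented by comparing post-licensure case
# counts to those expected at the average pre-vaccine incidence rate.
# The file and columns (year, cases, population) are assumed for illustration.
import csv

pre, post = [], []
with open("measles_us.csv") as f:  # hypothetical file
    for row in csv.DictReader(f):
        year = int(row["year"])
        rate = int(row["cases"]) / int(row["population"])  # incidence per capita
        # The measles vaccine was licensed in the US in 1963.
        (pre if year < 1963 else post).append((year, rate, int(row["population"])))

# Expected cases after licensure if pre-vaccine incidence had continued.
baseline = sum(rate for _, rate, _ in pre) / len(pre)
expected = sum(baseline * pop for _, _, pop in post)
observed = sum(rate * pop for _, rate, pop in post)
print(f"Estimated cases prevented: {expected - observed:,.0f}")
```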

These examples illustrate how data collection and sharing through publication and other innovative means can drive research progress on major public health challenges. Such evidence, particularly on large populations, can help researchers and policy-makers move beyond anecdotes—which can be personally compelling, but often misleading—for the good of individuals and society….(More)”

DataRefuge


DataRefuge is a public, collaborative project designed to address the following concerns about federal climate and environmental data:

  • What are the best ways to safeguard data?
  • How do federal agencies play crucial roles in data collection, management, and distribution?
  • How do government priorities impact data’s accessibility?
  • Which projects and research fields depend on federal data?
  • Which data sets are of value to research and local communities, and why?

DataRefuge is also an initiative committed to identifying, assessing, prioritizing, securing, and distributing reliable copies of federal climate and environmental data so that it remains available to researchers. Data collected as part of the #DataRefuge initiative will be stored in multiple, trusted locations to help ensure continued accessibility.

DataRefuge acknowledges – and in fact draws attention to – the fact that there are no guarantees of perfectly safe information. But there are ways that we can create safe and trustworthy copies. DataRefuge is thus also a project to develop the best methods, practices, and protocols to do so.
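One common technique for making replicated copies verifiable is to record a cryptographic checksum when a dataset is copied and re-check it on every mirror. The Python sketch below illustrates that general idea only; it is not DataRefuge's actual workflow or tooling, and the file names are invented.

```python
# Sketch: record a SHA-256 digest at copy time, verify mirrors against it.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large datasets never load whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_copy(recorded_digest: str, copy: Path) -> bool:
    """A copy is trustworthy only if its digest matches the recorded one."""
    return sha256_of(copy) == recorded_digest

digest = sha256_of(Path("noaa_dataset.nc"))           # hypothetical original
assert verify_copy(digest, Path("mirror/noaa_dataset.nc"))  # hypothetical mirror
```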

DataRefuge depends on local communities. We welcome new collaborators who want to organize DataRescue Events or build DataRefuge in other ways.

There are many ways to be involved with building DataRefuge. They’re not mutually exclusive!…(More)”

Mapping open data governance models: Who makes decisions about government data and how?


Ana Brandusescu, Danny Lämmerhirt and Stefaan Verhulst call for a systematic and comparative investigation of the different governance models for open data policy and publication….

“An important value proposition behind open data involves increased transparency and accountability of governance. Yet little is known about how open data itself is governed. Who decides and how? How accountable are data holders to both the demand side and policy makers? How do data producers and actors assure the quality of government data? Who, if anyone, are the data stewards within government tasked with making its data open?

Getting a better understanding of open data governance is not only important from an accountability point of view. Better insight into the diversity of decision-making models and structures across countries can also accelerate the implementation of common open data principles, such as those advocated by the International Open Data Charter.

In what follows, we seek to develop the initial contours of a research agenda on open data governance models. We start from the premise that different countries have different models to govern and administer their activities – in short, different ‘governance models’. Some countries are more devolved in their decision making, while others seek to organize “public administration” activities more centrally. These governance models clearly shape how open data is governed, producing a broad patchwork of open data governance arrangements across the world and making it difficult to identify who the open data decision makers, gatekeepers or stewards are within a given country.

For example, if one wants to accelerate the opening up of education data across borders, in some countries this may fall under the authority of sub-national government (such as states, provinces, territories or even cities), while in other countries education is governed by central government or implemented through public-private partnership arrangements. Similarly, in some countries transportation or water data may be privatised, while in others it may be the responsibility of municipal or regional government. Responsibilities are therefore often distributed across administrative levels and agencies, affecting how (open) government data is produced and published….(More)”

The chaos of South Africa’s taxi system is being tackled with open data


Lynsey Chutel at Quartz: “On any given day in South Africa’s cities the daily commute can be chaotic and unpredictable. A new open source data platform hopes to bring some order to that—or at least help others get it right.

Contributing to that chaos is a formal public transportation system that is inadequate for a growing urban population and an informal transportation network that whizzes through the streets unregulated. Where Is My Transport has done something unique by finally bringing these two systems together on one map.

Where Is My Transport has mapped Cape Town’s transport systems to create an integrated system, incorporating train, bus and minibus taxi routes. This last one is especially difficult, because the thousands of minibuses that ferry most South Africans are notoriously difficult to pin down.

Minibus taxis seat about 15 people and turn any corner into a bus stop, often halting traffic. They travel within neighborhoods and across the country and are the most affordable means of transport for the majority of South Africans. But they are also often unsafe vehicles, at times involved in horrific road accidents.

Devin De Vries, one of the platform’s co-founders, says he was inspired by the Digital Matatus project in Nairobi. The South African platform differs, however, in that it provides open source information for others who think they may have a solution to South Africa’s troubled public transportation system.

“Transport is a complex ecosystem, and we don’t think any one company will solve it,” De Vries told Quartz. “That’s why we made our platform open and hope that many endpoints—apps, websites, et cetera—will draw on the data so people can access it.”

This could lead to trip planning apps like Moovit or Transit for African commuters, or help cities better map their public transportation system, De Vries hopes…(More)”
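As a concrete illustration of what such endpoints could build on, here is a hedged Python sketch that finds a commuter's nearest mapped stop from a GTFS-style stops.txt export (stop_id, stop_name, stop_lat, stop_lon). GTFS is a widely used open transit data format; whether Where Is My Transport exposes exactly this format is an assumption made here for illustration.

```python
# Sketch: nearest-stop lookup over a GTFS-style stops.txt file.
import csv
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(a))

def nearest_stop(stops_path, lat, lon):
    """Return the stop row closest to the given coordinates."""
    with open(stops_path) as f:
        return min(
            csv.DictReader(f),
            key=lambda s: distance_km(lat, lon, float(s["stop_lat"]), float(s["stop_lon"])),
        )

stop = nearest_stop("stops.txt", -33.925, 18.424)  # roughly central Cape Town
print(stop["stop_name"])
```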

State of Open Corporate Data: Wins and Challenges Ahead


Sunlight Foundation: “For many people working to open data and reduce corruption, the past year could be summed up in two words: “Panama Papers.” The transcontinental investigation by a team from the International Consortium of Investigative Journalists (ICIJ) blew open the murky world of offshore company registration. It put corporate transparency high on the agenda of countries all around the world and helped lead to some notable advances in access to official company register data….

While most companies are created and operated for legitimate economic activity, a small percentage are not. Entities involved in corruption, money laundering, fraud and tax evasion frequently use such companies as vehicles for their criminal activity. “The Idiot’s Guide to Money Laundering” from Global Witness shows how easy it is to use layer after layer of shell companies to hide the identity of the person who controls and benefits from the activities of the network. The World Bank’s “Puppet Masters” report found that over 70% of grand corruption cases, in fact, involved the use of offshore vehicles.
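Layering frustrates investigators because each registry query reveals only the next layer; structured open data makes it feasible to follow the whole chain automatically. A minimal Python sketch of that traversal, using invented ownership records rather than any real registry's schema:

```python
# Sketch: follow controlling-owner links through shell-company layers until
# reaching a natural person. Records are invented; real registry data (e.g.,
# OpenCorporates) is messier, with partial stakes and missing links.
owners = {
    "Acme Trading Ltd": "Harbor Holdings SA",    # company -> controlling owner
    "Harbor Holdings SA": "Mistral Nominees BV",
    "Mistral Nominees BV": "J. Doe",             # a natural person
}

def ultimate_beneficial_owner(entity: str) -> str:
    seen = set()
    while entity in owners:            # keep following corporate layers
        if entity in seen:             # guard against circular ownership
            raise ValueError(f"ownership cycle at {entity}")
        seen.add(entity)
        entity = owners[entity]
    return entity                      # no further owner on record: a person

print(ultimate_beneficial_owner("Acme Trading Ltd"))  # -> J. Doe
```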

For years, OpenCorporates has advocated for company information to be in the public domain as open data, so it is usable and comparable. It was the public reaction to the Panama Papers, however, that made it clear that due diligence requires global datasets, and that beneficial ownership registers are key to integrity and progress.

The call for accountability and action was clear from the aftermath of the leak. ICIJ, the journalists involved and advocates have called for tougher action on prosecutions and more transparency measures: open corporate registers and beneficial ownership registers. A series of workshops organized by the B20 showed that business also needed public beneficial ownership registers….

Last year the UK became the first country in the world to collect and publish who controls and benefits from companies in a structured format, and as open data. Just a few days later, we were able to add the information to OpenCorporates. The UK data, therefore, is one of a kind, and has been highly anticipated by transparency skeptics and advocates alike. So far, things are looking good. Fifteen other countries have committed to having a public beneficial ownership register, including Nigeria, Afghanistan, Germany, Indonesia, New Zealand and Norway. Denmark has announced its first public beneficial ownership data will be published in June 2017. It’s likely to be open data.

This progress isn’t limited to beneficial ownership. It is also being seen in the opening up of corporate registers, which are what OpenCorporates calls “core company data”. In 2016, more countries started releasing company registers as open data, including Japan (with over 4.4 million companies), Israel, Virginia, Slovenia, Texas, Singapore and Bulgaria. We’ve also had a great start to 2017, with France publishing its central company database as open data on January 5th.

As more states have embraced open data, the USA jumped from an average score of 19/100 to 30/100. Singapore rose from 0 to 20, the Slovak Republic from 20 to 40, and Bulgaria went from 35 to 90. Japan rose from 0 to 70 — the biggest increase of the year….(More)”

Data ideologies of an interested public: A study of grassroots open government data intermediaries


In Big Data & Society: “Government officials claim open data can improve internal and external communication and collaboration. These promises hinge on “data intermediaries”: extra-institutional actors that obtain, use, and translate data for the public. However, we know little about why these individuals might regard open data as a site of civic participation. In response, we draw on Ilana Gershon to conceptualize culturally situated and socially constructed perspectives on data, or “data ideologies.” This study employs mixed methodologies to examine why members of the public hold particular data ideologies and how they vary. In late 2015 the authors engaged the public through a commission in a diverse city of approximately 500,000. Qualitative data was collected from three public focus groups with residents. Simultaneously, we obtained quantitative data from surveys. Participants’ data ideologies varied based on how they perceived data to be useful for collaboration, tasks, and translations. Bucking the “geek” stereotype, only a minority of those surveyed (20%) were professional software developers or engineers. Although only a nascent movement, we argue open data intermediaries have important roles to play in a new political landscape….(More)”