Index: Open Data


By Alexandra Shaw, Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on open data and was originally published in 2018.

Value and Impact

  • Projected year by which all EU 28+ countries (EU member states plus EFTA countries) will have a fully operating open data portal: 2020

  • Expected growth of the European open data market between 2016 and 2020: 36.9%, reaching EUR 75.7 billion by 2020

Public Views on and Use of Open Government Data

  • Share of Americans who do not trust the federal government or social media sites to protect their data: approximately 50%

  • Key findings from The Economist Intelligence Unit report on Open Government Data Demand:

    • Percentage of respondents who say the key reason why governments open up their data is to create greater trust between the government and citizens: 70%

    • Percentage of respondents who say OGD plays an important role in improving lives of citizens: 78%

    • Percentage of respondents who say OGD helps with daily decision making especially for transportation, education, environment: 53%

    • Percentage of respondents who cite lack of awareness about OGD and its potential use and benefits as the greatest barrier to usage: 50%

    • Percentage of respondents who say they lack access to usable and relevant data: 31%

    • Percentage of respondents who think they don’t have sufficient technical skills to use open government data: 25%

    • Percentage of respondents who feel the number of OGD apps available is insufficient, indicating an opportunity for app developers: 20%

    • Percentage of respondents who say OGD has the potential to generate economic value and new business opportunity: 61%

    • Percentage of respondents who say they don’t trust governments to keep data safe, protected, and anonymized: 19%

Efforts and Involvement

  • Time that has passed since open government advocates convened to create a set of principles for open government data – the event that launched the open government data movement: 10 years

  • Participants in the Open Government Partnership today: 79 countries and 20 subnational governments

  • Open data readiness score for Europe according to the European Data Portal: 72%

    • Open data readiness consists of four indicators: presence of policy, national coordination, licensing norms, and use of data.

  • Number of U.S. cities with Open Data portals: 27

  • Number of governments who have adopted the International Open Data Charter: 62

  • Number of non-state organizations endorsing the International Open Data Charter: 57

  • Number of countries analyzed by the Open Data Index: 94

  • Number of Latin American countries that do not have open data portals as of 2017: 4 total – Belize, Guatemala, Honduras and Nicaragua

  • Number of cities participating in the Open Data Census: 39

Demand for Open Data

  • Open data demand measured by frequency of open government data use according to The Economist Intelligence Unit report:

    • Australia

      • Monthly: 15% of respondents

      • Quarterly: 22% of respondents

      • Annually: 10% of respondents

    • Finland

      • Monthly: 28% of respondents

      • Quarterly: 18% of respondents

      • Annually: 20% of respondents

    • France

      • Monthly: 27% of respondents

      • Quarterly: 17% of respondents

      • Annually: 19% of respondents

    • India

      • Monthly: 29% of respondents

      • Quarterly: 20% of respondents

      • Annually: 10% of respondents

    • Singapore

      • Monthly: 28% of respondents

      • Quarterly: 15% of respondents

      • Annually: 17% of respondents 

    • UK

      • Monthly: 23% of respondents

      • Quarterly: 21% of respondents

      • Annually: 15% of respondents

    • US

      • Monthly: 16% of respondents

      • Quarterly: 15% of respondents

      • Annually: 20% of respondents

  • Number of FOIA requests received in the US for fiscal year 2017: 818,271

  • Number of FOIA requests processed in the US for fiscal year 2017: 823,222

  • Distribution of FOIA requests in 2017 among the top 5 agencies with the highest number of requests:

    • DHS: 45%

    • DOJ: 10%

    • NARA: 7%

    • DOD: 7%

    • HHS: 4%

Examining Datasets

  • Country with highest index score according to ODB Leaders Edition: Canada (76 out of 100)

  • Country with lowest index score according to ODB Leaders Edition: Sierra Leone (22 out of 100)

  • Share of datasets that are open in the top 30 governments according to ODB Leaders Edition: fewer than 1 in 5

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition: 19%

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition by sector/subject:

    • Budget: 30%

    • Companies: 13%

    • Contracts: 27%

    • Crime: 17%

    • Education: 13%

    • Elections: 17%

    • Environment: 20%

    • Health: 17%

    • Land: 7%

    • Legislation: 13%

    • Maps: 20%

    • Spending: 13%

    • Statistics: 27%

    • Trade: 23%

    • Transport: 30%

  • Percentage of countries that release data on government spending according to ODB Leaders Edition: 13%

  • Percentage of government data that is updated at regular intervals according to ODB Leaders Edition: 74%

  • Number of datasets available through:

  • Percentage of datasets classed as “open” in the 94 places worldwide analyzed by the Open Data Index: 11%

  • Percentage of open datasets in the Caribbean, according to Open Data Census: 7%

  • Number of companies whose data is available through OpenCorporates: 158,589,950

City Open Data

  • New York City

  • Singapore

    • Number of datasets published in Singapore: 1,480

    • Percentage of datasets with standardized format: 35%

    • Percentage of datasets made as raw as possible: 25%

  • Barcelona

    • Number of datasets published in Barcelona: 443

    • Open data demand in Barcelona measured by:

      • Number of unique sessions in the month of September 2018: 5,401

    • Quality of datasets published in Barcelona according to Tim Berners-Lee’s 5-star Open Data scheme: 3 stars

  • London

    • Number of datasets published in London: 762

    • Number of data requests since October 2014: 325

  • Bandung

    • Number of datasets published in Bandung: 1,417

  • Buenos Aires

    • Number of datasets published in Buenos Aires: 216

  • Dubai

    • Number of datasets published in Dubai: 267

  • Melbourne

    • Number of datasets published in Melbourne: 199

Sources

  • About OGP, Open Government Partnership. 2018.  

Can a set of equations keep U.S. census data private?


Jeffrey Mervis at Science: “The U.S. Census Bureau is making waves among social scientists with what it calls a ‘sea change’ in how it plans to safeguard the confidentiality of data it releases from the decennial census.

The agency announced in September 2018 that it will apply a mathematical concept called differential privacy to its release of 2020 census data after conducting experiments that suggest current approaches can’t assure confidentiality. But critics of the new policy believe the Census Bureau is moving too quickly to fix a system that isn’t broken. They also fear the changes will degrade the quality of the information used by thousands of researchers, businesses, and government agencies.

The move has implications that extend far beyond the research community. Proponents of differential privacy say a fierce, ongoing legal battle over plans to add a citizenship question to the 2020 census has only underscored the need to assure people that the government will protect their privacy....

Differential privacy, first described in 2006, isn’t a substitute for swapping and other ways to perturb the data. Rather, it allows someone—in this case, the Census Bureau—to measure the likelihood that enough information will “leak” from a public data set to open the door to reconstruction.

“Any time you release a statistic, you’re leaking something,” explains Jerry Reiter, a professor of statistics at Duke University in Durham, North Carolina, who has worked on differential privacy as a consultant with the Census Bureau. “The only way to absolutely ensure confidentiality is to release no data. So the question is, how much risk is OK? Differential privacy allows you to put a boundary” on that risk....

In the case of census data, however, the agency has already decided what information it will release, and the number of queries is unlimited. So its challenge is to calculate how much the data must be perturbed to prevent reconstruction....
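Reiter’s point about bounding leakage can be made concrete with the textbook Laplace mechanism, the simplest differentially private release. The sketch below is purely illustrative, not the Census Bureau’s actual system; the counts and epsilon values are invented:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy.

    One person entering or leaving the data changes a count by at most
    `sensitivity`, so Laplace noise with scale sensitivity/epsilon bounds
    how much any single record can shift the published number. That bound
    is the "leak" Reiter describes.
    """
    return true_count + laplace_noise(sensitivity / epsilon)

# A toy block-level population count, released at two privacy budgets.
random.seed(7)
print(round(private_count(1234, epsilon=1.0), 1))  # modest noise
print(round(private_count(1234, epsilon=0.1), 1))  # 10x noisier, stronger guarantee
```

Smaller values of epsilon add more noise and leak less; the “boundary on risk” the agency must set amounts to choosing an epsilon budget across everything it publishes.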

A professor of labor economics at Cornell University, John Abowd first learned that traditional procedures to limit disclosure were vulnerable—and that algorithms existed to quantify the risk—at a 2005 conference on privacy attended mainly by cryptographers and computer scientists. “We were speaking different languages, and there was no Rosetta Stone,” he says.

He took on the challenge of finding common ground. In 2008, building on a long relationship with the Census Bureau, he and a team at Cornell created the first application of differential privacy to a census product. It is a web-based tool, called OnTheMap, that shows where people work and live….

The three-step process required substantial computing power. First, the researchers reconstructed records for individuals—say, a 55-year-old Hispanic woman—by mining the aggregated census tables. Then, they tried to match the reconstructed individuals to even more detailed census block records (that still lacked names or addresses); they found “putative matches” about half the time.

Finally, they compared the putative matches to commercially available credit databases in hopes of attaching a name to a particular record. Even if they could, however, the team didn’t know whether they had actually found the right person.
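The first step, mining aggregate tables for individual records consistent with them, can be sketched as a toy constraint search. All the statistics and ranges below are invented for illustration; the Bureau’s experiment worked at vastly larger scale:

```python
from itertools import combinations_with_replacement

AGES = range(20, 61, 5)                       # ages reported in 5-year bins
PEOPLE = [(age, sex) for age in AGES for sex in "FM"]

def candidates(checks):
    """Every 3-person block consistent with all published statistics."""
    return [block for block in combinations_with_replacement(PEOPLE, 3)
            if all(check(block) for check in checks)]

# Release 1: two statistics published -- block size 3, mean age 40
# (so the three ages must sum to 120).
basic = [lambda b: sum(age for age, _ in b) == 120]

# Release 2 adds two more tables: 2 residents are women, mean age of women is 35.
detailed = basic + [
    lambda b: sum(sex == "F" for _, sex in b) == 2,
    lambda b: sum(age for age, sex in b if sex == "F") == 70,
]

print(len(candidates(basic)), "candidate blocks after two statistics")
print(len(candidates(detailed)), "candidate blocks after four statistics")
```

Each additional published statistic prunes the candidate set; with enough tables a block’s records can be pinned down almost exactly, which is precisely the leakage differential privacy is meant to bound.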

Abowd won’t say what proportion of the putative matches appeared to be correct. (He says a forthcoming paper will contain the ratio, which he calls “the amount of uncertainty an attacker would have once they claim to have reidentified a person from the public data.”) Although one of Abowd’s recent papers notes that “the risk of re-identification is small,” he believes the experiment proved reidentification “can be done.” And that, he says, “is a strong motivation for moving to differential privacy.”…

Such arguments haven’t convinced Ruggles and other social scientists opposed to applying differential privacy to the 2020 census. They are circulating manuscripts that question the significance of the census reconstruction exercise and that call on the agency to delay and change its plan....

Ruggles, meanwhile, has spent a lot of time thinking about the kinds of problems differential privacy might create. His Minnesota institute, for instance, disseminates data from the Census Bureau and 105 other national statistical agencies to 176,000 users. And he fears differential privacy will put a serious crimp in that flow of information…

There are also questions of capacity and accessibility. The centers require users to do all their work onsite, so researchers would have to travel, and the centers offer fewer than 300 workstations in total....

Abowd has said, “The deployment of differential privacy within the Census Bureau marks a sea change for the way that official statistics are produced and published.” And Ruggles agrees. But he says the agency hasn’t done enough to equip researchers with the maps and tools needed to navigate the uncharted waters….(More)”.

The Paradox of Police Data


Stacy Wood in KULA: knowledge creation, dissemination, and preservation studies: “This paper considers the history and politics of ‘police data.’ Police data, I contend, is a category of endangered data reliant on voluntary and inconsistent reporting by law enforcement agencies; it is also inconsistently described and routinely housed in systems that were not designed with long-term strategies for data preservation, curation or management in mind. Moreover, whereas US law enforcement agencies have, for over a century, produced and published a great deal of data about crime, data about the ways in which police officers spend their time and make decisions about resources—as well as information about patterns of individual officer behavior, use of force, and in-custody deaths—is difficult to find. This presents a paradoxical situation wherein vast stores of extant data are completely inaccessible to the public. This paradoxical state is not new, but the continuation of a long history co-constituted by technologies, epistemologies and context….(More)”.

In High-Tech Cities, No More Potholes, but What About Privacy?


Timothy Williams in The New York Times: “Hundreds of cities, large and small, have adopted or begun planning smart cities projects. But the risks are daunting. Experts say cities frequently lack the expertise to understand privacy, security and financial implications of such arrangements. Some mayors acknowledge that they have yet to master the responsibilities that go along with collecting billions of bits of data from residents….

Supporters of “smart cities” say that the potential is enormous and that some projects could go beyond creating efficiencies and actually save lives. Among the plans under development are augmented reality programs that could help firefighters find people trapped in burning buildings and the collection of sewer samples by robots to determine opioid use so that city services could be aimed at neighborhoods most in need.

The hazards are also clear.

“Cities don’t know enough about data, privacy or security,” said Lee Tien, a lawyer at the Electronic Frontier Foundation, a nonprofit organization focused on digital rights. “Local governments bear the brunt of so many duties — and in a lot of these cases, they are often too stupid or too lazy to talk to people who know.”

Cities habitually feel compelled to outdo each other, but the competition has now been intensified by lobbying from tech companies and federal inducements to modernize.

“There is incredible pressure on an unenlightened city to be a ‘smart city,’” said Ben Levine, executive director at MetroLab Network, a nonprofit organization that helps cities adapt to technology change.

That has left Washington, D.C., and dozens of other cities testing self-driving cars and Orlando trying to harness its sunshine to power electric vehicles. San Francisco has a system that tracks bicycle traffic, while Palm Beach, Fla., uses cycling data to decide where to send street sweepers. Boise, Idaho, monitors its trash dumps with drones. Arlington, Tex., is looking at creating a transit system based on data from ride-sharing apps….(More)”.

A Research Roadmap to Advance Data Collaboratives Practice as a Novel Research Direction


Iryna Susha, Theresa A. Pardo, Marijn Janssen, Natalia Adler, Stefaan G. Verhulst and Todd Harbour in the  International Journal of Electronic Government Research (IJEGR): “An increasing number of initiatives have emerged around the world to help facilitate data sharing and collaborations to leverage different sources of data to address societal problems. They are called “data collaboratives”. Data collaboratives are seen as a novel way to match real life problems with relevant expertise and data from across the sectors. Despite its significance and growing experimentation by practitioners, there has been limited research in this field. In this article, the authors report on the outcomes of a panel discussing critical issues facing data collaboratives and develop a research and development agenda. The panel included participants from the government, academics, and practitioners and was held in June 2017 during the 18th International Conference on Digital Government Research at City University of New York (Staten Island, New York, USA). The article begins by discussing the concept of data collaboratives. Then the authors formulate research questions and topics for the research roadmap based on the panel discussions. The research roadmap poses questions across nine different topics: conceptualizing data collaboratives, value of data, matching data to problems, impact analysis, incentives, capabilities, governance, data management, and interoperability. Finally, the authors discuss how digital government research can contribute to answering some of the identified research questions….(More)”. See also: http://datacollaboratives.org/

Government Information in Canada: Access and Stewardship


Book edited by Amanda Wakaruk and Sam-chin Li: “Government information is not something that most people think about until they need it or see it in a headline. Indeed, even then librarians, journalists, and intellectually curious citizens will rarely recognize or identify that the statistics needed to complete a report, or the scandal-breaking evidence behind a politician’s resignation, was sourced from taxpayer-funded publications and documents. Fewer people will likely appreciate the fact that access to government information is a requirement of a democratic society.

Government Information in Canada introduces the average librarian, journalist, researcher, and intellectually curious citizen to the often complex, rarely obvious, and sometimes elusive foundational element of a liberal democracy: publicly accessible government information.

While our primary goal is to provide an overview of the state of access to Canadian government information in the late-twentieth and early twenty-first centuries, we hope that this work will also encourage its readers to become more active in the government information community by contributing to government consultations and seeking out information that is produced by their governing bodies. ….

One of our goals is to document the state of government information in Canada at a point of transition. To help orient readers to today’s sub-discipline of librarianship, we offer four points that have been observed and learned over decades of working with government information in academic environments.

  1. Access to government information is the foundation of a functioning democracy and underpins informed citizen engagement. Government information allows us to assess our governing bodies — access that is required for a democracy to function.
  2. Government information has enduring value. The work of countless academics and other experts is disseminated via government information. Government publications and documents are used by academics and social commentators in all areas of intellectual output, resulting in the production of books, reports, speeches, and so forth, which have shaped our society and understanding of the world. For example, the book that introduced the public to the science of climate change, Silent Spring, was full of references to government information; furthermore, legal scholars, lawyers, and judges use legislative documents to interpret and apply the law; journalists use government documents to inform the electorate about their governing bodies.
  3. Government information is precarious and requires stewardship. The strongest system of stewardship for government information is one that operates in partnership with, and at arm’s length of, author agencies. Most content is digital, but this does not mean that it is posted and openly available online. Furthermore, content made available online does not necessarily remain accessible to the public.
  4. Government publications and documents are different from most books, journals, and content born on the Internet. Government information does not fit into the traditional dissemination channels developed and simplified through customer feedback and the pursuit of higher profits. The agencies that produce government information are motivated by different factors than those of traditional publishers…(More)”.

Cities, Government, Law, and Civil Society


Paper by Heidi Li Feldman: “For too long, legal commentators have developed accounts of law, governments and civil society, and rights to access that society, from a national-federal perspective. As Americans increasingly live in cities, it is time for legal theorists to concentrate on municipalities as the locus of civil society. From an American national-federal perspective, government and law play primarily a remedial role with regard to civil society, stepping in only to resolve great inequities, usually by creating legally recognized civil rights and enforcing them. Civil society and civil rights, however, exceed this cramped national-federal window on it. Throughout the United States today, civil society is a multi-faceted arena for social coordination and social cooperation, for consonant and collective action of many different kinds. The only reason civil rights and the legal protection of them matter is that participation in civil society makes it possible for individuals to engage in all manner of activities that are useful, enjoyable, and worthwhile. In other words, the significance of civil rights follows from the existence of a civil society worth participating in. To the extent that government can and does make civil society viable and valuable, it is an integral part of civil society. That feature gets lost in a remedial account of the relationship between government, law, and civil society.

Perhaps the role of cities in civil society has been neglected by the legal academy because cities are not sovereigns. Sovereignty has often been the issue that provokes theoretical attention to government and its role in civil life. At the heart of the federal-national account of civil society and government is the potential threat the sovereign poses to other actors in civil society. But there is no necessary connection between concentrating on the nature and workings of sovereignty and considering the role for government and law in civil society. And when a government is not a sovereign, its ability to threaten is inherently constrained. That is what examining cities, non-sovereign governments embedded in a web of other governments, shows us. 

When we turn our attention to cities, a very different role for government and law emerges. Cities often exemplify how government and law can enable civil society and all those encompassed by it. They show how government can promote and amplify collective action, not only at the local level but even at the international one. In the United States today, governments can and do provide resources for consonant and collective action even in nongovernmental settings. Governments also coordinate and cooperate alongside fellow actors such as citizen activist groups, small and large businesses, labor unions, universities and colleges, and other nongovernmental organizations. This is particularly apparent at the local level. By delving into local government, we gain a distinctive perspective on the intersection of government and law, on one hand, and civil society, on the other — on what that intersection does, can, and should be like. This paper develops a first iteration of a locality centered account of civil society and the role for government and law within it. I examine a particular municipality, the City of Pittsburgh, to provide a concrete example from which to generate ideas and judgements about the terrain and content of this localist account….(More)”.

Google Searches Could Predict Heroin Overdoses


Rod McCullom at Scientific American: “About 115 people nationwide die every day from opioid overdoses, according to the U.S. Centers for Disease Control and Prevention. A lack of timely, granular data exacerbates the crisis; one study showed opioid deaths were undercounted by as many as 70,000 between 1999 and 2015, making it difficult for governments to respond. But now Internet searches have emerged as a data source to predict overdose clusters in cities or even specific neighborhoods—information that could aid local interventions that save lives. 

The working hypothesis was that some people searching for information on heroin and other opioids might overdose in the near future. To test this, a researcher at the University of California Institute for Prediction Technology (UCIPT) and his colleagues developed several statistical models to forecast overdoses based on opioid-related keywords, metropolitan income inequality and total number of emergency room visits. They discovered regional differences (graphic) in where and how people searched for such information and found that more overdoses were associated with a greater number of searches per keyword. The best-fitting model, the researchers say, explained about 72 percent of the relation between the most popular search terms and heroin-related E.R. visits. The authors say their study, published in the September issue of Drug and Alcohol Dependence, is the first report of using Google searches in this way. 

To develop their models, the researchers obtained search data for 12 prescription and nonprescription opioids between 2005 and 2011 in nine U.S. metropolitan areas. They compared these with Substance Abuse and Mental Health Services Administration records of heroin-related E.R. admissions during the same period. The models can be modified to predict overdoses of other opioids or narrow searches to specific zip codes, says lead study author Sean D. Young, a behavioral psychologist and UCIPT executive director. That could provide early warnings of overdose clusters and help to decide where to distribute the overdose reversal medication Naloxone….(More)”.
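The paper’s best-fitting model is a multivariable regression; a stripped-down, single-predictor version conveys the idea. The search counts and E.R. figures below are fabricated for illustration and are not the UCIPT data:

```python
def ols_fit(x, y):
    """Closed-form simple linear regression: y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def r_squared(x, y, a, b):
    """Fraction of variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Fabricated monthly data for one metro area: searches for an opioid
# keyword (thousands) and heroin-related E.R. visits.
searches = [12, 19, 25, 31, 38, 44, 52, 60]
er_visits = [130, 150, 180, 200, 235, 260, 290, 330]

a, b = ols_fit(searches, er_visits)
print(f"visits ~ {a:.1f} + {b:.2f} * searches,"
      f" R^2 = {r_squared(searches, er_visits, a, b):.2f}")
```

The actual study additionally controlled for metropolitan income inequality and total E.R. visits, and fit separate models per region; the R-squared reported above is what corresponds to the “72 percent of the relation” figure in the article.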

Congress passes ‘Open Government Data Act’ to make open data part of the US Code


Melisha Dsouza at Packt: “22nd December marked a win for U.S. government in terms of efficiency, accountability, and transparency of open data. Following the Senate vote held on 19th December, Congress passed the Foundations for Evidence-Based Policymaking (FEBP) Act (H.R. 4174, S. 2046). Title II of this package is the Open, Public, Electronic and Necessary (OPEN) Government Data Act, which requires all non-sensitive government data to be made available in open and machine-readable formats by default.

The federal government possesses a huge amount of public data which should ideally be used to improve government services and promote private sector innovation. The open data proposal will mandate that federal agencies publish their information online, using machine-readable data formats.

Here are some of the key points that the Open Government Data Act seeks to do:

  • Define open data without locking in yesterday’s technology.
  • Create minimal standards for making federal government data available to the public.
  • Require the federal government to use open data for better decision making.
  • Ensure accountability by requiring regular oversight.
  • Establish and formalize Chief Data Officers (CDO) at federal agencies with data governance and implementation responsibilities.
  • Agencies need to maintain and publish a comprehensive data inventory of all data assets to help open data advocates identify key government information resources and transform them from documents and siloed databases into open data….(More)”.

For a more extensive discussion see: Congress votes to make open government data the default in the United States by Alex Howard.

It’s time for a Bill of Data Rights


Article by Martin Tisne: “…The proliferation of data in recent decades has led some reformers to a rallying cry: “You own your data!” Eric Posner of the University of Chicago, Eric Weyl of Microsoft Research, and virtual-reality guru Jaron Lanier, among others, argue that data should be treated as a possession. Mark Zuckerberg, the founder and head of Facebook, says so as well. Facebook now says that you “own all of the content and information you post on Facebook” and “can control how it is shared.” The Financial Times argues that “a key part of the answer lies in giving consumers ownership of their own personal data.” In a recent speech, Tim Cook, Apple’s CEO, agreed, saying, “Companies should recognize that data belongs to users.”

This essay argues that “data ownership” is a flawed, counterproductive way of thinking about data. It not only does not fix existing problems; it creates new ones. Instead, we need a framework that gives people rights to stipulate how their data is used without requiring them to take ownership of it themselves….

The notion of “ownership” is appealing because it suggests giving you power and control over your data. But owning and “renting” out data is a bad analogy. Control over how particular bits of data are used is only one problem among many. The real questions are questions about how data shapes society and individuals. Rachel’s story will show us why data rights are important and how they might work to protect not just Rachel as an individual, but society as a whole.

Tomorrow never knows

To see why data ownership is a flawed concept, first think about this article you’re reading. The very act of opening it on an electronic device created data—an entry in your browser’s history, cookies the website sent to your browser, an entry in the website’s server log to record a visit from your IP address. It’s virtually impossible to do anything online—reading, shopping, or even just going somewhere with an internet-connected phone in your pocket—without leaving a “digital shadow” behind. These shadows cannot be owned—the way you own, say, a bicycle—any more than can the ephemeral patches of shade that follow you around on sunny days.

Your data on its own is not very useful to a marketer or an insurer. Analyzed in conjunction with similar data from thousands of other people, however, it feeds algorithms and bucketizes you (e.g., “heavy smoker with a drink habit” or “healthy runner, always on time”). If an algorithm is unfair—if, for example, it wrongly classifies you as a health risk because it was trained on a skewed data set or simply because you’re an outlier—then letting you “own” your data won’t make it fair. The only way to avoid being affected by the algorithm would be to never, ever give anyone access to your data. But even if you tried to hoard data that pertains to you, corporations and governments with access to large amounts of data about other people could use that data to make inferences about you. Data is not a neutral impression of reality. The creation and consumption of data reflects how power is distributed in society. …(More)”.