Index: Open Data


By Alexandra Shaw, Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on open data and was originally published in 2018.

Value and Impact

  • The projected year at which all 28+ EU member countries will have a fully operating open data portal: 2020

  • Between 2016 and 2020, the market size of open data in Europe is expected to increase by 36.9%, and reach this value by 2020: EUR 75.7 billion

Public Views on and Use of Open Government Data

  • Number of Americans who do not trust the federal government or social media sites to protect their data: Approximately 50%

  • Key findings from The Economist Intelligence Unit report on Open Government Data Demand:

    • Percentage of respondents who say the key reason why governments open up their data is to create greater trust between the government and citizens: 70%

    • Percentage of respondents who say OGD plays an important role in improving lives of citizens: 78%

    • Percentage of respondents who say OGD helps with daily decision making especially for transportation, education, environment: 53%

    • Percentage of respondents who cite lack of awareness about OGD and its potential use and benefits as the greatest barrier to usage: 50%

    • Percentage of respondents who say they lack access to usable and relevant data: 31%

    • Percentage of respondents who think they don’t have sufficient technical skills to use open government data: 25%

    • Percentage of respondents who feel the number of OGD apps available is insufficient, indicating an opportunity for app developers: 20%

    • Percentage of respondents who say OGD has the potential to generate economic value and new business opportunity: 61%

    • Percentage of respondents who say they don’t trust governments to keep data safe, protected, and anonymized: 19%

Efforts and Involvement

  • Time that’s passed since open government advocates convened to create a set of principles for open government data – the instance that started the open data government movement: 10 years

  • Countries participating in the Open Government Partnership today: 79 OGP participating countries and 20 subnational governments

  • Percentage of “open data readiness” in Europe according to European Data Portal: 72%

    • Open data readiness consists of four indicators which are presence of policy, national coordination, licensing norms, and use of data.

  • Number of U.S. cities with Open Data portals: 27

  • Number of governments who have adopted the International Open Data Charter: 62

  • Number of non-state organizations endorsing the International Open Data Charter: 57

  • Number of countries analyzed by the Open Data Index: 94

  • Number of Latin American countries that do not have open data portals as of 2017: 4 total – Belize, Guatemala, Honduras and Nicaragua

  • Number of cities participating in the Open Data Census: 39

Demand for Open Data

  • Open data demand measured by frequency of open government data use according to The Economist Intelligence Unit report:

    • Australia

      • Monthly: 15% of respondents

      • Quarterly: 22% of respondents

      • Annually: 10% of respondents

    • Finland

      • Monthly: 28% of respondents

      • Quarterly: 18% of respondents

      • Annually: 20% of respondents

    •  France

      • Monthly: 27% of respondents

      • Quarterly: 17% of respondents

      • Annually: 19% of respondents

        •  
    • India

      • Monthly: 29% of respondents

      • Quarterly: 20% of respondents

      • Annually: 10% of respondents

    • Singapore

      • Monthly: 28% of respondents

      • Quarterly: 15% of respondents

      • Annually: 17% of respondents 

    • UK

      • Monthly: 23% of respondents

      • Quarterly: 21% of respondents

      • Annually: 15% of respondents

    • US

      • Monthly: 16% of respondents

      • Quarterly: 15% of respondents

      • Annually: 20% of respondents

  • Number of FOIA requests received in the US for fiscal year 2017: 818,271

  • Number of FOIA request processed in the US for fiscal year 2017: 823,222

  • Distribution of FOIA requests in 2017 among top 5 agencies with highest number of request:

    • DHS: 45%

    • DOJ: 10%

    • NARA: 7%

    • DOD: 7%

    • HHS: 4%

Examining Datasets

  • Country with highest index score according to ODB Leaders Edition: Canada (76 out of 100)

  • Country with lowest index score according to ODB Leaders Edition: Sierra Leone (22 out of 100)

  • Number of datasets open in the top 30 governments according to ODB Leaders Edition: Fewer than 1 in 5

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition: 19%

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition by sector/subject:

    • Budget: 30%

    • Companies: 13%

    • Contracts: 27%

    • Crime: 17%

    • Education: 13%

    • Elections: 17%

    • Environment: 20%

    • Health: 17%

    • Land: 7%

    • Legislation: 13%

    • Maps: 20%

    • Spending: 13%

    • Statistics: 27%

    • Trade: 23%

    • Transport: 30%

  • Percentage of countries that release data on government spending according to ODB Leaders Edition: 13%

  • Percentage of government data that is updated at regular intervals according to ODB Leaders Edition: 74%

  • Number of datasets available through:

  • Number of datasets classed as “open” in 94 places worldwide analyzed by the Open Data Index: 11%

  • Percentage of open datasets in the Caribbean, according to Open Data Census: 7%

  • Number of companies whose data is available through OpenCorporates: 158,589,950

City Open Data

  • New York City

  • Singapore

    • Number of datasets published in Singapore: 1,480

    • Percentage of datasets with standardized format: 35%

    • Percentage of datasets made as raw as possible: 25%

  • Barcelona

    • Number of datasets published in Barcelona: 443

    • Open data demand in Barcelona measured by:

      • Number of unique sessions in the month of September 2018: 5,401

    • Quality of datasets published in Barcelona according to Tim Berners Lee 5-star Open Data: 3 stars

  • London

    • Number of datasets published in London: 762

    • Number of data requests since October 2014: 325

  • Bandung

    • Number of datasets published in Bandung: 1,417

  • Buenos Aires

    • Number of datasets published in Buenos Aires: 216

  • Dubai

    • Number of datasets published in Dubai: 267

  • Melbourne

    • Number of datasets published in Melbourne: 199

Sources

  • About OGP, Open Government Partnership. 2018.  

Statistics and data science degrees: Overhyped or the real deal?


 at The Conversation“Data science” is hot right now. The number of undergraduate degrees in statistics has tripled in the past decade, and as a statistics professor, I can tell you that it isn’t because freshmen love statistics.

Way back in 2009, economist Hal Varian of Google dubbed statistician the “next sexy job.” Since then, statistician, data scientist and actuary have topped various “best jobs” lists. Not to mention the enthusiastic press coverage of industry applications: Machine learning! Big dataAIDeep learning!

But is it good advice? I’m going to voice an unpopular opinion for the sake of starting a conversation. Stats is indeed useful, but not in the way that the popular media – and all those online data science degree programs – seem to suggest….

While all the press tends to go to the sensationalist applications – computers that watch cat videos, anyone? – the data science boom reflects a broad increase in demand for data literacy, as a baseline requirement for modern jobs.

The “big data era” doesn’t just mean large amounts of data; it also means increased ease and ability to collect data of all types, in all walks of life. Although the big five tech companies – Google, Apple, Amazon, Facebook and Microsoft – represent about 10 percent of the U.S. market cap and dominate the public imagination, they employ only one-half of one percent of all employees.

Therefore, to be a true revolution, data science will need to infiltrate nontech industries. And it is. The U.S. has seen its impact on political campaigns. I myself have consulted in the medical devices sector. A few years back, Walmart held a data analysis competition as a recruiting tool. The need for people that can dig into the data and parse it is everywhere.

In a speech at the National Academy of Sciences in 2015, Steven “Freakonomics” Levitt related his insights about the need for data-savvy workers, based on his experience as a sought-after consultant in fields ranging from the airline industry to fast food….(More)”.

A Doctor’s Prescription: Data May Finally Be Good for Your Health


Interview by Art Kleiner: “In 2015, Robert Wachter published The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age, a skeptical account of digitization in hospitals. Despite the promise offered by the digital transformation of healthcare, electronic health records had not delivered better care and greater efficiency. The cumbersome design, legacy procedures, and resistance from staff were frustrating everyone — administrators, nurses, consultants, and patients. Costs continued to rise, and preventable medical mistakes were not spotted. One patient at Wachter’s own hospital, one of the nation’s finest, was given 39 times the correct dose of antibiotics by an automated system that nobody questioned. The teenager survived, but it was clear that there needed to be a new approach to the management and use of data.

Wachter has for decades considered the delivery of healthcare through a lens focused on patient safety and quality. In 1996, he coauthored a paper in the New England Journal of Medicine that coined the term hospitalist in describing and promoting a new way of managing patients in hospitals: having one doctor — the hospitalist — “own” the patient journey from admission to discharge. The primary goal was to improve outcomes and save lives. Wachter argued it would also reduce costs and increase efficiency, making the business case for better healthcare. And he was right. Today there are more than 50,000 hospitalists, and it took just two years from the article’s publication to have the first data proving his point. In 2016, Wachter was named chair of the Department of Medicine at the University of California, San Francisco (UCSF), where he has worked since 1990.

Today, Wachter is, to paraphrase the title of a recent talk, less grumpy than he used to be about health tech. The hope part of his book’s title has materialized in some areas faster than he predicted. AI’s advances in imaging are already helping the detection of cancers become more accurate. As data collection has become better systematized, big technology firms such as Google, Amazon, and Apple are entering (in Google’s case, reentering) the field and having more success focusing their problem-solving skills on healthcare issues. In his San Francisco office, Wachter sat down with strategy+businessto discuss why the healthcare system may finally be about to change….

Systems for Fresh Thinking

S+B: The changes you appreciate seem to have less to do with technological design and more to do with people getting used to the new systems, building their own variations, and making them work.
WACHTER:
 The original electronic health record was just a platform play to get the data in digital form. It didn’t do anything particularly helpful in terms of helping the physicians make better decisions or helping to connect one kind of doctor with another kind of doctor. But it was a start.

I remember that when we were starting to develop our electronic health record at UCSF, 12 or 13 years ago, I hired a physician who is now in charge of our health computer system. I said to him, “We don’t have our electronic health record in yet, but I’m pretty sure we will in seven or eight years. What will your job be when that’s done?” I actually thought once the system was fully implemented, we’d be done with the need to innovate and evolve in health IT. That, of course, was asinine.

S+B: That’s like saying to an auto mechanic, “What will your job be when we have automatic transmissions?”
WACHTER:
 Right, but even more so, because many of us saw electronic health records as the be-all and end-all of digitally facilitated medicine. But putting in the electronic health record is just step one of 10. Then you need to start connecting all the pieces, and then you add analytics that make sense of the data and make predictions. Then you build tools and apps to fit into the workflow and change the way you work.

One of my biggest epiphanies was this: When you digitize, in any industry, nobody is clever enough to actually change anything. All they know how to do is digitize the old practice. You only start seeing real progress when smart people come in, begin using the new system, and say, “Why the hell do we do it that way?” And then you start thinking freshly about the work. That’s when you have a chance to reimagine the work in a digital environment…(More)”.

Human Rights in the Big Data World


Paper by Francis Kuriakose and Deepa Iyer: “Ethical approach to human rights conceives and evaluates law through the underlying value concerns. This paper examines human rights after the introduction of big data using an ethical approach to rights. First, the central value concerns such as equity, equality, sustainability and security are derived from the history of digital technological revolution. Then, the properties and characteristics of big data are analyzed to understand emerging value concerns such as accountability, transparency, tracability, explainability and disprovability.

Using these value points, this paper argues that big data calls for two types of evaluations regarding human rights. The first is the reassessment of existing human rights in the digital sphere predominantly through right to equality and right to work. The second is the conceptualization of new digital rights such as right to privacy and right against propensity-based discrimination. The paper concludes that as we increasingly share the world with intelligence systems, these new values expand and modify the existing human rights paradigm….(More)”.

The free flow of non-personal data


Joint statement by Vice-President Ansip and Commissioner Gabriel on the European Parliament’s vote on the new EU rules facilitating the free flow of non-personal data: “The European Parliament adopted today a Regulation on the free flow of non-personal data proposed by the European Commission in September 2017. …

We welcome today’s vote at the European Parliament. A digital economy and society cannot exist without data and this Regulation concludes another key pillar of the Digital Single Market. Only if data flows freely can Europe get the best from the opportunities offered by digital progress and technologies such as artificial intelligence and supercomputers.  

This Regulation does for non-personal data what the General Data Protection Regulation has already done for personal data: free and safe movement across the European Union. 

With its vote, the European Parliament has sent a clear signal to all businesses of Europe: it makes no difference where in the EU you store and process your data – data localisation requirements within the Member States are a thing of the past. 

The new rules will provide a major boost to the European data economy, as it opens up potential for European start-ups and SMEs to create new services through cross-border data innovation. This could lead to a 4% – or €739 billion – higher EU GDP until 2020 alone. 

Together with the General Data Protection Regulation, the Regulation on the free flow of non-personal data will allow the EU to fully benefit from today’s and tomorrow’s data-based global economy.” 

Background

Since the Communication on the European Data Economy was adopted in January 2017 as part of the Digital Single Market strategy, the Commission has run a public online consultation, organised structured dialogues with Member States and has undertaken several workshops with different stakeholders. These evidence-gathering initiatives have led to the publication of an impact assessment….The Regulation on the free flow of non-personal data has no impact on the application of the General Data Protection Regulation (GDPR), as it does not cover personal data. However, the two Regulations will function together to enable the free flow of any data – personal and non-personal – thus creating a single European space for data. In the case of a mixed dataset, the GDPR provision guaranteeing free flow of personal data will apply to the personal data part of the set, and the free flow of non-personal data principle will apply to the non-personal part. …(More)”.

Open Government Data Report: Enhancing Policy Maturity for Sustainable Impact


Report by the OECD: This report provides an overview of the state of open data policies across OECD member and partner countries, based on data collected through the OECD Open Government Data survey (2013, 2014, 2016/17), country reviews and comparative analysis. The report analyses open data policies using an analytical framework that is in line with the OECD OUR data Index and the International Open Data Charter. It assesses governments’ efforts to enhance the availability, accessibility and re-use of open government data. It makes the case that beyond countries’ commitment to open up good quality government data, the creation of public value requires engaging user communities from the entire ecosystem, such as journalists, civil society organisations, entrepreneurs, major tech private companies and academia. The report also underlines how open data policies are elements of broader digital transformations, and how public sector data policies require interaction with other public sector agendas such as open government, innovation, employment, integrity, public budgeting, sustainable development, urban mobility and transport. It stresses the relevance of measuring open data impacts in order to support the business case for open government data….(More)”.

The law and ethics of big data analytics: A new role for international human rights in the search for global standards


David Nersessian at Business Horizons: “The Economist recently declared that digital information has overtaken oil as the world’s most valuable commodity. Big data technology is inherently global and borderless, yet little international consensus exists over what standards should govern its use. One source of global standards benefitting from considerable international consensus might be used to fill the gap: international human rights law.

This article considers the extent to which international human rights law operates as a legal or ethical constraint on global commercial use of big data technologies. By providing clear baseline standards that apply worldwide, human rights can help shape cultural norms—implemented as ethical practices and global policies and procedures—about what businesses should do with their information technologies. In this way, human rights could play a broad and important role in shaping business thinking about the proper handling of this increasingly valuable commodity in the modern global society…(More)”.

Translating science into business innovation: The case of open food and nutrition data hackathons


Paper by Christopher TucciGianluigi Viscusi and Heidi Gautschi: “In this article, we explore the use of hackathons and open data in corporations’ open innovation portfolios, addressing a new way for companies to tap into the creativity and innovation of early-stage startup culture, in this case applied to the food and nutrition sector. We study the first Open Food Data Hackdays, held on 10-11 February 2017 in Lausanne and Zurich. The aim of the overall project that the Hackdays event was part of was to use open food and nutrition data as a driver for business innovation. We see hackathons as a new tool in the innovation manager’s toolkit, a kind of live crowdsourcing exercise that goes beyond traditional ideation and develops a variety of prototypes and new ideas for business innovation. Companies then have the option of working with entrepreneurs and taking some of the ideas forward….(More)”.

Privacy and Interoperability Challenges Could Limit the Benefits of Education Technology


Report by Katharina Ley Best and John F. Pane: “The expansion of education technology is transforming the learning environment in classrooms, schools, school systems, online, and at home. The rise of education technology brings with it an increased opportunity for the collection and application of data, which are valuable resources for educators, schools, policymakers, researchers, and software developers.

RAND researchers examine some of the possible implications of growing data collection and availability related to education technology. Specifically, this Perspective discusses potential data infrastructure challenges that could limit data usefulness, consider data privacy implications in an education technology context, and review privacy principles that could help educators and policymakers evaluate the changing education data privacy landscape in anticipation of potential future changes to regulations and best practices….(More)”.

Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier.


Paper by Christine L. Borgman: “As universities recognize the inherent value in the data they collect and hold, they encounter unforeseen challenges in stewarding those data in ways that balance accountability, transparency, and protection of privacy, academic freedom, and intellectual property. Two parallel developments in academic data collection are converging: (1) open access requirements, whereby researchers must provide access to their data as a condition of obtaining grant funding or publishing results in journals; and (2) the vast accumulation of “grey data” about individuals in their daily activities of research, teaching, learning, services, and administration.

The boundaries between research and grey data are blurring, making it more difficult to assess the risks and responsibilities associated with any data collection. Many sets of data, both research and grey, fall outside privacy regulations such as HIPAA, FERPA, and PII. Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them. The privacy frontier facing research universities spans open access practices, uses and misuses of data, public records requests, cyber risk, and curating data for privacy protection. This Article explores the competing values inherent in data stewardship and makes recommendations for practice by drawing on the pioneering work of the University of California in privacy and information security, data governance, and cyber risk….(More)”.