Index: Open Data

By Alexandra Shaw, Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on open data and was originally published in 2018.

Value and Impact

  • Projected year by which all EU28+ countries will have a fully operational open data portal: 2020

  • Expected growth in the market size of open data in Europe between 2016 and 2020: 36.9%, reaching EUR 75.7 billion by 2020

Public Views on and Use of Open Government Data

  • Share of Americans who do not trust the federal government or social media sites to protect their data: approximately 50%

  • Key findings from The Economist Intelligence Unit report on Open Government Data Demand:

    • Percentage of respondents who say the key reason why governments open up their data is to create greater trust between the government and citizens: 70%

    • Percentage of respondents who say OGD plays an important role in improving lives of citizens: 78%

    • Percentage of respondents who say OGD helps with daily decision making, especially in transportation, education, and the environment: 53%

    • Percentage of respondents who cite lack of awareness about OGD and its potential use and benefits as the greatest barrier to usage: 50%

    • Percentage of respondents who say they lack access to usable and relevant data: 31%

    • Percentage of respondents who think they don’t have sufficient technical skills to use open government data: 25%

    • Percentage of respondents who feel the number of OGD apps available is insufficient, indicating an opportunity for app developers: 20%

    • Percentage of respondents who say OGD has the potential to generate economic value and new business opportunity: 61%

    • Percentage of respondents who say they don’t trust governments to keep data safe, protected, and anonymized: 19%

Efforts and Involvement

  • Time that has passed since open government advocates convened to create a set of principles for open government data – the event that launched the open government data movement: 10 years

  • Participants in the Open Government Partnership today: 79 countries and 20 subnational governments

  • Level of “open data readiness” in Europe according to the European Data Portal: 72%

    • Open data readiness consists of four indicators: presence of policy, national coordination, licensing norms, and use of data.

  • Number of U.S. cities with Open Data portals: 27

  • Number of governments that have adopted the International Open Data Charter: 62

  • Number of non-state organizations endorsing the International Open Data Charter: 57

  • Number of countries analyzed by the Open Data Index: 94

  • Number of Latin American countries that do not have open data portals as of 2017: 4 total – Belize, Guatemala, Honduras and Nicaragua

  • Number of cities participating in the Open Data Census: 39

Demand for Open Data

  • Open data demand measured by frequency of open government data use according to The Economist Intelligence Unit report:

    • Australia

      • Monthly: 15% of respondents

      • Quarterly: 22% of respondents

      • Annually: 10% of respondents

    • Finland

      • Monthly: 28% of respondents

      • Quarterly: 18% of respondents

      • Annually: 20% of respondents

    • France

      • Monthly: 27% of respondents

      • Quarterly: 17% of respondents

      • Annually: 19% of respondents

    • India

      • Monthly: 29% of respondents

      • Quarterly: 20% of respondents

      • Annually: 10% of respondents

    • Singapore

      • Monthly: 28% of respondents

      • Quarterly: 15% of respondents

      • Annually: 17% of respondents 

    • UK

      • Monthly: 23% of respondents

      • Quarterly: 21% of respondents

      • Annually: 15% of respondents

    • US

      • Monthly: 16% of respondents

      • Quarterly: 15% of respondents

      • Annually: 20% of respondents

  • Number of FOIA requests received in the US for fiscal year 2017: 818,271

  • Number of FOIA requests processed in the US for fiscal year 2017: 823,222

  • Distribution of FOIA requests in 2017 among the top 5 agencies with the highest number of requests:

    • DHS: 45%

    • DOJ: 10%

    • NARA: 7%

    • DOD: 7%

    • HHS: 4%

Examining Datasets

  • Country with highest index score according to ODB Leaders Edition: Canada (76 out of 100)

  • Country with lowest index score according to ODB Leaders Edition: Sierra Leone (22 out of 100)

  • Proportion of datasets that are open in the top 30 governments according to ODB Leaders Edition: fewer than 1 in 5

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition: 19%

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition by sector/subject:

    • Budget: 30%

    • Companies: 13%

    • Contracts: 27%

    • Crime: 17%

    • Education: 13%

    • Elections: 17%

    • Environment: 20%

    • Health: 17%

    • Land: 7%

    • Legislation: 13%

    • Maps: 20%

    • Spending: 13%

    • Statistics: 27%

    • Trade: 23%

    • Transport: 30%

  • Percentage of countries that release data on government spending according to ODB Leaders Edition: 13%

  • Percentage of government data that is updated at regular intervals according to ODB Leaders Edition: 74%


  • Percentage of datasets classed as “open” in the 94 places worldwide analyzed by the Open Data Index: 11%

  • Percentage of open datasets in the Caribbean, according to Open Data Census: 7%

  • Number of companies whose data is available through OpenCorporates: 158,589,950

City Open Data

  • New York City

  • Singapore

    • Number of datasets published in Singapore: 1,480

    • Percentage of datasets with standardized format: 35%

    • Percentage of datasets made as raw as possible: 25%

  • Barcelona

    • Number of datasets published in Barcelona: 443

    • Open data demand in Barcelona measured by:

      • Number of unique sessions in the month of September 2018: 5,401

    • Quality of datasets published in Barcelona according to Tim Berners-Lee’s 5-star Open Data scheme: 3 stars

  • London

    • Number of datasets published in London: 762

    • Number of data requests since October 2014: 325

  • Bandung

    • Number of datasets published in Bandung: 1,417

  • Buenos Aires

    • Number of datasets published in Buenos Aires: 216

  • Dubai

    • Number of datasets published in Dubai: 267

  • Melbourne

    • Number of datasets published in Melbourne: 199


  • Source: About OGP, Open Government Partnership, 2018.

The free flow of non-personal data

Joint statement by Vice-President Ansip and Commissioner Gabriel on the European Parliament’s vote on the new EU rules facilitating the free flow of non-personal data: “The European Parliament adopted today a Regulation on the free flow of non-personal data proposed by the European Commission in September 2017. …

We welcome today’s vote at the European Parliament. A digital economy and society cannot exist without data and this Regulation concludes another key pillar of the Digital Single Market. Only if data flows freely can Europe get the best from the opportunities offered by digital progress and technologies such as artificial intelligence and supercomputers.  

This Regulation does for non-personal data what the General Data Protection Regulation has already done for personal data: free and safe movement across the European Union. 

With its vote, the European Parliament has sent a clear signal to all businesses of Europe: it makes no difference where in the EU you store and process your data – data localisation requirements within the Member States are a thing of the past. 

The new rules will provide a major boost to the European data economy, opening up potential for European start-ups and SMEs to create new services through cross-border data innovation. This could increase EU GDP by 4% – or €739 billion – by 2020 alone. 

Together with the General Data Protection Regulation, the Regulation on the free flow of non-personal data will allow the EU to fully benefit from today’s and tomorrow’s data-based global economy.” 


Since the Communication on the European Data Economy was adopted in January 2017 as part of the Digital Single Market strategy, the Commission has run a public online consultation, organised structured dialogues with Member States, and undertaken several workshops with different stakeholders. These evidence-gathering initiatives led to the publication of an impact assessment. …The Regulation on the free flow of non-personal data has no impact on the application of the General Data Protection Regulation (GDPR), as it does not cover personal data. However, the two Regulations will function together to enable the free flow of any data – personal and non-personal – thus creating a single European space for data. In the case of a mixed dataset, the GDPR provision guaranteeing free flow of personal data will apply to the personal data in the set, and the free flow of non-personal data principle will apply to the non-personal part. …(More)”.
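The mixed-dataset rule described above amounts to a field-level partition: each part of a record is handled under the regime that applies to it. A rough sketch, in which the field names and their personal/non-personal classifications are purely illustrative assumptions (not legal determinations):

```python
# Hypothetical sketch: split a mixed record into its personal and
# non-personal parts so each can be handled under the applicable regime
# (GDPR for personal data, the free-flow Regulation for the rest).
# The field classifications below are illustrative assumptions only.

PERSONAL_FIELDS = {"name", "email"}            # assumed to be personal data
NON_PERSONAL_FIELDS = {"machine_id", "temp"}   # assumed to be non-personal

def partition(record):
    """Return (personal, non_personal) sub-records of a mixed record."""
    personal = {k: v for k, v in record.items() if k in PERSONAL_FIELDS}
    non_personal = {k: v for k, v in record.items() if k in NON_PERSONAL_FIELDS}
    return personal, non_personal

mixed = {"name": "Ana", "email": "ana@example.com",
         "machine_id": "M7", "temp": 21.5}
personal, non_personal = partition(mixed)
# personal → {"name": "Ana", "email": "ana@example.com"}
# non_personal → {"machine_id": "M7", "temp": 21.5}
```

In practice the hard part is the classification itself – deciding which fields count as personal data – not the mechanical split.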

Revisiting the governance of privacy: Contemporary policy instruments in global perspective

Colin J. Bennett and Charles D. Raab at Regulation & Governance: “The repertoire of policy instruments within a particular policy sector varies by jurisdiction; some “tools of government” are associated with particular administrative and regulatory traditions and political cultures. It is less clear how the instruments associated with a particular policy sector may change over time, as economic, social, and technological conditions evolve.

In the early 2000s, we surveyed and analyzed the global repertoire of policy instruments deployed to protect personal data. In this article, we explore how those instruments have changed as a result of 15 years of social, economic and technological transformations, during which the issue has assumed a far higher global profile, as one of the central policy questions associated with modern networked communications.

We review the contemporary range of transnational, regulatory, self‐regulatory, and technical instruments according to the same framework, and conclude that the types of policy instrument have remained relatively stable, even though they are now deployed on a global scale.

While the labels remain the same, however, the conceptual foundations for their legitimation and justification are shifting as greater emphases on accountability, risk, ethics, and the social/political value of privacy have gained purchase. Our analysis demonstrates both continuity and change within the governance of privacy, and displays how we would have tackled the same research project today.

As a broader case study of regulation, it highlights the importance of going beyond technical and instrumental labels. Change or stability of policy instruments does not take place in isolation from the wider conceptualizations that shape their meaning, purpose, and effect…(More)”.

Don’t forget people in the use of big data for development

Joshua Blumenstock at Nature: “Today, 95% of the global population has mobile-phone coverage, and the number of people who own a phone is rising fast (see ‘Dialling up’). Phones generate troves of personal data on billions of people, including those who live on a few dollars a day. So aid organizations, researchers and private companies are looking at ways in which this ‘data revolution’ could transform international development.

Some businesses are starting to make their data and tools available to those trying to solve humanitarian problems. The Earth-imaging company Planet in San Francisco, California, for example, makes its high-resolution satellite pictures freely available after natural disasters so that researchers and aid organizations can coordinate relief efforts. Meanwhile, organizations such as the World Bank and the United Nations are recruiting teams of data scientists to apply their skills in statistics and machine learning to challenges in international development.

But in the rush to find technological solutions to complex global problems there’s a danger of researchers and others being distracted by the technology and losing track of the key hardships and constraints that are unique to each local context. Designing data-enabled applications that work in the real world will require a slower approach that pays much more attention to the people behind the numbers…(More)”.

Sharing the benefits: How to use data effectively in the public sector

Report by Sarah Timmis, Luke Heselwood and Eleonora Harwich (for Reform UK): “This report demonstrates the potential of data sharing to transform the delivery of public services and improve outcomes for citizens. It explores how government can overcome various challenges to ‘get data right’ and enable better use of personal data within and between public-sector organisations.

Ambition meets reality

Government is set on using data more effectively to help deliver better public services. Better use of data can improve the design, efficiency and outcomes of services. For example, sharing data digitally between GPs and hospitals can enable early identification of patients most at risk of hospital admission, which has reduced admissions by up to 30 per cent in Somerset. Bristol’s Homeless Health Service allows access to medical, psychiatric, social and prison data, helping to provide a clearer picture of the complex issues facing the city’s homeless population. However, government has not yet created a clear data infrastructure, which would allow data to be shared across multiple public services, meaning efforts on the ground have not always delivered results.

The data: sticking points

Several technical challenges must be overcome to create the right data infrastructure. Individual pieces of data must be presented in standard formats to enable sharing within and across services. Data quality can be improved at the point of collection, through better monitoring of data quality and standards within public-sector organisations, and through data-curation processes. Personal data also needs to be presented in a consistent format so that, where appropriate, records can be linked to identify individuals. Interoperability issues and legacy systems act as significant barriers to data linking. The London Metropolitan Police alone use 750 different systems, many of which are incompatible. Technical solutions, such as Application Programming Interfaces (APIs), can be overlaid on top of legacy systems to improve interoperability and enable data sharing. However, this is only possible with the right standards and a solid new data model. To encourage competition and improve interoperability in the longer term, procurement rules should make interoperability a prerequisite for competing companies, allowing customers to combine the most appropriate products from different vendors.
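The API-over-legacy approach the report describes can be sketched as a thin adapter layer: callers see only an agreed standard schema, regardless of how the old system stores its records. A minimal sketch, in which the legacy layout and all field names are hypothetical:

```python
# Minimal sketch of an API layer overlaid on a legacy system. Consumers
# depend only on the standard schema, so the legacy system behind the
# adapter can later be replaced without breaking them. The legacy
# layout and all field names here are hypothetical illustrations.

LEGACY_RECORDS = [
    # e.g. an export from an incompatible case-management system
    {"CASE_NO": "A-1043", "DOB": "19850312", "POSTCODE": "BS1 4ST"},
]

def to_standard(record):
    """Map one legacy record onto a standard, shareable schema."""
    dob = record["DOB"]  # stored as YYYYMMDD in the legacy system
    return {
        "case_id": record["CASE_NO"],
        "date_of_birth": f"{dob[0:4]}-{dob[4:6]}-{dob[6:8]}",  # ISO 8601
        "postcode": record["POSTCODE"],
    }

def api_get_cases():
    """The 'API' endpoint: exposes only the standard schema."""
    return [to_standard(r) for r in LEGACY_RECORDS]
```

This is why the report stresses that APIs only help alongside agreed standards and a solid data model: the adapter is trivial once the target schema exists, and impossible to write well without it.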

Building trustworthiness

The ability to share data at scale through the internet has brought new threats to the security and privacy of personal information, which amplifies the need for trust between government and citizens and across government departments. Currently, just 9 per cent of people feel that the Government has their best interests at heart when sharing data, and only 15 per cent are confident that government organisations would deal well with a cyber-attack. Given that attitudes towards data sharing are time- and context-dependent, better engagement with citizens and clearer explanations of when and why data is used can help build confidence. Auditability is also key: every interaction with personal data should be auditable, transparent and secure, so that people and organisations can track how data is used. …(More)”.

Trust, Security, and Privacy in Crowdsourcing

Guest Editorial to Special Issue of IEEE Internet of Things Journal: “As we become increasingly reliant on intelligent, interconnected devices in every aspect of our lives, critical trust, security, and privacy concerns are raised as well.

First, the sensing data provided by individual participants is not always reliable. It may be noisy or even faked for various reasons, such as poor sensor quality, lack of sensor calibration, background noise, context impact, mobility, an incomplete view of observations, or malicious attacks. Crowdsourcing applications should be able to evaluate the trustworthiness of collected data in order to filter out the noisy and fake data that may disturb or intrude upon a crowdsourcing system. Second, providing data (e.g., photographs taken with personal mobile devices) or using IoT applications may compromise data providers’ personal data privacy (e.g., location, trajectory, and activity privacy) and identity privacy. Therefore, it becomes essential to assess the trustworthiness of the data while preserving the data providers’ privacy. Third, data analytics and mining in crowdsourcing may disclose the private information of data providers or related entities to unauthorized parties, which lowers the willingness of participants to contribute to the crowdsourcing system, impacts system acceptance, and greatly impedes its further development. Fourth, the identities of data providers could be forged by malicious attackers to intrude upon the whole crowdsourcing system. In this context, trust, security, and privacy have begun to attract special attention as prerequisites for high quality of service in each step of crowdsourcing: data collection, transmission, selection, processing, analysis and mining, and utilization.

Trust, security, and privacy in crowdsourcing are receiving increasing attention. Many methods have been proposed to protect privacy during data collection and processing. For example, data perturbation can be adopted to hide real data values during collection. When preprocessing the collected data, data anonymization (e.g., k-anonymization) and fusion can be applied to break the links between the data and their sources/providers. At the application layer, anonymity is used to mask the real identities of data sources/providers. To enable privacy-preserving data mining, secure multiparty computation (SMC) and homomorphic encryption provide options for protecting raw data when multiple parties jointly run a data-mining algorithm. Through cryptographic techniques, no party learns anything other than its own input and the expected results. For data truth discovery, applicable solutions include correlation-based data quality analysis and trust evaluation of data sources. But current solutions remain imperfect, incomplete, and inefficient….(More)”.
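As a concrete illustration of the anonymization idea mentioned above, a minimal k-anonymity check can be written in a few lines: a table is k-anonymous on a set of quasi-identifiers if no combination of their values singles out fewer than k records. The dataset and quasi-identifiers below are invented for illustration, and this sketch only checks the property – it does not perform the generalization or suppression that k-anonymization itself requires:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at
    least k times, i.e. no record is distinguishable from fewer than
    k-1 others on those attributes."""
    combos = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in combos.values())

rows = [
    {"age_band": "30-39", "postcode": "SW1", "diagnosis": "flu"},
    {"age_band": "30-39", "postcode": "SW1", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "SW1", "diagnosis": "flu"},
]

is_k_anonymous(rows, ["age_band", "postcode"], 2)  # → False: the
# ("40-49", "SW1") combination appears only once, so that record would
# need further generalization or suppression before release
```

Checks like this are what lets a data holder verify that the links between records and their providers have actually been broken before publishing.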

Countries Can Learn from France’s Plan for Public Interest Data and AI

Nick Wallace at the Center for Data Innovation: “French President Emmanuel Macron recently endorsed a national AI strategy that includes plans for the French state to make public and private sector datasets available for reuse by others in applications of artificial intelligence (AI) that serve the public interest, such as for healthcare or environmental protection. Although this strategy fails to set out how the French government should promote widespread use of AI throughout the economy, it will nevertheless give a boost to AI in some areas, particularly public services. Furthermore, the plan for promoting the wider reuse of datasets, particularly in areas where the government already calls most of the shots, is a practical idea that other countries should consider as they develop their own comprehensive AI strategies.

The French strategy, drafted by mathematician and Member of Parliament Cédric Villani, calls for legislation to mandate repurposing both public and private sector data, including personal data, to enable public-interest uses of AI by government or others, depending on the sensitivity of the data. For example, public health services could use data generated by Internet of Things (IoT) devices to help doctors better treat and diagnose patients. Researchers could use data captured by motorway CCTV to train driverless cars. Energy distributors could manage peaks and troughs in demand using data from smart meters.

Repurposed data held by private companies could be made publicly available, shared with other companies, or processed securely by the public sector, depending on the extent to which sharing the data presents privacy risks or undermines competition. The report suggests that the government would not require companies to share data publicly when doing so would impact legitimate business interests, nor would it require that any personal data be made public. Instead, Dr. Villani argues that, if wider data sharing would do unreasonable damage to a company’s commercial interests, it may be appropriate to only give public authorities access to the data. But where the stakes are lower, companies could be required to share the data more widely, to maximize reuse. Villani rightly argues that it is virtually impossible to come up with generalizable rules for how data should be shared that would work across all sectors. Instead, he argues for a sector-specific approach to determining how and when data should be shared.

After making the case for state-mandated repurposing of data, the report goes on to highlight four key sectors as priorities: health, transport, the environment, and defense. Since these all have clear implications for the public interest, France can create national laws authorizing extensive repurposing of personal data without violating the General Data Protection Regulation (GDPR), which allows national laws that permit the repurposing of personal data where it serves the public interest. The French strategy is the first clear effort by an EU member state to proactively use this clause in aid of national efforts to bolster AI….(More)”.

A roadmap for restoring trust in Big Data

Mark Lawler et al in the Lancet: “The fallout from the Cambridge Analytica–Facebook scandal marks a significant inflection point in the public’s trust concerning Big Data. The health-science community must use this crisis-in-confidence to redouble its commitment to talk openly and transparently about benefits and risks and to act decisively to deliver robust effective governance frameworks, under which personal health data can be responsibly used. Activities such as the Innovative Medicines Initiative’s Big Data for Better Outcomes emphasise how a more granular data-driven understanding of human diseases including cancer could underpin innovative therapeutic intervention.
 Health Data Research UK is developing national research expertise and infrastructure to maximise the value of health data science for the National Health Service and ultimately British citizens.
Comprehensive data analytics are crucial to national programmes such as the US Cancer Moonshot, the UK’s 100 000 Genomes project, and other national genomics programmes. Cancer Core Europe, a research partnership between seven leading European oncology centres, has personal data sharing at its core. The Global Alliance for Genomics and Health recently highlighted the need for a global cancer knowledge network to drive evidence-based solutions for a disease that kills more than 8·7 million citizens annually worldwide. These activities risk being fatally undermined by the recent data-harvesting controversy.
We need to restore the public’s trust in data science and emphasise its positive contribution in addressing global health and societal challenges. An opportunity to affirm the value of data science in Europe was afforded by Digital Day 2018, which took place on April 10, 2018, in Brussels, and where European Health Ministers signed a declaration of support to link existing or future genomic databanks across the EU, through the Million European Genomes Alliance.
So how do we address evolving challenges in analysis, sharing, and storage of information, ensure transparency and confidentiality, and restore public trust? We must articulate a clear Social Contract, where citizens (as data donors) are at the heart of decision-making. We need to demonstrate integrity, honesty, and transparency as to what happens to data and what level of control people can, or cannot, expect. We must embed ethical rigour in all our data-driven processes. The Framework for Responsible Sharing of Genomic and Health Related Data represents a practical global approach, promoting effective and ethical sharing and use of research or patient data, while safeguarding individual privacy through secure and accountable data transfer…(More)”.

Americans Want to Share Their Medical Data. So Why Can’t They?

Eleni Manis at RealClearHealth: “Americans are willing to share personal data — even sensitive medical data — to advance the common good. A recent Stanford University study found that 93 percent of medical trial participants in the United States are willing to share their medical data with university scientists and 82 percent are willing to share with scientists at for-profit companies. In contrast, less than a third are concerned that their data might be stolen or used for marketing purposes.

However, the majority of regulations surrounding medical data focus on individuals’ ability to restrict the use of their medical data, with scant attention paid to supporting the ability to share personal data for the common good. Policymakers can begin to right this balance by establishing a national medical data donor registry that lets individuals contribute their medical data to support research after their deaths. Doing so would help medical researchers pursue cures and improve health care outcomes for all Americans.

Increased medical data sharing facilitates advances in medical science in three key ways. First, de-identified participant-level data can be used to understand the results of trials, enabling researchers to better explicate the relationship between treatments and outcomes. Second, researchers can use shared data to verify studies and identify cases of data fraud and research misconduct in the medical community. For example, one researcher recently discovered a prolific Japanese anesthesiologist had falsified data for almost two decades. Third, shared data can be combined and supplemented to support new studies and discoveries.

Despite these benefits, researchers, research funders, and regulators have struggled to establish a norm for sharing clinical research data. In some cases, regulatory obstacles are to blame. HIPAA — the federal law regulating medical data — blocks some sharing on grounds of patient privacy, while federal and state regulations governing data sharing are inconsistent. Researchers themselves have a proprietary interest in data they produce, while academic researchers seeking to maximize publications may guard data jealously.

Though funding bodies are aware of this tension, they are unable to resolve it on their own. The National Institutes of Health, for example, requires a data sharing plan for big-ticket funding but recognizes that proprietary interests may make sharing impossible….(More)”.

The economic value of data: discussion paper

HM Treasury (UK): “Technological change has radically increased both the volume of data in the economy, and our ability to process it. This change presents an opportunity to transform our economy and society for the better.

Data-driven innovation holds the keys to addressing some of the most significant challenges confronting modern Britain, whether that is tackling congestion and improving air quality in our cities, developing ground-breaking diagnosis systems to support our NHS, or making our businesses more productive.

The UK’s strengths in cutting-edge research and the intangible economy make it well-placed to be a world leader, and estimates suggest that data-driven technologies will contribute over £60 billion per year to the UK economy by 2020. Recent events have raised public questions and concerns about the way that data, and particularly personal data, can be collected, processed, and shared with third party organisations.

These are concerns that this government takes seriously. The Data Protection Act 2018 updates the UK’s world-leading data protection framework to make it fit for the future, giving individuals strong new rights over how their data is used. Alongside maintaining a secure, trusted data environment, the government has an important role to play in laying the foundations for a flourishing data-driven economy.

This means pursuing policies that improve the flow of data through our economy, and ensure that those companies who want to innovate have appropriate access to high-quality and well-maintained data.

This discussion paper describes the economic opportunity presented by data-driven innovation, and highlights some of the key challenges that government will need to address, such as: providing clarity around ownership and control of data; maintaining a strong, trusted data protection framework; making effective use of public sector data; driving interoperability and standards; and enabling safe, legal and appropriate data sharing.

Over the last few years, the government has taken significant steps to strengthen the UK’s position as a world leader in data-driven innovation, including by agreeing the Artificial Intelligence Sector Deal, establishing the Geospatial Commission, and making substantial investments in digital skills. The government will build on those strong foundations over the coming months, including by commissioning an Expert Panel on Competition in Digital Markets. This Expert Panel will support the government’s wider review of competition law by considering how competition policy can better enable innovation and support consumers in the digital economy.

There are still big questions to be answered. This document marks the beginning of a wider set of conversations that government will be holding over the coming year, as we develop a new National Data Strategy….(More)”.