Selected Readings on Indigenous Data Sovereignty


By Juliet McMurren

As part of an ongoing effort to build a knowledge base for the field of improving governance through data and technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, to recognize and honor Indigenous Peoples’ Day, we have curated below a selection of literature on Indigenous data sovereignty (IDS), the principle that Indigenous peoples should be able to control the data collected by and about them, to determine how and by whom it is accessed, stored, and used, and to develop data practices and methodologies that reflect their lived experiences, cultures, and worldviews. The selection complements previously released readings on Personal Data, Data Governance, and Algorithmic Scrutiny, among other data-related topics.

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Selected Readings (in alphabetical order)

Principles

Kukutai, Tahu and John Taylor (eds) Indigenous Data Sovereignty: Towards an Agenda (2016)

  • The foundational work in the field, this edited volume brings together Māori, Australian Aboriginal, Native American, and First Nations academics, researchers and data practitioners to set out the case for Indigenous data sovereignty.
  • Organized in four parts, the book begins by providing a historical account of colonialist statistics and the origins of the concept of Indigenous data sovereignty. In the second part, the authors set out an Indigenous critique of official statistics as a colonialist practice primarily intended to serve settler governments through the control of Indigenous peoples. As a result, population statistics from these societies are imbued with colonialist norms that both ignore indicators significant to Indigenous peoples and reduce them to what contributor Maggie Walter calls 5D data: disparity, deprivation, disadvantage, dysfunction, and difference.
  • The authors outline how Indigenous data sovereignty would work, setting out an agenda in which Indigenous people would control who should be counted among them, and establish collection priorities reflective of their cultural norms, interests, values and priorities. This could include a move away from data about individuals as single indicators used to compare, rank and drive “improvement” to a more nuanced and complex view of data that focuses on social groupings beyond the household. They would also control who would have access to the data gathered, with culturally appropriate rules and protocols for consents to access and use data. These principles are encapsulated in the First Nations OCAP® data model, through which First Nations assert ownership of, control over (including collection, use and disclosure), access to, and possession of all First Nations’ data.
  • The third part of the book provides examples of Indigenous data sovereignty in practice, from the perspective of both data practitioners and data users. A case study of data sovereignty among the Yawuru of Western Australia outlines a methodology for developing data collection rooted in self-determination and community values of mabu buru (knowledge of the land) and mabu liyan (relational or community wellbeing). Another examines the work of a Māori primary health care organization, National Hauora Coalition, which conducted a rapid response campaign to reduce high rates of acute rheumatic fever among Māori children in Auckland. Stewardship and analysis of their own community data enabled targeted interventions that reduced positive Group A strep rates among children by 75 percent and rates of rheumatic fever by 33 percent.
  • The final section of the book outlines the emerging efforts of the New Zealand and Australian governments to engage with Indigenous peoples’ desire for data sovereignty through their statistical practices.

Lovett, Raymond et al Good Data Practices for Indigenous Data Sovereignty and Governance (2019)

  • This multi-authored chapter is the first in a volume the editors describe as born of frustration with dystopian “bad data” practices and devoted to the exploration of how data could be used “productively and justly to further social, economic, cultural and political goals.”
  • The chapter sets out the context for the emergence of IDS movements worldwide, and gives a survey of IDS networks and their foundational principles, such as OCAP® (above) and the Māori principles of rangatiratanga (right to own, access, control and possess), manaakitanga (ethical use to further wellbeing) and kaitiakitanga (sustainable stewardship) as they apply to data about or from themselves and their environs.
  • The chapter defines and differentiates IDS (the management of information in alignment with the laws, practices and customs of the nation-state in which it is located) and Indigenous data governance (IDG), or power and authority over the design, ownership, access to and use of data. It situates IDS movements alongside broader movements for Indigenous sovereignty informed by the rights laid out in the UN Declaration on the Rights of Indigenous Peoples.

Rainie, Stephanie Carroll, Tahu Kukutai, Maggie Walter, Oscar Figueroa-Rodriguez, Jennifer Walker, and Per Axelsson (2019) Issues in Open Data — Indigenous Data Sovereignty. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The State of Open Data: Histories and Horizons.

  • Part of a wider report about the state of open data, this chapter discusses the tension between the principles of open data and IDS. Describing open data for Indigenous Peoples as a double-edged sword, the authors note the potential of open data to help deliver on Indigenous aspirations for sustainable development. At the same time, open data perpetuates data challenges born of colonization, including assumptions about a single nation-state and that open data is both useful and benign.
  • The authors observe that there is a widespread lack of understanding about IDS within the open data movement, and that open data policy and discussions have been largely framed to address the needs and interests of nation-states, with minimal engagement with Indigenous peoples. The authors provide a critique of the ways in which the Open Data Charter overlooks the issue of IDS in its principles on open by default, citizen engagement, and inclusive development. They note that the Open Data Charter’s commitment to free use, reuse, and redistribution by anyone, at any time, and anywhere, for example, is in direct conflict with the rights of Indigenous Peoples to govern their own data and control how and by whom it is accessed.
  • Opening state data that is unreliable, inaccurate, and designed, collected and processed according to the norms of state agencies poses additional problems for Indigenous peoples. Statistics about Indigenous peoples based on colonialist norms frequently perpetuate a narrative of inequality. Data infrastructures may be distorted by cultural assumptions, such as those about naming conventions, that misrepresent Indigenous people. In addition, the concept of open data has led to instances of cooptation and theft of Indigenous knowledge, when researchers have collected Indigenous knowledge about the environment, digitized it and shared it without Indigenous consent or oversight.
  • Drawing on the experience of Indigenous data networks worldwide, the authors propose three steps forward for the open data community in its relationship to Indigenous peoples. First, it needs to engage with Indigenous peoples as partners and knowledge holders to inform stewardship of Indigenous data. Secondly, IDS networks, with contacts in Indigenous communities and the world of data, should act as intermediaries for this engagement. Finally, they call for a broader adoption of principles on the governance and stewardship of Indigenous data within research and administration.

Research Data Alliance CARE Principles for Indigenous Data Governance (2019)

  • The RDA’s CARE principles propose an additional set of criteria that should be applied to open data to ensure that it respects Indigenous rights to self-determination. The authors argue that the existing FAIR principles (that open data should be findable, accessible, interoperable and reusable) focus on data characteristics that facilitate increased sharing while ignoring historical context and power differentials.
  • To supplement FAIR, they propose the addition of CARE: that open data should be for collective benefit, recognize Indigenous peoples’ authority to control their own data, carry a responsibility to demonstrate how they benefit self-determination, and have embedded ethics prioritizing the rights and wellbeing of Indigenous people.
  • The principle of collective benefit asserts that data ecosystems should be designed in ways that Indigenous peoples can derive benefit from them. This includes active government support for Indigenous data use and reuse, using data to reduce information asymmetries between government and Indigenous communities, and the use of any value created from Indigenous data to benefit Indigenous communities. Authority to control recognizes the rights and interests of Indigenous peoples in their knowledge and data, and to govern and control how it is collected, accessed, used and stored.
  • Those working with Indigenous data have a responsibility to demonstrate how they are using it to benefit Indigenous communities. This involves fostering relationships of partnership and trust, working to build capability and capacity within Indigenous communities, and grounding data in the experiences, languages and worldviews of those communities. Finally, the rights and wellbeing of Indigenous peoples must be the primary concern. This requires data design and collection practices that do not stigmatize Indigenous people and that align with Indigenous ethical practices, that address imbalances in power and resources, and that are mindful of the potential for future use and potential harms.

Applications and Case Studies

Carroll, Stephanie Russo, Desi Rodriguez-Lonebear and Andrew Martinez, Indigenous Data Governance: Strategies from United States Native Nations (2019)

  • This article reviews IDS strategies from Native nations in the United States, connecting IDS and IDG to the rebuilding of Native nations and providing case studies of IDG occurring within tribal and non-tribal entities.
  • The article leads with a definition of key terms, including data dependency, “a paradox of scarcity and abundance: extensive data are collected about Indigenous peoples and nations, but rarely by or for Indigenous peoples’ and nations’ purposes.” It proposes IDG as a method by which the aspiration of IDS can be achieved, through a self-reinforcing cycle: governance of data leads to data rebuilding, providing data for governance that in turn leads to nation rebuilding.
  • The article offers three tribal, two non-tribal, and three urban, inter- and supra-tribal case studies of IDG in practice. The National Congress of American Indians Tribal Data Capacity Project, for example, was a pilot project to build tribal data capacity with five US tribes. Its outputs included a successful census conducted by the Pueblo of Laguna and the University of New Mexico, on tribal terms with tribal money for tribal purposes, and resulting in the development of proprietary software that remains the property of the tribe and that can be reused for subsequent collections.
  • The article concludes with a set of recommendations for tribal rights holders and stakeholders. It recommends that tribal rights holders develop tribe-specific data governance principles, policies, and procedures, and generate resources for IDG. Stakeholders are called on to acknowledge, support and promote IDS and embed it in data collection practices by building frameworks specifying how IDS is to be enacted in data processes, investing in intertribal institutions and recruiting and training Indigenous data professionals, among other measures.

Chaney, Christopher Data Sovereignty and the Tribal Law and Order Act (2018)

  • This article surveys the relationship between data sovereignty and the provision of criminal justice services, a key aspect of tribal sovereignty. The Tribal Law and Order Act (TLOA) 2010 addressed tribal data by mandating federal justice and law enforcement agencies to coordinate and consult with tribes over data collection, and providing tribal criminal justice agencies meeting federal and state requirements with access to national crime databases to enter and retrieve data.
  • TLOA has resulted in broad and extensive opportunities for federally recognized tribes to submit and retrieve data. Subject to federal law, tribes have the right to determine what information they will submit and access, putting the tribe in control of its own data. It also greatly facilitates the administration of tribal law enforcement and justice by enabling access to federal databases on property, such as vehicles and firearms, and people, including those on fugitives, sex offenders, and missing persons. The author suggests that TLOA implementation could serve as a model for other federal agencies working towards tribal data sovereignty arrangements.

[Dewar, J.] First Nations’ Data Sovereignty in Canada (2019).

  • This paper provides an overview of First Nations experiences of Canadian efforts to identify First Nations individuals, communities, and Nations in official statistics and data, and of the development of First Nations Data Sovereignty efforts over the previous two decades.
  • The paper surveys the ways in which early legislation constructed “Indians” and Indian status within Canada counter to First Nations norms, harming traditional gender roles, leadership structures and governance, severing many First Nations women who “married out” from the culture and lands, and forcing First Nations people to choose between “enfranchisement” through education or employment and their Indian status and culture.
  • The paper then surveys the current First Nations statistical context, noting its numerous deficiencies. Sources of information and data, including the national census, were created with little or no Indigenous involvement or input, creating inconsistencies in the accuracy, reliability, usefulness, and comparability of the data. Even where the data is useful, it is not routinely used in planning and advocacy for the benefit of First Nations communities. First Nations are also required to meet onerous reporting requirements in order to access federal funding, but the resulting data — and other data from and about First Nations — are not effectively analyzed, used, or shared with First Nations.
  • The paper provides examples of effective instances of national and regional First Nations data sovereignty using OCAP® principles. These include the First Nations Information Governance Centre’s own survey work on health, childhood, education, labor and employment, but also similar provincial initiatives. The FNIGC is currently at work with regional partners to develop a National Data Governance Strategy to advance First Nations Data Sovereignty.

Garrison, Nanibaa’ et al Genomic Research Through An Indigenous Lens: Understanding the Expectations (2019)

  • This multi-authored study compares research guidelines for genomic research among Indigenous peoples in Canada, New Zealand, Australia, and the United States.
  • It notes that while there is a dearth of genomic research about Indigenous peoples, Indigenous communities have been the subject of western science in ways that have been intrusive, disrespectful and unethical, leading to community harms and mistrust. Lack of community engagement and informed consent for secondary use of data, and past experiences of harmful and negative representation in publications, have reduced the willingness of Indigenous peoples to engage with genetic research.
  • Canada, New Zealand, Australia, and the United States each have guidelines on scientific research among Indigenous peoples. The authors compare the provisions of these guidelines and the Indigenous Research Protection Act, a draft instrument developed by the Indigenous Peoples Council on Biocolonialism with the goal of protecting Indigenous peoples in research, across four principles: community engagement, rights and interests, institutional responsibilities, and ethical/legal oversight. They observe that while many of the policies provide for protection of Indigenous peoples relating to sample collection, secondary uses of data, benefits, and withdrawal from research, there is less consistency regarding cultural rights and interests, particularly in US instruments.
  • The authors examine ways Indigenous peoples have sought to “bridge the gap” between the benefits of genomic research and the protection of Indigenous peoples. Community protocol development, Indigenous-led genomics initiatives, and consent procedures that draw on UNDRIP have increased community engagement in some countries and fostered greater trust. Concrete progress has also been made in initiatives to preserve Indigenous rights and interests over biospecimens, including protocols that allow for the return of samples, biobanking, and Indigenous governance of resulting data.

Gifford, Heather and Kirikowhai Mikaere Te Kete Tū Ātea: Towards claiming Rangitīkei iwi data sovereignty (2019)

  • This article gives an outline of the Te Kete Tū Ātea research project, a four-year, two-phase participatory research initiative by the Rangitīkei Iwi Collective to establish iwi data sovereignty. The first phase resulted in the development of an iwi data needs analysis and comprehensive iwi information framework, which identified potential data sources, gaps in current information, and strategies to address those gaps. The second phase led to the prioritization of a key information gathering domain, economic data, and a statistical evaluation of current iwi data holdings. The project adopted a Kaupapa Māori approach: it was “Māori led, Māori controlled, privileged a Māori worldview, and was framed around questions identified by Māori as of relevance to Māori.”
  • The first phase of the study identified a five domain framework to guide iwi data gathering. Collectively, these five domains — cultural, social, peoples, environmental and economic — make up Te Kete Tū Ātea, informed by three goal dimensions: kaitiakitanga, strengthening identity and connection, and empowerment and enablement.
  • The study identified challenges in assessing the wellbeing of iwi, including statistical capacity within iwi and the availability of data, but the authors suggest that the approach itself could be borrowed and applied by other iwi nationwide.

Johnson-Jennings, Michelle, Derek Jennings, and Meg Little Indigenous data sovereignty in action: The Food Wisdom Repository (2019)

  • This article arose from the experience of the authors at the Research for Indigenous Community Health (RICH) Center. Observing that while Indigenous health and nutrition information is available, it is dispersed and difficult to access, they proposed the development of a Food Wisdom Repository to gather meaningful data and information on Indigenous health practices and efforts. The result, supported by the Shakopee Mdewakanton Sioux Community, is an online digital repository of wise food practices grounded in Indigenous knowledge and IDS.
  • The project draws on Indigenous worldviews, knowledge and ways of knowing, beliefs, and forms of power. In particular, it is framed around the idea of wise practices — pragmatic, flexible and sustainable practices rooted in a given local context and the wisdom of community members — rather than the objective, hierarchical, hegemonic and acontextual “best practices” of Western science.

Montenegro, Maria Subverting the universality of metadata standards: The TK labels as a tool to promote Indigenous data sovereignty (2019)

  • This paper explores how metadata standards, and in particular the widely used Dublin Core, reinforce colonial legal property frameworks and disenfranchise Indigenous people, and how they could be used (or subverted) to exercise and promote IDS.
  • The author notes that the rights and creator fields of DC are in direct conflict with Indigenous epistemologies and protocols on the access, circulation, and use of traditional Indigenous knowledge (TK). The rights field is embedded in western legal practice designed to recognize and protect new creations or inventions, and require a designated individual author and original work in order to offer any protection. This emphasis on originality and individuality is at odds with Indigenous knowledge that emphasizes collective and cumulative knowledge acquired over generations. Similarly, both western IP law and the creator field within DC recognize the individual who records the lifestyles, languages and cultural practices of Indigenous people in film, audio, or image as the legal author, rather than the communities from which the content arose. As subjects but not authors, Indigenous people have no control over these recordings of their cultural practices, or how they are stored, accessed or reused. Indeed, they are even legally required to seek permission from the author to reuse these materials that document their lives and culture.
  • Developed in collaboration with Indigenous peoples, the TK labels are a set of digital tags that can be included as associated metadata in various digital information contexts such as CMSs, online catalogs and databases, finding aids and online platforms. These tags are intended to increase awareness of culturally appropriate circulation, access and use of Indigenous cultural materials. Designed to be used where communities are unable to assert legal control over materials, they provide important information about culturally appropriate use and stewardship. A Seasonal tag developed by the Penobscot Nation, for example, proscribes access to some content outside a given time of year, while an Attribution label, the most widely used, allows Indigenous communities to assert that they are the TK holders of the content and should be acknowledged as such.
  • While the TK labels represent a welcome advance in capturing and asserting Indigenous metadata standards, they are voluntary, and therefore only function if non-tribal collecting institutions recognize the IDS of the tribes.
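The relationship Montenegro describes, between a record's Dublin Core fields and the TK labels attached alongside them, can be pictured with a small data structure. The following is a minimal Python sketch under stated assumptions, not the TK label specification or any real catalog's schema: the class names, the `tk_labels` field, and the `open_months` attribute are hypothetical and chosen for illustration. Only the `creator` and `rights` element names come from Dublin Core, and "Example Nation" is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TKLabel:
    """A hypothetical, simplified stand-in for a TK label."""
    name: str                              # e.g. "Attribution" or "Seasonal"
    community: str                         # community asserting the label
    # Assumption for illustration: months (1-12) in which viewing is
    # considered appropriate; an empty set means no seasonal guidance.
    open_months: frozenset = frozenset()


@dataclass
class Record:
    title: str
    creator: str                           # Dublin Core "creator": who made the recording
    rights: str                            # Dublin Core "rights": legal rights statement
    tk_labels: list = field(default_factory=list)


def attribution_line(record: Record) -> str:
    """Name the communities asserting an Attribution label, if any."""
    holders = [l.community for l in record.tk_labels if l.name == "Attribution"]
    if not holders:
        return "No attribution label attached"
    return "Traditional knowledge holders: " + ", ".join(holders)


def seasonally_appropriate(record: Record, on: date) -> bool:
    """Advisory check: does any Seasonal label discourage access on this date?"""
    for label in record.tk_labels:
        if label.name == "Seasonal" and label.open_months:
            if on.month not in label.open_months:
                return False
    return True


example = Record(
    title="Song recording",
    creator="Jane Collector",              # the legal author under western IP law
    rights="Copyright Jane Collector",
    tk_labels=[
        TKLabel("Attribution", "Example Nation"),
        TKLabel("Seasonal", "Example Nation", frozenset({6, 7, 8})),
    ],
)

print(attribution_line(example))
print(seasonally_appropriate(example, date(2024, 1, 15)))
```

In this sketch the `creator` and `rights` fields are left untouched and the labels sit beside them, which mirrors the point that the labels supplement rather than replace the legal metadata. The seasonal check only reports guidance; nothing in it enforces access, consistent with the labels being voluntary.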

McMahon, Rob, Tim LaHache, and Tim Whiteduck Digital Data Management as Indigenous Resurgence in Kahnawà:ke (2015)

  • This article documents IDG experiences within the Kahnawà:ke Mohawk (Quebec) community as it set up and used ICT systems to manage community data on research, education, finance, health, membership, housing, lands, and resources. The authors’ research followed the implementation of a customized digital data management system, and sought to find out how employees of community service organizations, chiefly in education, conceived of and used data, and what role data management played in self-government and Indigenous resurgence.
  • The authors describe the initiative as an act of “everyday community resurgence,” but one that was accompanied by significant internal tensions and challenges. They note the need to avoid technological determinism in IDS, since the use of ICTs has the potential to exacerbate the effects of settler colonialism, concentrating and centralizing power.
  • The article describes the rollout, architecture, and governance of the Kahnawà:ke data management system. One of the challenges faced by the community was in data sharing, with a lack of trust between community organizations leading to data hugging and silos. This tension, which has been identified in other research cited by the authors, points to the need for trust-building in order to promote more holistic data sharing and optimal data use.

Rainie, Stephanie Carroll et al Data as a Strategic Resource: Self-determination, Governance, and the Data Challenge for Indigenous Nations in the United States (2017)

  • Despite the need of Indigenous nations for data to help identify problems and find solutions, US Indigenous nations encounter a data landscape characterized by “sparse, inconsistent, and irrelevant information complicated by limited access and utility” that does not serve to address tribally defined needs. Because much of this data is collected and controlled by others for their own purposes, mistrust in data collection is high.
  • This article documents two case studies in tribal data sovereignty and data governance, among the Ysleta del Sur Pueblo and Cheyenne River Sioux Tribe. It lays out the data priorities, agendas and challenges faced by each, and the resulting data initiatives, protocols and uses. The article also discusses how this data governance contributed to the tribes’ self-determination.
  • As part of a development strategy, in 2008 the Ysleta del Sur began to collect socioeconomic and demographic data annually from its citizens as part of its enrolment process. Implementing a census approach that incorporated cultural and local knowledge and western epistemologies, the project yielded data about population, poverty rates, household incomes, educational attainment, workforce and unemployment that was more complete than US census data. Strong community engagement yielded a 90 percent response rate, and the results inspired other data initiatives to support community strategic decision making. The socioeconomic data was also used to support successful applications for federal funding.
  • Identifying high levels of poverty and unemployment as a problem, the Cheyenne River Sioux Tribe sought a comprehensive plan to address these problems, for which they needed timelier, more granular, and more culturally and locally relevant data than that available through the federal government. With academic partners, it developed a survey and data collection process to collect baseline demographic and socioeconomic data from a sample of residents. The survey was able to quantify unemployment rates among people living on the reservation, but also captured employment categories missed by federal data collection, such as the arts microenterprise sector. The results were shared back to the community, and used to foster microenterprises and write grant applications.

Walter, Maggie and Michelle Suina Indigenous data, indigenous methodologies and indigenous data sovereignty (2019)

  • In this article, Walter and Suina propose that there is a dearth of Indigenous quantitative methodologies, driven by a longstanding mistrust of positivist research that positions Indigenous peoples within a deficit discourse. What the authors call “quantitative avoidance” leads to lived consequences for Indigenous peoples: since the statistics produced by quantitative methods form the primary evidence base for policy within the colonial societies, failing to engage with them removes Indigenous people from a critical part of the policy debate.
  • The authors make a case for developing Indigenous quantitative methodologies. As evidence, they cite the Albuquerque Area Southwest Tribal Epidemiology Center (AASTEC), whose mission is to collaborate with the 27 tribes of its health area to provide high quality, culturally congruent epidemiology, capacity development, program evaluation, and health promotion. In keeping with its commitment to honoring tribal sovereignty, AASTEC works to build capacity that enables tribes to control data design, collection, and management at all stages of the process. This requires not merely adapting western survey instruments, but redesigning them to incorporate the values and definitions of health of the communities they serve.
  • The authors close with three recommendations for communities and stakeholders interested in building Indigenous quantitative methodologies. First, communities need to cultivate technical skills for survey development, data collection, analysis and reporting. Secondly, they need to build comfort with and understanding of research methods among tribal partners in order to undo decades of mistrust; the authors describe simulation exercises, used successfully with Indigenous and non-Indigenous participants, that help to demonstrate how worldviews shape expectations and perceptions around data. Finally, they should pursue advocacy of IDS and an exchange of ideas that allows successful Indigenous research methodologies to be promulgated.

Selected Readings on Data Portability


By Juliet McMurren, Andrew Young, and Stefaan G. Verhulst

As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, we explore selected literature on data portability.

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Context

Data today exists largely in silos, generating problems and inefficiencies for the individual, business and society at large. These include:

  • difficulty switching (data) between competing service providers;
  • delays in sharing data for important societal research initiatives;
  • barriers for data innovators to reuse data that could generate insights to inform individuals’ decision making; and
  • barriers to scaling data donation.

Data portability — the principle that individuals have a right to obtain, copy, and reuse their own personal data and to transfer it from one IT platform or service to another for their own purposes — is positioned as a solution to these problems. When fully implemented, it would make data liquid, giving individuals the ability to access their own data in a usable and transferable format, transfer it from one service provider to another, or donate data for research and enhanced data analysis by those working in the public interest.

Some companies, including Google, Apple, Twitter and Facebook, have sought to advance data portability through initiatives like the Data Transfer Project, an open source software project designed to facilitate data transmittals. Newly enacted data protection legislation such as Europe’s General Data Protection Regulation (2018) and the California Consumer Privacy Act (2018) gives individuals a right to data portability. However, despite these legal and technical advances, many questions remain about how to scale up data liquidity and portability responsibly and systematically. These new data rights have generated complex and as yet unanswered questions about the limits of data ownership, the implications for privacy, security and intellectual property rights, and the practicalities of how, when, and to whom data can be transferred.

In this edition of the GovLab’s Selected Readings series, we examine the emerging literature on data portability to provide a foundation for future work on the value proposition of data portability. Readings are listed in alphabetical order.

Selected readings

Cho, Daegon, Pedro Ferreira, and Rahul Telang, The Impact of Mobile Number Portability on Price and Consumer Welfare (2016)

  • In this paper, the authors analyze how Mobile Number Portability (MNP) — the ability for consumers to maintain their phone number when changing providers, thus reducing switching costs — affected the relationship between switching costs, market price and consumer surplus after it was introduced in most European countries in the early 2000s.
  • Theory holds that when switching costs are high, market leaders will enjoy a substantial advantage and are able to keep prices high. Policy makers will therefore attempt to decrease switching costs to intensify competition and reduce prices to consumers.
  • The study reviewed quarterly data from 47 wireless service providers in 15 EU countries between 1999 and 2006. The data showed that MNP simultaneously decreased market price by over four percent and increased consumer welfare by an average of at least €2.15 per person per quarter. This increase amounted to a total of €880 million per quarter across the 15 EU countries analyzed in this paper and accounted for 15 percent of the increase in consumer surplus observed over this time.

CtrlShift, Data Mobility: The data portability growth opportunity for the UK economy (2018)

  • Commissioned by the UK Department for Digital, Culture, Media and Sport (DCMS), this study was intended to identify the potential of personal data portability for the UK economy.
  • Its scope went beyond the legal right to data portability envisaged by the GDPR, to encompass the current state of personal data portability and mobility, requirements for safe and secure data sharing, and the potential economic benefits through stimulation of innovation, productivity and competition.
  • The report concludes that increased personal data mobility has the potential to be a vital stimulus for the development of the digital economy, driving growth by empowering individuals to make use of their own data and consent to others using it to create new data-driven services and technologies.
  • However, it also finds that there are significant challenges to be overcome, and new risks to be addressed, before the value of personal data can be realized. Much personal data remains locked in organizational silos, and systemic issues related to data security and governance and the uneven sharing of benefits need to be resolved.

Data Guidance and Future of Privacy Forum, Comparing Privacy Laws: GDPR v. CCPA (2018)

  • This paper compares the provisions of the GDPR with those of the California Consumer Privacy Act (2018).
  • Both article 20 of the GDPR and section 1798 of the CCPA recognize a right to data portability. Both also confer on data subjects the right to receive data from controllers free of charge upon request, and oblige controllers to create mechanisms to provide subjects with their data in portable and reusable form so that it can be transmitted to third parties for reuse.
  • In the CCPA, the right to data portability is an extension of the right to access, and only confers on data subjects the right to request data collected within the past 12 months and have it delivered to them. The GDPR does not impose a time limit, and allows data to be transferred from one data controller to another, but limits the right to personal data that the data subject has provided themselves, on the basis of consent or contract, and that is processed by automated means.

Data Transfer Project, Data Transfer Project Overview and Fundamentals (2018)

  • The paper presents an overview of the goals, principles, architecture, and system components of the Data Transfer Project. The intent of the DTP is to increase the number of services offering data portability and provide users with the ability to transfer data directly in and out of participating providers through systems that are easy and intuitive to use, private and secure, reciprocal between services, and focused on user data. The project, which is supported by Microsoft, Google, Twitter and Facebook, is an open-source initiative that encourages the participation of other providers to reduce the infrastructure burden on providers and users.
  • In addition to benefits to innovation, competition, and user choice, the authors point to benefits to security, through allowing users to back up, organize, or archive their data, recover from account hijacking, and retrieve their data from deprecated services.
  • The DTP’s remit was to test concepts and feasibility for the transfer of specific types of user data between online services. It uses a system of adapters to convert proprietary formats into canonical formats that can be used to move data between services, while allowing providers to maintain control over the security of their service. While not resolving all formatting or support issues, this approach would allow substantial data portability and encourage ecosystem sustainability.
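
The adapter idea described above can be illustrated with a minimal sketch. It is not the Data Transfer Project's actual code or API: the class names, field names, and the two providers are all hypothetical, and a real system would also handle authentication, paging, and errors. The point is only that each provider writes one adapter to or from a shared canonical model, rather than one converter per pair of services.

```python
from dataclasses import dataclass


# Hypothetical canonical model: a provider-neutral photo record.
@dataclass(frozen=True)
class CanonicalPhoto:
    title: str
    url: str


class ProviderAExporter:
    """Hypothetical adapter: provider A's proprietary records -> canonical model."""

    def export(self, raw_items):
        # Provider A (assumed) stores photos as {"caption": ..., "src": ...}.
        return [CanonicalPhoto(title=item["caption"], url=item["src"]) for item in raw_items]


class ProviderBImporter:
    """Hypothetical adapter: canonical model -> provider B's proprietary records."""

    def import_(self, photos):
        # Provider B (assumed) expects {"name": ..., "link": ...}.
        return [{"name": photo.title, "link": photo.url} for photo in photos]


def transfer(raw_items, exporter, importer):
    # The canonical model is the only format the two adapters share.
    return importer.import_(exporter.export(raw_items))


source = [{"caption": "Sunset", "src": "https://example.com/1.jpg"}]
result = transfer(source, ProviderAExporter(), ProviderBImporter())
print(result)
```

With N providers, this design needs on the order of N adapters rather than N × N pairwise converters, which is one way to read the paper's claim about reducing the infrastructure burden.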

Deloitte, How to Flourish in an Uncertain Future: Open Banking (2017)

  • This report addresses the innovative and disruptive potential of open banking, in which data is shared between members of the banking ecosystem at the authorization of the customer, with the potential to increase competition and facilitate new products and services. In the resulting marketplace model, customers could use a single banking interface to access products from multiple players, from established banks to newcomers and fintechs.
  • The report’s authors identify significant threats to current banking models. Banks that fail to embrace open banking could be relegated to a secondary role as an infrastructure provider, while third parties such as tech companies, fintechs, and price comparison websites take over the customer relationship.
  • The report identifies four overlapping operating models banks could adopt within an open banking model: full service providers, delivering proprietary products through their own interface with little or no third-party integration; utilities, which provide other players with infrastructure without customer-facing services; suppliers, which offer proprietary products through third-party interfaces; and interfaces, which provide distribution services through a marketplace interface. To retain market share, incumbents are likely to need to adopt a combination of these roles, offering their own products and services and those of third parties through their own and others’ interfaces.

Digital Competition Expert Panel, Unlocking Digital Competition (2019)

  • This report captures the findings of the UK Digital Competition Expert Panel, which was tasked in 2018 with considering the opportunities and challenges the digital economy might pose for competition and competition policy, and with recommending any necessary changes. The panel focused on the impact of big players within the sector, appropriate responses to mergers or anticompetitive practices, and the impact on consumers.
  • The panel found that the digital economy is creating many benefits, but that digital markets are subject to tipping, in which emerging winners can scoop much of the market. This concentration can give rise to substantial costs, especially to consumers, and cannot be solved by competition alone. However, government policy and regulatory solutions have limitations, including the slowness of policy change, uneven enforcement and profound informational asymmetries between companies and government.
  • The panel proposed the creation of a digital markets unit that would be tasked with developing a code of competitive conduct, enabling greater personal data mobility and systems designed with open standards, and advancing access to non-personal data to reduce barriers to market entry.
  • The panel’s model of data mobility goes beyond data portability, which involves consumers being able to request and transfer their own data from one provider to another. Instead, the panel recommended empowering consumers to instigate transfers of data between a business and a third party in order to access price information, compare goods and services, or access tailored advice and recommendations. They point to open banking as an example of how this could function in practice.
  • It also proposed updating merger policy to make it more forward-looking to better protect consumers and innovation and preserve the competitiveness of the market. It recommended the creation of antitrust policy that would enable the implementation of interim measures to limit damage to competition while antitrust cases are in process.

Egan, Erin, Charting a Way Forward: Data Portability and Privacy (2019)

  • This white paper by Facebook’s VP and Chief Privacy Officer, Policy, represents an attempt to advance the conversation about the relationship between data portability, privacy, and data protection. The author sets out five key questions about data portability: what it is, whose and what data should be portable, how privacy should be protected in the context of portability, and where responsibility for data misuse or improper protection should lie.
  • The paper finds that definitions of data portability remain imprecise, particularly with regard to the distinction between data portability and data transfer. In the interest of feasibility and a reasonable operational burden on providers, it proposes time limits on providers’ obligations to make observed data portable.
  • The paper concludes that there are strong arguments both for and against allowing users to port their social graph — the map of connections between that user and other users of the service — but that the key determinant should be a capacity to ensure the privacy of all users involved. Best-practice data portability protocols that would resolve current differences of approach as to what, how and by whom information should be made available would help promote broader portability, as would resolution of liability for misuse or data exposure.

Engels, Barbara, Data portability among online platforms (2016)

  • The article examines the effects on competition and innovation of data portability among online platforms such as search engines, online marketplaces, and social media, and how relations between users, data, and platform services change in an environment of data portability.
  • The paper finds that the benefits of portability to competition and innovation are greatest in two kinds of environments: first, where platforms offer complementary products and can realize synergistic benefits by sharing data; and second, where platforms offer substitute or rival products but the risk of anti-competitive behaviour is high, as for search engines.
  • It identifies privacy and security issues raised by data portability. Portability could, for example, allow an identity fraudster to misuse personal data across multiple platforms, compounding the harm they cause.
  • It also suggests that standards for data interoperability could act to reduce innovation in data technology, encouraging data controllers to continue to use outdated technologies in order to comply with inflexible, government-mandated standards.

Graef, Inge, Martin Husovec and Nadezhda Purtova, Data Portability and Data Control: Lessons for an Emerging Concept in EU Law (2018)

  • This paper situates the data portability right conferred by the GDPR within rights-based data protection law. The authors argue that the right to data portability should be seen as a new regulatory tool aimed at stimulating competition and innovation in data-driven markets.
  • The authors note the potential for conflicts between the right to data portability and the intellectual property rights of data holders, suggesting that the framers underestimated the potential impact of such conflicts on copyright, trade secrets and sui generis database law.
  • Given that the right to data portability is being replicated within consumer protection law and the regulation of non-personal data, the authors argue framers of these laws should consider the potential for conflict and the impact of such conflict on incentives to innovate.

Mohsen, Mona Omar and Hassan A. Aziz, The Blue Button Project: Engaging Patients in Healthcare by a Click of a Button (2015)

  • This paper provides a literature review on the Blue Button initiative, an early data portability project which allows Americans to access, view or download their health records in a variety of formats.
  • Originally launched through the Department of Veterans Affairs in 2010, the Blue Button initiative had expanded to more than 500 organizations by 2014, when the Department of Health and Human Services launched the Blue Button Connector to facilitate both patient access and development of new tools.
  • The Blue Button has enabled the development of tools such as the Harvard-developed Growth-Tastic app, which allows parents to check their child’s growth by submitting their downloaded pediatric health data. Pharmacies across the US have also adopted the Blue Button to provide patients with access to their prescription history.

More Than Smart and Mission:data, Got Data? The Value of Energy Data to Customers (2016)

  • This report outlines the public value of the Green Button, a data protocol that provides customers with private and secure access to their energy use data collected by smart meters.
  • The authors outline how the use of the Green Button can help states meet their energy and climate goals by enabling them to structure renewables and other distributed energy resources (DER) such as energy efficiency, demand response, and solar photovoltaics. Access to granular, near-real-time data can encourage innovation among DER providers, facilitating the development of applications like “virtual energy audits” that identify efficiency opportunities, allowing customers to reduce costs through time-of-use pricing, and enabling the optimization of photovoltaic systems to meet peak demand.
  • Energy efficiency receives the greatest boost from initiatives like the Green Button, with studies showing energy savings of up to 18 percent when customers have access to their meter data. In addition to improving energy conservation, access to meter data could improve the efficiency of appliances by allowing devices to trigger sleep modes in response to data on usage or price. However, at the time of writing, problems with data portability and interoperability were preventing these benefits from being realized, at a cost of tens of millions of dollars.
  • The authors recommend that commissions require utilities to make usage data available to customers or authorized third parties in standardized formats as part of basic utility service, and tariff data to developers for use in smart appliances.

MyData, Understanding Data Operators (2020)

  • MyData is a global movement of data users, activists and developers with a common goal to empower individuals with their personal data to enable them and their communities to develop knowledge, make informed decisions and interact more consciously and efficiently.
  • This introductory paper presents the state of knowledge about data operators, trusted data intermediaries that provide infrastructure for human-centric personal data management and governance, including data sharing and transfer. The operator model allows data controllers to outsource issues of legal compliance with data portability requirements, while offering individual users a transparent and intuitive way to manage the data transfer process.
  • The paper examines use cases from 48 “proto-operators” from 15 countries who fulfil some of the functions of an operator, albeit at an early level of maturity. The paper finds that operators offer management of identity authentication, data transaction permissions, connections between services, value exchange, data model management, personal data transfer and storage, governance support, and logging and accountability. At the heart of these functions is the need for minimum standards of data interoperability.
  • The paper reviews governance frameworks from the general (legislative) to the specific (operators), and explores proto-operator business models. As is typical of an emerging field, business models are currently unclear and potentially unsustainable, and are one of several areas, alongside interoperability requirements and governance frameworks, that must still be developed.

National Science and Technology Council, Smart Disclosure and Consumer Decision Making: Report of the Task Force on Smart Disclosure (2013)

  • This report summarizes the work and findings of the 2011–2013 Task Force on Smart Disclosure: Information and Efficiency, an interagency body tasked with advancing smart disclosure, through which data is made more available and accessible to both consumers and innovators.
  • The Task Force recognized the capacity of smart disclosure to inform consumer choices, empower consumers through access to useful personal data, enable the creation of new tools, products and services, and promote efficiency and growth. It reviewed federal efforts to promote smart disclosure within sectors and in data types that crosscut sectors, such as location data, consumer feedback, enforcement and compliance data and unique identifiers. It also surveyed specific public-private partnerships on access to data, such as the Blue and Green Button and MyData initiatives in health, energy and education respectively.
  • The Task Force reviewed steps taken by the Federal Government to implement smart disclosure, including adoption of machine readable formats and standards for metadata, use of APIs, and making data available in an unstructured format rather than not releasing it at all. It also reviewed “choice engines” making use of the data to provide services to consumers across a range of sectors.
  • The Task Force recommended that smart disclosure should be a core component of efforts to institutionalize and operationalize open data practices, with agencies proactively identifying, tagging, and planning the release of candidate data. It also recommended that this be supported by a government-wide community of practice.

Nicholas, Gabriel, Taking It With You: Platform Barriers to Entry and the Limits of Data Portability (2020)

  • This paper considers whether, as is often claimed, data portability offers a genuine solution to the lack of competition within the tech sector.
  • It concludes that current regulatory approaches to data portability, which focus on reducing switching costs through technical solutions such as one-off exports and API interoperability, are not sufficient to generate increased competition. This is because they fail to address other barriers to entry, including network effects, unique data access, and economies of scale.
  • The author proposes an alternative approach, which he terms collective portability, which would allow groups of users to coordinate the transfer of their data to a new platform. This model raises questions about how such collectives would make decisions regarding portability, but would enable new entrants to successfully target specific user groups and scale rapidly without having to reach users one by one.

OECD, Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-use across Societies (2019)

  • This background paper to a 2017 expert workshop on risks and benefits of data reuse considers data portability as one strategy within a data openness continuum that also includes open data, market-based B2B contractual agreements, and restricted data-sharing agreements within research and data for social good applications.
  • It considers four rationales offered for data portability. These include empowering individuals towards the “informational self-determination” aspired to by GDPR; increasing competition within digital and other markets through reductions in information asymmetries between individuals and providers, switching costs, and barriers to market entry; and facilitating increased data flows.
  • The report highlights the need for both syntactic and semantic interoperability standards to ensure data can be reused across systems, both of which may be fostered by increased rights to data portability. Data intermediaries have an important role to play in the development of these standards, through initiatives like the Data Transfer Project, a collaboration which brought together Facebook, Google, Microsoft, and Twitter to create an open-source data portability platform.

Personal Data Protection Commission Singapore, Response to Feedback on the Public Consultation on Proposed Data Portability and Data Innovation Provisions (2020)

  • The report summarizes the findings of the 2019 PDPC public consultation on proposals to introduce provisions on data portability and data innovation in Singapore’s Personal Data Protection Act.
  • The proposed provision would oblige organizations to transmit an individual’s data to another organization in a commonly used machine-readable format, upon the individual’s request. The obligation does not extend to data intermediaries or organizations that do not have a presence in Singapore, although data holders may choose to honor those requests.
  • The obligation would apply to electronic data that is either provided by the individual or generated by the individual’s activities in using the organization’s service or product, but not derived data created by the processing of other data by the data holder. Respondents were concerned that including derived data could harm organizations’ competitiveness.
  • Respondents were concerned about how to honour data portability requests where the data of third parties was involved, as in the case of a joint account holder, for example. The PDPC opted for a “balanced, reasonable, and pragmatic approach,” allowing data involving third parties to be ported where it was under the requesting individual’s control, was to be used for domestic and personal purposes, and related only to the organization’s product or service.

Quinn, Paul, Is the GDPR and Its Right to Data Portability a Major Enabler of Citizen Science? (2018)

  • This article explores the potential of data portability to advance citizen science by enabling participants to port their personal data from one research project to another. Citizen science — the collection and contribution of large amounts of data by private individuals for scientific research — has grown rapidly in response to the development of new digital means to capture, store, organize, analyze and share data.
  • The GDPR right to data portability aids citizen science by requiring transfer of data in machine-readable format and allowing data subjects to request its transfer to another data controller. This requirement of interoperability does not amount to compatibility, however, and data thus transferred would probably still require cleaning to be usable, acting as a disincentive to reuse.
  • The GDPR’s limitation of transferability to personal data provided by the data subject excludes some forms of data that might possess significant scientific potential, such as secondary personal data derived from further processing or analysis.
  • The GDPR right to data portability also potentially limits citizen science because it applies only where data is processed on the basis of the subject’s express consent or the performance of a contract. This limitation excludes other grounds for data processing described in the GDPR, such as preventive or occupational medicine, scientific research, or archiving for reasons of public or scientific interest. It is also not clear whether the GDPR compels data controllers to transfer data outside the European Union.

Wong, Janis and Tristan Henderson, How Portable is Portable? Exercising the GDPR’s Right to Data Portability (2018)

  • This paper presents the results of 230 real-world requests for data portability in order to assess how, and how well, the GDPR right to data portability is being implemented. The authors set out to establish the kinds of file formats returned in response to requests and to identify practical difficulties encountered in making and interpreting requests, over a three-month period beginning on the day the GDPR came into effect.
  • The findings revealed continuing problems around ensuring portability for both data controllers and data subjects. Of the 230 requests, only 163 were successfully completed.
  • Data controllers frequently had difficulty understanding the requirements of GDPR, providing data in incomplete or inappropriate formats: only 40 percent of the files supplied were in a fully compliant format. Additionally, some data controllers were confused between the right to data portability and other rights conferred by the GDPR, such as the right to access or erasure.
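
As a quick check on the figures reported above, the completion rate can be worked out directly. This is only arithmetic on the two counts quoted in the summary; the variable names are illustrative and nothing here comes from the authors' own analysis.

```python
# Counts quoted in the summary of Wong and Henderson (2018).
total_requests = 230
completed = 163

completion_rate = completed / total_requests
not_completed = total_requests - completed

print(f"{completion_rate:.1%} of requests completed; {not_completed} not completed")
# prints "70.9% of requests completed; 67 not completed"
```

In other words, roughly three in ten requests were not successfully completed during the study period.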

Selected Readings on AI for Development


By Dominik Baumann, Jeremy Pesner, Alexandra Shaw, Stefaan Verhulst, Michelle Winowatan, Andrew Young, Andrew J. Zahuranec

As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology. 

In this edition, we explore selected literature on AI and Development. This piece was developed in the context of The GovLab’s collaboration with Agence Française de Développement (AFD) on the use of emerging technology for development. To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Context: In recent years, public discourse on artificial intelligence (AI) has focused on its potential for improving the way businesses, governments, and societies make (automated) decisions. Simultaneously, several AI initiatives have raised concerns about human rights, including the possibility of discrimination and privacy breaches. Between these two opposing perspectives is a discussion on how stakeholders can maximize the benefits of AI for society while minimizing the risks that might arise from the use of this technology.

While the majority of AI initiatives today come from the private sector, international development actors increasingly experiment with AI-enabled programs. These initiatives focus on, for example, climate modelling, urban mobility, and disease transmission. These early efforts demonstrate the promise of AI for supporting more efficient, targeted, and impactful development efforts. Yet, the intersection of AI and development remains nascent, and questions remain regarding how this emerging technology can deliver on its promise while mitigating risks to intended beneficiaries.

Readings are listed in alphabetical order.

2030Vision. AI and the Sustainable Development Goals: the State of Play

  • In broad language, this document for 2030Vision assesses AI research and initiatives and the Sustainable Development Goals (SDGs) to determine gaps and potential that can be further explored or scaled. 
  • It specifically reviews the current applications of AI in two SDG sectors, food/agriculture and healthcare.
  • The paper recommends enhancing multi-sector collaboration among businesses, governments, civil society, academia and others to ensure technology can best address the world’s most pressing challenges.

Andersen, Lindsey. Artificial Intelligence in International Development: Avoiding Ethical Pitfalls. Journal of Public & International Affairs (2019). 

  • Investigating the ethical implications of AI in the international development sector, the author argues that the involvement of many different stakeholders and AI-technology providers results in ethical issues concerning fairness and inclusion, transparency, explainability and accountability, data limitations, and privacy and security.
  • The author recommends the information communication technology for development (ICT4D) community adopt the Principles for Digital Development to ensure the ethical implementation of AI in international development projects.
  • The Principles for Digital Development include: 1) design with the user; 2) understand the ecosystem; 3) design for scale; 4) build for sustainability; 5) be data driven; 6) use open standards, open data, open source, and open innovation; and 7) reuse and improve.

Arun, Chinmayi. AI and the Global South: Designing for Other Worlds in Markus D. Dubber, Frank Pasquale, and Sunit Das (eds.), The Oxford Handbook of Ethics of AI, Oxford University Press, Forthcoming (2019).

  • This chapter interrogates the impact of AI’s application in the Global South and raises concerns about such initiatives.
  • Arun argues AI’s deployment in the Global South may result in discrimination, bias, oppression, exclusion, and bad design. She further argues it can be especially harmful to vulnerable communities in places that do not have strong respect for human rights.
  • The paper concludes by outlining the international human rights laws that can mitigate these risks. It stresses the importance of a human rights-centric, inclusive, empowering, context-driven approach in the use of AI in the Global South.

Best, Michael. Artificial Intelligence (AI) for Development Series: Module on AI, Ethics and Society. International Telecommunication Union (2018).

  • This working paper is intended to help ICT policymakers or regulators consider the ethical challenges that emerge within AI applications.
  • The author identifies a four-pronged framework of analysis (risks, rewards, connections, and key questions to consider) that can guide policymaking in the fields of: 1) livelihood and work; 2) diversity, non-discrimination and freedoms from bias; 3) data privacy and minimization; and 4) peace and security.
  • The paper also includes a table of policies and initiatives undertaken by national governments and tech companies around AI, along with the set of values (mentioned above) explicitly considered.

International Development Innovation Alliance (2019). Artificial Intelligence and International Development: An Introduction

  • Results for Development, a nonprofit organization working in the international development sector, developed a report in collaboration with the AI and Development Working Group within the International Development Innovation Alliance (IDIA). The report provides a brief overview of AI and how this technology may impact the international development sector.
  • The report provides examples of AI-powered applications and initiatives that support the SDGs, including eradicating hunger, promoting gender equality, and encouraging climate action.
  • It also provides a collection of supporting resources and case studies for development practitioners interested in using AI.

Paul, Amy, Craig Jolley, and Aubra Anthony. Reflecting the Past, Shaping the Future: Making AI Work for International Development. United States Agency for International Development (2018). 

  • This report outlines the potential of machine learning (ML) and artificial intelligence in supporting development strategy. It also details some of the common risks that can arise from the use of these technologies.
  • The document contains examples of ML and AI applications to support the development sector and recommends good practices in handling such technologies. 
  • It concludes by recommending broad, shared governance, the use of fair and balanced data, and the continued involvement of local populations and development practitioners.

Pincet, Arnaud, Shu Okabe, and Martin Pawelczyk. Linking Aid to the Sustainable Development Goals – a machine learning approach. OECD Development Co-operation Working Papers (2019). 

  • The authors apply ML and semantic analysis to data sourced from the OECD’s Creditor Reporting System to map aid funding to particular SDGs.
  • The researchers find that “Good Health and Well-Being” is the most targeted SDG, which they call the “SDG darling.”
  • The authors find that mapping relationships between the system and SDGs can help to ensure equitable funding across different goals.
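
To make the idea of mapping aid records to SDGs concrete, here is a deliberately simplified sketch. It is not the authors' method: the paper uses machine learning and semantic analysis, whereas this toy example swaps in plain keyword counting. The keyword lists and project descriptions are invented for illustration and do not come from the Creditor Reporting System.

```python
import re
from collections import Counter

# Hypothetical keyword lists per goal; not taken from the paper.
SDG_KEYWORDS = {
    "SDG 2: Zero Hunger": {"food", "nutrition", "agriculture", "hunger"},
    "SDG 3: Good Health and Well-Being": {"health", "vaccine", "hospital", "disease"},
    "SDG 4: Quality Education": {"school", "teacher", "education", "literacy"},
}


def tokenize(text):
    # Lowercase the text and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def map_to_sdg(description):
    """Return the goal whose keywords occur most often, or None if none match."""
    counts = Counter(tokenize(description))
    scores = {sdg: sum(counts[word] for word in words) for sdg, words in SDG_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Invented project descriptions standing in for aid records.
projects = [
    "Vaccine cold chain for rural health clinics",
    "Teacher training and school construction",
    "Road maintenance programme",
]
labels = [map_to_sdg(project) for project in projects]
for project, label in zip(projects, labels):
    print(f"{project!r} -> {label}")
```

Counting the resulting labels across all records gives the kind of per-goal distribution that lets a funder see which goals attract the most, and the least, aid. A keyword approach like this one breaks down on descriptions that are ambiguous or touch several goals, which is part of the motivation for the learned approach the authors take.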

Quinn, John, Vanessa Frias-Martinez, and Lakshminarayan Subramanian. Computational Sustainability and Artificial Intelligence in the Developing World. Association for the Advancement of Artificial Intelligence (2014). 

  • These researchers suggest three different areas (health, food security, and transportation) in which AI applications can uniquely benefit the developing world. The researchers argue the lack of technological infrastructure in these regions makes AI especially useful and valuable, as it can efficiently analyze data and provide solutions.
  • The paper provides some examples of applications within the three themes, including disease surveillance, identification of drought and agricultural trends, modeling of commuting patterns, and traffic congestion monitoring.

Smith, Matthew and Sujaya Neupane. Artificial intelligence and human development: toward a research agenda (2018).

  • The authors highlight potential beneficial applications for AI in a development context, including healthcare, agriculture, governance, education, and economic productivity.
  • They also discuss the risks and downsides of AI, which include the “black boxing” of algorithms, bias in decision making, potential for extreme surveillance, undermining democracy, potential for job and tax revenue loss, vulnerability to cybercrime, and unequal wealth gains towards the already-rich.
  • They recommend further research projects on these topics that are interdisciplinary, locally conducted, and designed to support practice and policy.

Tomašev, Nenad, et al. AI for social good: unlocking the opportunity for positive impact. Nature Communications (2020).

  • This paper takes stock of what the authors term the AI for Social Good movement (AI4SG), which “aims to establish interdisciplinary partnerships centred around AI applications towards SDGs.”  
  • Developed at a multidisciplinary expert seminar on the topic, the authors present 10 recommendations for creating successful AI4SG collaborations: “1) Expectations of what is possible with AI need to be well grounded. 2) There is value in simple solutions. 3) Applications of AI need to be inclusive and accessible, and reviewed at every stage for ethics and human rights compliance. 4) Goals and use cases should be clear and well-defined. 5) Deep, long-term partnerships are required to solve large problem successfully. 6) Planning needs to align incentives, and factor in the limitations of both communities. 7) Establishing and maintaining trust is key to overcoming organisational barriers. 8) Options for reducing the development cost of AI solutions should be explored. 9) Improving data readiness is key. 10) Data must be processed securely, with utmost respect for human rights and privacy.”

Vinuesa, Ricardo, et al. The role of artificial intelligence in achieving the Sustainable Development Goals. 

  • This report analyzes how AI can meet both the demands of some SDGs and also inhibit progress toward others. It highlights a critical research gap about the extent to which AI impacts sustainable development in the medium and long term. 
  • Through their analysis, the authors claim AI has the potential to positively impact the environment, society, and the economy. However, AI can also hinder progress in these same areas.
  • The authors recognize that although AI enables efficiency and productivity, it can also increase inequality and hinder achievements of the 2030 Agenda. Vinuesa and his co-authors suggest adequate policy formation and regulation are needed to ensure fast and equitable development of AI technologies that can address the SDGs. 

United Nations Educational, Scientific and Cultural Organization (UNESCO) (2019). Artificial intelligence for Sustainable Development: Synthesis Report, Mobile Learning Week 2019

  • In this report, UNESCO assesses the findings from Mobile Learning Week (MLW) 2019. The three main conclusions were: 1) the world is facing a learning crisis; 2) education drives sustainable development; and 3) sustainable development can only be achieved if we harness the potential of AI. 
  • Questions around four major themes dominated the MLW 2019 sessions: 1) how to guarantee inclusive and equitable use of AI in education; 2) how to harness AI to improve learning; 3) how to increase skills development; and 4) how to ensure transparent and auditable use of education data. 
  • To move forward, UNESCO advocates for more international cooperation and stakeholder involvement, creation of education and AI standards, and development of national policies to address educational gaps and risks. 

The Data Storytelling Workbook


Book by Anna Feigenbaum and Aria Alamalhodaei: “From tracking down information to symbolising human experiences, this book is your guide to telling more effective, empathetic and evidence-based data stories.

Drawing on cross-disciplinary research and first-hand accounts of projects ranging from public health to housing justice, The Data Storytelling Workbook introduces key concepts, challenges and problem-solving strategies in the emerging field of data storytelling. Filled with practical exercises and activities, the workbook offers interactive training materials that can be used for teaching and professional development. By approaching both ‘data’ and ‘storytelling’ in a broad sense, the book combines theory and practice around real-world data storytelling scenarios, offering critical reflection alongside practical and creative solutions to challenges in the data storytelling process, from tracking down hard to find information, to the ethics of visualising difficult subjects like death and human rights….(More)”.

Big data in official statistics


Paper by Barteld Braaksma and Kees Zeelenberg: “In this paper, we describe and discuss opportunities for big data in official statistics. Big data come in high volume, high velocity and high variety. Their high volume may lead to better accuracy and more details, their high velocity may lead to more frequent and more timely statistical estimates, and their high variety may give opportunities for statistics in new areas. But there are also many challenges: there are uncontrolled changes in sources that threaten continuity and comparability, and data that refer only indirectly to phenomena of statistical interest.

Furthermore, big data may be highly volatile and selective: the coverage of the population to which they refer may change from day to day, leading to inexplicable jumps in time-series. And very often, the individual observations in these big data sets lack variables that allow them to be linked to other datasets or population frames. This severely limits the possibilities for correction of selectivity and volatility. Also, with the advance of big data and open data, there is much more scope for disclosure of individual data, and this poses new problems for statistical institutes. So, big data may be regarded as so-called nonprobability samples. The use of such sources in official statistics requires other approaches than the traditional one based on surveys and censuses.

A first approach is to accept the big data just for what they are: an imperfect, yet very timely, indicator of developments in society. In a sense, this is what national statistical institutes (NSIs) often do: we collect data that have been assembled by the respondents and the reason why, and even just the fact that they have been assembled is very much the same reason why they are interesting for society and thus for an NSI to collect. In short, we might argue: these data exist and that’s why they are interesting.

A second approach is to use formal models and extract information from these data. In recent years, many new methods for dealing with big data have been developed by mathematical and applied statisticians. New methods like machine-learning techniques can be considered alongside more traditional methods like Bayesian techniques. National statistical institutes have always been reluctant to use models, apart from specific cases like small-area estimates. Based on experience at Statistics Netherlands, we argue that NSIs should not be afraid to use models, provided that their use is documented and made transparent to users. On the other hand, in official statistics, models should not be used for all kinds of purposes….(More)”.

Index: Secondary Uses of Personal Data


By Alexandra Shaw, Andrew Zahuranec, Andrew Young, Stefaan Verhulst

The Living Library Index–inspired by the Harper’s Index–provides important statistics and highlights global trends in governance innovation. This installment focuses on public perceptions regarding secondary uses of personal data (or the re-use of data initially collected for a different purpose). It provides a summary of societal perspectives toward personal data usage, sharing, and control. It is not meant to be comprehensive–rather, it intends to illustrate conflicting, and often confusing, attitudes toward the re-use of personal data. 

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at info@thelivinglib.org

Data ownership and control 

  • Percentage of Americans who say it is “very important” they control information collected about them: 74% – 2016
  • Americans who think that today’s privacy laws are not good enough at protecting people’s privacy online: 68% – 2016
  • Americans who say they have “a lot” of control over how companies collect and use their information: 9% – 2015
  • In a survey of 507 online shoppers, the percentage of respondents who indicated they don’t want brands tracking their location: 62% – 2015
  • In a survey of 507 online shoppers, the percentage who “prefer offers that are targeted to where they are and what they are doing:” 60% – 2015
  • Number of surveyed American consumers willing to provide data to corporations under the following conditions: 
    • “Data about my social concerns to better connect me with non-profit organizations that advance those causes:” 19% – 2018
    • “Data about my DNA to help me uncover any hereditary illnesses:” 21% – 2018
    • “Data about my interests and hobbies to receive relevant information and offers from online sellers:” 32% – 2018
    • “Data about my location to help me find the fastest route to my destination:” 40% – 2018
    • “My email address to receive exclusive offers from my favorite brands:”  56% – 2018  

Consumer Attitudes 

  • Academic study participants willing to donate personal data to research if it could lead to public good: 60% – 2014
  • Academic study participants willing to share personal data for research purposes in the interest of public good: 25% – 2014
  • Percentage who expect companies to “treat [them] like an individual, not as a member of some segment like ‘millennials’ or ‘suburban mothers:’” 74% – 2018 
    • Percentage who believe that brands should understand a “consumer’s individual situation (e.g. marital status, age, location, etc.)” when they’re being marketed to: 70% – 2018
    • Number who are “more annoyed” by companies now compared to 5 years ago: 40% – 2018
    • Percentage worried their data is shared across companies without their permission: 88% – 2018
    • Amount worried about a brand’s ability to track their behavior while on the brand’s website, app, or neither: 75% – 2018
  • Consumers globally who expect brands to anticipate needs before they arise: 33%  – 2018 
  • Surveyed residents of the United Kingdom who identify as:
    • “Data pragmatists” willing to share personal data “under the right circumstances:” 58% – 2017
    • “Fundamentalists,” who would not share personal data for better services: 24% – 2017
    • Respondents who think data sharing is part of participating in the modern economy: 62% – 2018
    • Respondents who believe that data sharing benefits enterprises more than consumers: 75% – 2018
    • People who want more control over their data that enterprises collect: 84% – 2018
    • Percentage “unconcerned” about personal data protection: 18% – 2018
  • Percentage of Americans who think that government should do more to regulate large technology companies: 55% – 2018
  • Registered American voters who trust broadband companies with personal data “a great deal” or “a fair amount”: 43% – 2017
  • Americans who report experiencing a major data breach: 64% – 2017
  • Percentage of Americans who believe that their personal data is less secure than it was 5 years ago: 49% – 2019
  • Percentage of surveyed American citizens who consider trust in a company an important factor for sharing data: 54% – 2018

Convenience

Microsoft’s 2015 Consumer Data Value Exchange Report attempts to understand consumer attitudes on the exchange of personal data across the global markets of Australia, Brazil, Canada, Colombia, Egypt, Germany, Kenya, Mexico, Nigeria, Spain, South Africa, United Kingdom and the United States. From their survey of 16,500 users, they find:

  • The most popular incentives for sharing data are: 
    • Cash rewards: 64% – 2015
    • Significant discounts: 49% – 2015
    • Streamlined processes: 29% – 2015
    • New ideas: 28% – 2015
  • Respondents who would prefer to see more ads to get new services: 34% – 2015
  • Respondents willing to share search terms for a service that enabled fewer steps to get things done: 70% – 2015 
  • Respondents willing to share activity data for such an improvement: 82% – 2015
  • Respondents willing to share their gender for “a service that inspires something new based on others like them:” 79% – 2015

A 2015 Pew Research Center survey presented Americans with several data-sharing scenarios related to convenience. Participants could respond: “acceptable,” “it depends,” or “not acceptable” to the following scenarios: 

  • Share health information to get access to personal health records and arrange appointments more easily:
    • Acceptable: 52% – 2015
    • It depends: 20% – 2015
    • Not acceptable: 26% – 2015
  • Share data for discounted auto insurance rates: 
    • Acceptable: 37% – 2015
    • It depends: 16% – 2015
    • Not acceptable: 45% – 2015
  • Share data for free social media services: 
    • Acceptable: 33% – 2015
    • It depends: 15% – 2015
    • Not acceptable: 51% – 2015
  • Share data on smart thermostats for cheaper energy bills: 
    • Acceptable: 33% – 2015
    • It depends: 15% – 2015
    • Not acceptable: 51% – 2015

Other Studies

  • Surveyed banking and insurance customers who would exchange personal data for:
    • Targeted auto insurance premiums: 64% – 2019
    • Better life insurance premiums for healthy lifestyle choices: 52% – 2019 
  • Surveyed banking and insurance customers willing to share data specifically related to income, location and lifestyle habits to: 
    • Secure faster loan approvals: 81.3% – 2019
    • Lower the chances of injury or loss: 79.7% – 2019 
    • Receive discounts on non-insurance products or services: 74.6% – 2019
    • Receive text alerts related to banking account activity: 59.8% – 2019 
    • Get saving advice based on spending patterns: 56.6% – 2019
  • In a survey of over 7,000 members of the public around the globe, respondents indicated:
    • “Smartphone and tablet apps used for navigation, chat, and news that can access your contacts, photos, and browsing history” are “creepy:” 16% – 2016
    • Emailing a friend about a trip to Paris and receiving advertisements for hotels, restaurants and excursions in Paris is “creepy:” 32% – 2016
    • A free fitness-tracking device that monitors your well-being and sends a monthly report to you and your employer is “creepy:” 45% – 2016
    • A telematics device that allows emergency services to track your vehicle is “creepy:” 78% – 2016
  • The number of British residents who do not want to work with virtual agents of any kind: 48% – 2017
  • Americans who disagree that “if companies give me a discount, it is a fair exchange for them to collect information about me without my knowing”: 91% – 2015

Data Brokers, Intermediaries, and Third Parties 

  • Americans who consider it acceptable for a grocery store to offer a free loyalty card in exchange for selling their shopping data to third parties: 47% – 2016
  • Number of people who know that “searches, site visits and purchases” are reviewed without consent:  55% – 2015
  • The number of people in 1991 who wanted companies to ask them for permission first before collecting their personal information and selling that data to intermediaries: 93% – 1991
    • Number of Americans who “would be very concerned if the company at which their data were stored sold it to another party:” 90% – 2008
    • Percentage of Americans who think it’s unacceptable for their grocery store to share their shopping data with third parties in exchange for a free loyalty card: 32% – 2016
  • Percentage of Americans who think that government needs to do more to regulate advertisers: 64% – 2016
    • Number of Americans who “want to have control over what marketers can learn about” them online: 84% – 2015
    • Percentage of Americans who think they have no power over marketers to figure out what they’re learning about them: 58% – 2015
  • Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites using personal data to recommend stories, articles, or videos:  56% – 2017
  • Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites selling their personal information to third parties for advertising purposes: 64% – 2017

Personal Health Data

The Robert Wood Johnson Foundation’s 2014 Health Data Exploration Project Report analyzes attitudes about personal health data (PHD). PHD is self-tracking data related to health that is traceable through wearable devices and sensors. The three major stakeholder groups involved in using PHD for public good are users, companies that track the users’ data, and researchers. 

  • Overall Respondents:
    • Percentage who believe anonymity is “very” or “extremely” important: 67% – 2014
    • Percentage who “probably would” or “definitely would” share their personal data with researchers: 78% – 2014
    • Percentage who believe that they own—or should own—all the data about them, even when it is indirectly collected: 54% – 2014
    • Percentage who think they share or ought to share ownership with the company: 30% – 2014
    • Percentage who think companies alone own or should own all the data about them: 4% – 2014
    • Percentage for whom data ownership “is not something I care about”: 13% – 2014
    • Percentage who indicated they wanted to own their data: 75% – 2014 
    • Percentage who would share data only if “privacy were assured:” 68% – 2014
    • People who would supply data regardless of privacy or compensation: 27% – 2014
      • Percentage of participants who mentioned privacy, anonymity, or confidentiality when asked under what conditions they would share their data:  63% – 2014
      • Percentage who would be “more” or “much more” likely to share data for compensation: 56% – 2014
      • Percentage who indicated compensation would make no difference: 38% – 2014
      • Percentage opposed to commercial or profit-making use of their data: 13% – 2014
    • Percentage of people who would only share personal health data with a guarantee of:
      • Privacy: 57% – 2014
      • Anonymization: 90% – 2014
  • Surveyed Researchers: 
    • Percentage who agree or strongly agree that self-tracking data would help provide more insights in their research: 89% – 2014
    • Percentage who say PHD could answer questions that other data sources could not: 95% – 2014
    • Percentage who have used public datasets: 57% – 2014
    • Percentage who have paid for data for research: 19% – 2014
    • Percentage who have used self-tracking data before for research purposes: 46% – 2014
    • Percentage who have worked with application, device, or social media companies: 23% – 2014
    • Percentage who “somewhat disagree” or “strongly disagree” there are barriers that cannot be overcome to using self-tracking data in their research: 82% – 2014 

SOURCES: 

“2019 Accenture Global Financial Services Consumer Study: Discover the Patterns in Personality”, Accenture, 2019. 

“Americans’ Views About Data Collection and Security”, Pew Research Center, 2015. 

“Data Donation: Sharing Personal Data for Public Good?”, ResearchGate, 2014.

“Data privacy: What the consumer really thinks”, Acxiom, 2018.

“Exclusive: Public wants Big Tech regulated”, Axios, 2018.

“Consumer data value exchange”, Microsoft, 2015.

“Crossing the Line: Staying on the right side of consumer privacy”, KPMG International Cooperative, 2016.

“How do you feel about the government sharing our personal data? – livechat”, The Guardian, 2017. 

“Personal data for public good: using health information in medical research”, The Academy of Medical Sciences, 2006. 

“Personal Data for the Public Good: New Opportunities to Enrich Understanding of Individual and Population Health”, Robert Wood Johnson Foundation, Health Data Exploration Project, Calit2, UC Irvine and UC San Diego, 2014. 

“Pew Internet and American Life Project: Cloud Computing Raises Privacy Concerns”, Pew Research Center, 2008. 

“Poll: Little Trust That Tech Giants Will Keep Personal Data Private”, Morning Consult & Politico, 2017. 

“Privacy and Information Sharing”, Pew Research Center, 2016. 

“Privacy, Data and the Consumer: What US Thinks About Sharing Data”, MarTech Advisor, 2018. 

“Public Opinion on Privacy”, Electronic Privacy Information Center, 2019. 

“Selligent Marketing Cloud Study Finds Consumer Expectations and Marketer Challenges are Rising in Tandem”, Selligent Marketing Cloud, 2018. 

“The Data-Sharing Disconnect: The Impact of Context, Consumer Trust, and Relevance in Retail Marketing”, Boxever, 2015.

“Microsoft Research reveals understanding gap in the brand-consumer data exchange”, Microsoft Research, 2015.

“Survey: 58% will share personal data under the right circumstances”, Marketing Land: Third Door Media, 2019. 

“The state of privacy in post-Snowden America”, Pew Research Center, 2016. 

“The Tradeoff Fallacy: How Marketers Are Misrepresenting American Consumers And Opening Them Up to Exploitation”, University of Pennsylvania, 2015.

Index: The Data Universe 2019


By Michelle Winowatan, Andrew J. Zahuranec, Andrew Young, Stefaan Verhulst, Max Jun Kim

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe.

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at info@thelivinglib.org

Internet Traffic:

  • Percentage of the world’s population that uses the internet: 51.2% (3.9 billion people) – 2018
  • Number of searches processed worldwide by Google every year: at least 2 trillion – 2016
  • Website traffic worldwide generated through mobile phones: 52.2% – 2018
  • The total number of mobile subscriptions in the first quarter of 2019: 7.9 billion (addition of 44 million in quarter) – 2019
  • Amount of mobile data traffic worldwide: nearly 30 billion GB – 2018
  • Data category with highest traffic worldwide: video (60%) – 2018
  • Global average of data traffic per smartphone per month: 5.6 GB – 2018
    • North America: 7 GB – 2018
    • Latin America: 3.1 GB – 2018
    • Western Europe: 6.7 GB – 2018
    • Central and Eastern Europe: 4.5 GB – 2018
    • North East Asia: 7.1 GB – 2018
    • Southeast Asia and Oceania: 3.6 GB – 2018
    • India, Nepal, and Bhutan: 9.8 GB – 2018
    • Middle East and Africa: 3.0 GB – 2018
  • Time between the creation of each new bitcoin block: 9.27 minutes – 2019

Streaming Services:

  • Total hours of video streamed by Netflix users every minute: 97,222 – 2017
  • Hours of YouTube watched per day: over 1 billion – 2018
  • Number of tracks uploaded to Spotify every day: Over 20,000 – 2019
  • Number of Spotify’s monthly active users: 232 million – 2019
  • Spotify’s total subscribers: 108 million – 2019
  • Spotify’s hours of content listened: 17 billion – 2019
  • Total number of songs on Spotify’s catalog: over 30 million – 2019
  • Apple Music’s total subscribers: 60 million – 2019
  • Total number of songs on Apple Music’s catalog: 45 million – 2019

Social Media:

Calls and Messaging:

Retail/Financial Transaction:

  • Number of packages shipped by Amazon in a year: 5 billion – 2017
  • Total value of payments processed by Venmo in a year: USD 62 billion – 2019
  • Based on an independent analysis of public transactions on Venmo in 2017:
  • Based on a non-representative survey of 2,436 US consumers between the ages of 21 and 72 on P2P platforms:
    • The average volume of transactions handled by Venmo: USD 64.2 billion – 2019
    • The average volume of transactions handled by Zelle: USD 122.0 billion – 2019
    • The average volume of transactions handled by PayPal: USD 141.8 billion – 2019 
    • Platform with the highest percent adoption among all consumers: PayPal (48%) – 2019 

Internet of Things:

Sources:

Government wants access to personal data while it pushes privacy


Sara Fischer and Scott Rosenberg at Axios: “Over the past two years, the U.S. government has tried to rein in how major tech companies use the personal data they’ve gathered on their customers. At the same time, government agencies are themselves seeking to harness those troves of data.

Why it matters: Tech platforms use personal information to target ads, whereas the government can use it to prevent and solve crimes, deliver benefits to citizens — or (illegally) target political dissent.

Driving the news: A new report from the Wall Street Journal details the ways in which family DNA testing sites like FamilyTreeDNA are pressured by the FBI to hand over customer data to help solve criminal cases using DNA.

  • The trend has privacy experts worried about the potential implications of the government having access to large pools of genetic data, even though many people whose data is included never agreed to its use for that purpose.

The FBI has particular interest in data from genetic and social media sites, because it could help solve crimes and protect the public.

  • For example, the FBI is “soliciting proposals from outside vendors for a contract to pull vast quantities of public data” from Facebook, Twitter Inc. and other social media companies, the Wall Street Journal reports.
  • The request is meant to help the agency surveil social behavior to “mitigate multifaceted threats, while ensuring all privacy and civil liberties compliance requirements are met.”
  • Meanwhile, the Trump administration has also urged social media platforms to cooperate with the government in efforts to flag individual users as potential mass shooters.

Other agencies have their eyes on big data troves as well.

  • Earlier this year, settlement talks between Facebook and the Department of Housing and Urban Development broke down over an advertising discrimination lawsuit when, according to a Facebook spokesperson, HUD “insisted on access to sensitive information — like user data — without adequate safeguards.”
  • HUD presumably wanted access to the data to ensure advertising discrimination wasn’t occurring on the platform, but it’s unclear whether the agency needed user data to be able to support that investigation….(More)”.

Future Studies and Counterfactual Analysis


Book by Theodore J. Gordon and Mariana Todorova: “In this volume, the authors contribute to futures research by placing the counterfactual question in the future tense. They explore the possible outcomes of future, and consider how future decisions are turning points that may produce different global outcomes. This book focuses on a dozen or so intractable issues that span politics, religion, and technology, each addressed in individual chapters. Until now, most scenarios written by futurists have been built on cause and effect narratives or depended on numerical models derived from historical relationships. In contrast, many of the scenarios written for this book are point descriptions of future discontinuities, a form allows more thought-provoking presentations. Ultimately, this book demonstrates that counterfactual thinking and point scenarios of discontinuities are new, groundbreaking tools for futurists….(More)”.

Index: Open Data


By Alexandra Shaw, Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on open data and was originally published in 2018.

Value and Impact

  • The projected year at which all 28+ EU member countries will have a fully operating open data portal: 2020

  • Between 2016 and 2020, the market size of open data in Europe is expected to increase by 36.9%, and reach this value by 2020: EUR 75.7 billion

Public Views on and Use of Open Government Data

  • Number of Americans who do not trust the federal government or social media sites to protect their data: Approximately 50%

  • Key findings from The Economist Intelligence Unit report on Open Government Data Demand:

    • Percentage of respondents who say the key reason why governments open up their data is to create greater trust between the government and citizens: 70%

    • Percentage of respondents who say OGD plays an important role in improving lives of citizens: 78%

    • Percentage of respondents who say OGD helps with daily decision making especially for transportation, education, environment: 53%

    • Percentage of respondents who cite lack of awareness about OGD and its potential use and benefits as the greatest barrier to usage: 50%

    • Percentage of respondents who say they lack access to usable and relevant data: 31%

    • Percentage of respondents who think they don’t have sufficient technical skills to use open government data: 25%

    • Percentage of respondents who feel the number of OGD apps available is insufficient, indicating an opportunity for app developers: 20%

    • Percentage of respondents who say OGD has the potential to generate economic value and new business opportunity: 61%

    • Percentage of respondents who say they don’t trust governments to keep data safe, protected, and anonymized: 19%

Efforts and Involvement

  • Time that’s passed since open government advocates convened to create a set of principles for open government data – the instance that started the open government data movement: 10 years

  • Countries participating in the Open Government Partnership today: 79 OGP participating countries and 20 subnational governments

  • Percentage of “open data readiness” in Europe according to European Data Portal: 72%

    • Open data readiness consists of four indicators which are presence of policy, national coordination, licensing norms, and use of data.

  • Number of U.S. cities with Open Data portals: 27

  • Number of governments who have adopted the International Open Data Charter: 62

  • Number of non-state organizations endorsing the International Open Data Charter: 57

  • Number of countries analyzed by the Open Data Index: 94

  • Number of Latin American countries that do not have open data portals as of 2017: 4 total – Belize, Guatemala, Honduras and Nicaragua

  • Number of cities participating in the Open Data Census: 39

Demand for Open Data

  • Open data demand measured by frequency of open government data use according to The Economist Intelligence Unit report:

    • Australia

      • Monthly: 15% of respondents

      • Quarterly: 22% of respondents

      • Annually: 10% of respondents

    • Finland

      • Monthly: 28% of respondents

      • Quarterly: 18% of respondents

      • Annually: 20% of respondents

    • France

      • Monthly: 27% of respondents

      • Quarterly: 17% of respondents

      • Annually: 19% of respondents

    • India

      • Monthly: 29% of respondents

      • Quarterly: 20% of respondents

      • Annually: 10% of respondents

    • Singapore

      • Monthly: 28% of respondents

      • Quarterly: 15% of respondents

      • Annually: 17% of respondents 

    • UK

      • Monthly: 23% of respondents

      • Quarterly: 21% of respondents

      • Annually: 15% of respondents

    • US

      • Monthly: 16% of respondents

      • Quarterly: 15% of respondents

      • Annually: 20% of respondents

  • Number of FOIA requests received in the US for fiscal year 2017: 818,271

  • Number of FOIA requests processed in the US for fiscal year 2017: 823,222

  • Distribution of FOIA requests in 2017 among the top 5 agencies with the highest number of requests:

    • DHS: 45%

    • DOJ: 10%

    • NARA: 7%

    • DOD: 7%

    • HHS: 4%
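
The frequency figures above can be combined into a single rough demand indicator per country. The sketch below is only an illustration: it assumes the monthly, quarterly, and annual categories are mutually exclusive, which the list does not state, and the names `usage` and `total_usage` are hypothetical.

```python
# Percent of respondents per usage frequency, copied from the list above.
usage = {
    "Australia": {"monthly": 15, "quarterly": 22, "annually": 10},
    "Finland": {"monthly": 28, "quarterly": 18, "annually": 20},
    "France": {"monthly": 27, "quarterly": 17, "annually": 19},
    "India": {"monthly": 29, "quarterly": 20, "annually": 10},
    "Singapore": {"monthly": 28, "quarterly": 15, "annually": 17},
    "UK": {"monthly": 23, "quarterly": 21, "annually": 15},
    "US": {"monthly": 16, "quarterly": 15, "annually": 20},
}


def total_usage(freqs):
    """Share of respondents using open data at least once a year.

    Assumption: the three categories do not overlap, so they can be summed.
    """
    return sum(freqs.values())


totals = {country: total_usage(freqs) for country, freqs in usage.items()}

# Rank countries from highest to lowest combined share.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for country, share in ranked:
    print(f"{country}: {share}% of respondents")
```

Under that assumption, Finland has the highest combined share (66%) and Australia the lowest (47%).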

Examining Datasets

  • Country with highest index score according to ODB Leaders Edition: Canada (76 out of 100)

  • Country with lowest index score according to ODB Leaders Edition: Sierra Leone (22 out of 100)

  • Proportion of datasets that are open in the top 30 governments according to ODB Leaders Edition: Fewer than 1 in 5

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition: 19%

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition by sector/subject:

    • Budget: 30%

    • Companies: 13%

    • Contracts: 27%

    • Crime: 17%

    • Education: 13%

    • Elections: 17%

    • Environment: 20%

    • Health: 17%

    • Land: 7%

    • Legislation: 13%

    • Maps: 20%

    • Spending: 13%

    • Statistics: 27%

    • Trade: 23%

    • Transport: 30%

  • Percentage of countries that release data on government spending according to ODB Leaders Edition: 13%

  • Percentage of government data that is updated at regular intervals according to ODB Leaders Edition: 74%

  • Number of datasets available through:

  • Percentage of datasets classed as “open” in 94 places worldwide analyzed by the Open Data Index: 11%

  • Percentage of open datasets in the Caribbean, according to Open Data Census: 7%

  • Number of companies whose data is available through OpenCorporates: 158,589,950
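
As a quick consistency check, the sector figures listed above for the ODB Leaders Edition can be averaged and compared with the reported overall figure of 19%. This is a minimal sketch that assumes the overall figure is a simple unweighted mean across the 15 sectors, which the list does not state; the name `sector_open_pct` is hypothetical.

```python
# Percent of open datasets by sector, copied from the list above.
sector_open_pct = {
    "Budget": 30, "Companies": 13, "Contracts": 27, "Crime": 17,
    "Education": 13, "Elections": 17, "Environment": 20, "Health": 17,
    "Land": 7, "Legislation": 13, "Maps": 20, "Spending": 13,
    "Statistics": 27, "Trade": 23, "Transport": 30,
}

# Unweighted mean across sectors (assumption: every sector counts equally).
mean_pct = sum(sector_open_pct.values()) / len(sector_open_pct)

# Sectors at the two extremes of the distribution.
lowest = min(sector_open_pct, key=sector_open_pct.get)
highest_value = max(sector_open_pct.values())
highest = [s for s, v in sector_open_pct.items() if v == highest_value]

print(f"Mean across sectors: {mean_pct:.2f}%")
print(f"Lowest: {lowest} ({sector_open_pct[lowest]}%)")
print(f"Highest: {', '.join(highest)} ({highest_value}%)")
```

The unweighted mean comes to about 19.13%, which rounds to the 19% reported above; Land is the least open sector, while Budget and Transport are tied as the most open.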

City Open Data

  • New York City

  • Singapore

    • Number of datasets published in Singapore: 1,480

    • Percentage of datasets with standardized format: 35%

    • Percentage of datasets made as raw as possible: 25%

  • Barcelona

    • Number of datasets published in Barcelona: 443

    • Open data demand in Barcelona measured by:

      • Number of unique sessions in the month of September 2018: 5,401

    • Quality of datasets published in Barcelona according to Tim Berners-Lee’s 5-star Open Data scheme: 3 stars

  • London

    • Number of datasets published in London: 762

    • Number of data requests since October 2014: 325

  • Bandung

    • Number of datasets published in Bandung: 1,417

  • Buenos Aires

    • Number of datasets published in Buenos Aires: 216

  • Dubai

    • Number of datasets published in Dubai: 267

  • Melbourne

    • Number of datasets published in Melbourne: 199

Sources

  • About OGP, Open Government Partnership. 2018.