Selected Readings on Data Portability


By Juliet McMurren, Andrew Young, and Stefaan G. Verhulst

As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, we explore selected literature on data portability.

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Context

Data today exists largely in silos, generating problems and inefficiencies for the individual, business and society at large. These include:

  • difficulty switching (data) between competitive service providers;
  • delays in sharing data for important societal research initiatives;
  • barriers for data innovators to reuse data that could generate insights to inform individuals’ decision making; and
  • inhibitions to scale data donation.

Data portability — the principle that individuals have a right to obtain, copy, and reuse their own personal data and to transfer it from one IT platform or service to another for their own purposes — is positioned as a solution to these problems. When fully implemented, it would make data liquid, giving individuals the ability to access their own data in a usable and transferable format, transfer it from one service provider to another, or donate data for research and enhanced data analysis by those working in the public interest.

Some companies, including Google, Apple, Twitter and Facebook, have sought to advance data portability through initiatives like the Data Transfer Project, an open source software project designed to facilitate data transmittals. Newly enacted data protection legislation such as Europe’s General Data Protection Regulation (2018) and the California Consumer Privacy Act (2018) give data holders a right to data portability. However, despite the legal and technical advances made, many questions toward scaling up data liquidity and portability responsibly and systematically remain. These new data rights have generated complex and as yet unanswered questions about the limits of data ownership, the implications for privacy, security and intellectual property rights, and the practicalities of how, when, and to whom data can be transferred.

In this edition of the GovLab’s Selected Readings series, we examine the emerging literature on data portability to provide a foundation for future work on the value proposition of data portability. Readings are listed in alphabetical order.

Selected readings

Cho, Daegon, Pedro Ferreira, and Rahul Telang, The Impact of Mobile Number Portability on Price and Consumer Welfare (2016)

  • In this paper, the authors analyze how Mobile Number Portability (MNP) — the ability for consumers to maintain their phone number when changing providers, thus reducing switching costs — affected the relationship between switching costs, market price and consumer surplus after it was introduced in most European countries in the early 2000s.
  • Theory holds that when switching costs are high, market leaders will enjoy a substantial advantage and are able to keep prices high. Policy makers will therefore attempt to decrease switching costs to intensify competition and reduce prices to consumers.
  • The study reviewed quarterly data from 47 wireless service providers in 15 EU countries between 1999 and 2006. The data showed that MNP simultaneously decreased market price by over four percent and increased consumer welfare by an average of at least €2.15 per person per quarter. This increase amounted to a total of €880 million per quarter across the 15 EU countries analyzed in this paper and accounted for 15 percent of the increase in consumer surplus observed over this time.

CtrlShift, Data Mobility: The data portability growth opportunity for the UK economy (2018)

  • Commissioned by the UK Department of Digital, Culture, Media and Sport (DCMS), this study was intended to identify the potential of personal data portability for the UK economy.
  • Its scope went beyond the legal right to data portability envisaged by the GDPR, to encompass the current state of personal data portability and mobility, requirements for safe and secure data sharing, and the potential economic benefits through stimulation of innovation, productivity and competition.
  • The report concludes that increased personal data mobility has the potential to be a vital stimulus for the development of the digital economy, driving growth by empowering individuals to make use of their own data and consent to others using it to create new data-driven services and technologies.
  • However, the report concludes that there are significant challenges to be overcome, and new risks to be addressed, before the value of personal data can be realized. Much personal data remains locked in organizational silos, and systemic issues related to data security and governance and the uneven sharing of benefits need to be resolved.

Data Guidance and Future of Privacy Forum, Comparing Privacy Laws: GDPR v. CCPA (2018)

  • This paper compares the provisions of the GDPR with those of the California Consumer Privacy Act (2018).
  • Both article 20 of the GDPR and section 1798 of the CCPA recognize a right to data portability. Both also confer on data subjects the right to receive data from controllers free of charge upon request, and oblige controllers to create mechanisms to provide subjects with their data in portable and reusable form so that it can be transmitted to third parties for reuse.
  • In the CCPA, the right to data portability is an extension of the right to access, and only confers on data subjects the right to apply for data collected within the past 12 months and have it delivered to them. The GDPR does not impose a time limit, and allows data to be transferred from one data controller to another, but limits the right to automatically collected personal data provided by the data subject themselves through consent or contract.

Data Transfer Project, Data Transfer Project Overview and Fundamentals (2018)

  • The paper presents an overview of the goals, principles, architecture, and system components of the Data Transfer Project. The intent of the DTP is to increase the number of services offering data portability and provide users with the ability to transfer data directly in and out of participating providers through systems that are easy and intuitive to use, private and secure, reciprocal between services, and focused on user data. The project, which is supported by Microsoft, Google, Twitter and Facebook, is an open-source initiative that encourages the participation of other providers to reduce the infrastructure burden on providers and users.
  • In addition to benefits to innovation, competition, and user choice, the authors point to benefits to security, through allowing users to backup, organize, or archive their data, recover from account hijacking, and retrieve their data from deprecated services.
  • The DTP’s remit was to test concepts and feasibility for the transfer of specific types of user data between online services using a system of adapters to transfer proprietary formats into canonical formats that can be used to transfer data while allowing providers to maintain control over the security of their service. While not resolving all formatting or support issues, this approach would allow substantial data portability and encourage ecosystem sustainability.

Deloitte, How to Flourish in an Uncertain Future: Open Banking(2017)

  • This report addresses the innovative and disruptive potential of open banking, in which data is shared between members of the banking ecosystem at the authorization of the customer, with the potential to increase competition and facilitate new products and services. In the resulting marketplace model, customers could use a single banking interface to access products from multiple players, from established banks to newcomers and fintechs.
  • The report’s authors identify significant threats to current banking models. Banks that failed to embrace open banking could be relegated to a secondary role as an infrastructure provider, while third parties — tech companies, fintech, and price comparison websites — take over the customer relationship.
  • The report identifies four overlapping operating models banks could adopt within an open banking model: full service providers, delivering proprietary products through their own interface with little or no third-party integration; utilities, which provide other players with infrastructure without customer-facing services; suppliers, which offer proprietary products through third-party interfaces; and interfaces,which provide distribution services through a marketplace interface. To retain market share, incumbents are likely to need to adopt a combination of these roles, offering their own products and services and those of third parties through their own and others’ interfaces.

Digital Competition Expert Panel Unlocking Digital Competition(2019)

  • This report captures the findings of the UK Digital Competition Expert Panel, which was tasked in 2018 with considering opportunities and challenges the digital economy might pose for competition and competition policy and to recommend any necessary changes. The panel focused on the impact of big players within the sector, appropriate responses to mergers or anticompetitive practices, and the impact on consumers.
  • The panel found that the digital economy is creating many benefits, but that digital markets are subject to tipping, in which emerging winners can scoop much of the market. This concentration can give rise to substantial costs, especially to consumers, and cannot be solved by competition alone. However, government policy and regulatory solutions have limitations, including the slowness of policy change, uneven enforcement and profound informational asymmetries between companies and government.
  • The panel proposed the creation of a digital markets unit that would be tasked with developing a code of competitive conduct, enabling greater personal data mobility and systems designed with open standards, and advancing access to non-personal data to reduce barriers to market entry.
  • The panel’s model of data mobility goes beyond data portability, which involves consumers being able to request and transfer their own data from one provider to another. Instead, the panel recommended empowering consumers to instigate transfers of data between a business and a third party in order to access price information, compare goods and services, or access tailored advice and recommendations. They point to open banking as an example of how this could function in practice.
  • It also proposed updating merger policy to make it more forward-looking to better protect consumers and innovation and preserve the competitiveness of the market. It recommended the creation of antitrust policy that would enable the implementation of interim measures to limit damage to competition while antitrust cases are in process.

Egan, Erin, Charting a Way Forward: Data Portability and Privacy(2019)

  • This white paper by Facebook’s VP and Chief Privacy Officer, Policy, represents an attempt to advance the conversation about the relationship between data portability, privacy, and data protection. The author sets out five key questions about data portability: what is it, whose and what data should be portable, how privacy should be protected in the context of portability, and where responsibility for data misuse or improper protection should lie.
  • The paper finds that definitions of data portability still remain imprecise, particularly with regard to the distinction between data portability and data transfer. In the interest of feasibility and a reasonable operational burden on providers, it proposes time limits on providers’ obligations to make observed data portable.
  • The paper concludes that there are strong arguments both for and against allowing users to port their social graph — the map of connections between that user and other users of the service — but that the key determinant should be a capacity to ensure the privacy of all users involved. Best-practice data portability protocols that would resolve current differences of approach as to what, how and by whom information should be made available would help promote broader portability, as would resolution of liability for misuse or data exposure.

Engels, Barbara, Data portability among online platforms (2016)

  • The article examines the effects on competition and innovation of data portability among online platforms such as search engines, online marketplaces, and social media, and how relations between users, data, and platform services change in an environment of data portability.
  • The paper finds that the benefits to competition and innovation of portability are greatest in two kinds of environments: first, where platforms offer complementary products and can realize synergistic benefits by sharing data; and secondly, where platforms offer substitute or rival products but the risk of anti-competitive behaviour is high, as for search engines.
  • It identifies privacy and security issues raised by data portability. Portability could, for example, allow an identity fraudster to misuse personal data across multiple platforms, compounding the harm they cause.
  • It also suggests that standards for data interoperability could act to reduce innovation in data technology, encouraging data controllers to continue to use outdated technologies in order to comply with inflexible, government-mandated standards.

Graef, Inge, Martin Husovec and Nadezhda Purtova, Data Portability and Data Control: Lessons for an Emerging Concept in EU Law (2018)

  • This paper situates the data portability right conferred by the GDPR within rights-based data protection law. The authors argue that the right to data portability should be seen as a new regulatory tool aimed at stimulating competition and innovation in data-driven markets.
  • The authors note the potential for conflicts between the right to data portability and the intellectual property rights of data holders, suggesting that the framers underestimated the potential impact of such conflicts on copyright, trade secrets and sui generis database law.
  • Given that the right to data portability is being replicated within consumer protection law and the regulation of non-personal data, the authors argue framers of these laws should consider the potential for conflict and the impact of such conflict on incentives to innovate.

Mohsen, Mona Omar and Hassan A. Aziz The Blue Button Project: Engaging Patients in Healthcare by a Click of a Button (2015)

  • This paper provides a literature review on the Blue Button initiative, an early data portability project which allows Americans to access, view or download their health records in a variety of formats.
  • Originally launched through the Department of Veterans’ Affairs in 2010, the Blue Button initiative had expanded to more than 500 organizations by 2014, when the Department of Health and Human Services launched the Blue Button Connector to facilitate both patient access and development of new tools.
  • The Blue Button has enabled the development of tools such as the Harvard-developed Growth-Tastic app, which allows parents to check their child’s growth by submitting their downloaded pediatric health data. Pharmacies across the US have also adopted the Blue Button to provide patients with access to their prescription history.

More than Data and Mission: Smart, Got Data? The Value of Energy Data to Customers (2016)

  • This report outlines the public value of the Green Button, a data protocol that provides customers with private and secure access to their energy use data collected by smart meters.
  • The authors outline how the use of the Green Button can help states meet their energy and climate goals by enabling them to structure renewables and other distributed energy resources (DER) such as energy efficiency, demand response, and solar photovoltaics. Access to granular, near real time data can encourage innovation among DER providers, facilitating the development of applications like “virtual energy audits” that identify efficiency opportunities, allowing customers to reduce costs through time-of-use pricing, and enabling the optimization of photovoltaic systems to meet peak demand.
  • Energy efficiency receives the greatest boost from initiatives like the Green Button, with studies showing energy savings of up to 18 percent when customers have access to their meter data. In addition to improving energy conservation, access to meter data could improve the efficiency of appliances by allowing devices to trigger sleep modes in response to data on usage or price. However, at the time of writing, problems with data portability and interoperability were preventing these benefits from being realized, at a cost of tens of millions of dollars.
  • The authors recommend that commissions require utilities to make usage data available to customers or authorized third parties in standardized formats as part of basic utility service, and tariff data to developers for use in smart appliances.

MyData, Understanding Data Operators (2020)

  • MyData is a global movement of data users, activists and developers with a common goal to empower individuals with their personal data to enable them and their communities to develop knowledge, make informed decisions and interact more consciously and efficiently.
  • This introductory paper presents the state of knowledge about data operators, trusted data intermediaries that provide infrastructure for human-centric personal data management and governance, including data sharing and transfer. The operator model allows data controllers to outsource issues of legal compliance with data portability requirements, while offering individual users a transparent and intuitive way to manage the data transfer process.
  • The paper examines use cases from 48 “proto-operators” from 15 countries who fulfil some of the functions of an operator, albeit at an early level of maturity. The paper finds that operators offer management of identity authentication, data transaction permissions, connections between services, value exchange, data model management, personal data transfer and storage, governance support, and logging and accountability. At the heart of these functions is the need for minimum standards of data interoperability.
  • The paper reviews governance frameworks from the general (legislative) to the specific (operators), and explores proto-operator business models. In keeping with an emerging field, business models are currently unclear and potentially unsustainable, and one of a number of areas, including interoperability requirements and governance frameworks, that must still be developed.

National Science and Technology Council Smart Disclosure and Consumer Decision Making: Report of the Task Force on Smart Disclosure (2013)

  • This report summarizes the work and findings of the 2011–2013 Task Force on Smart Disclosure: Information and Efficiency, an interagency body tasked with advancing smart disclosure, through which data is made more available and accessible to both consumers and innovators.
  • The Task Force recognized the capacity of smart disclosure to inform consumer choices, empower them through access to useful personal data, enable the creation of new tools, products and services, and promote efficiency and growth. It reviewed federal efforts to promote smart disclosure within sectors and in data types that crosscut sectors, such as location data, consumer feedback, enforcement and compliance data and unique identifiers. It also surveyed specific public-private partnerships on access to data, such as the Blue and Green Button and MyData initiatives in health, energy and education respectively.
  • The Task Force reviewed steps taken by the Federal Government to implement smart disclosure, including adoption of machine readable formats and standards for metadata, use of APIs, and making data available in an unstructured format rather than not releasing it at all. It also reviewed “choice engines” making use of the data to provide services to consumers across a range of sectors.
  • The Task Force recommended that smart disclosure should be a core component of efforts to institutionalize and operationalize open data practices, with agencies proactively identifying, tagging, and planning the release of candidate data. It also recommended that this be supported by a government-wide community of practice.

Nicholas, Gabriel Taking It With You: Platform Barriers to Entry and the Limits of Data Portability (2020)

  • This paper considers whether, as is often claimed, data portability offers a genuine solution to the lack of competition within the tech sector.
  • It concludes that current regulatory approaches to data portability, which focus on reducing switching costs through technical solutions such as one-off exports and API interoperability, are not sufficient to generate increased competition. This is because they fail to address other barriers to entry, including network effects, unique data access, and economies of scale.
  • The author proposes an alternative approach, which he terms collective portability, which would allow groups of users to coordinate the transfer of their data to a new platform. This model raises questions about how such collectives would make decisions regarding portability, but would enable new entrants to successfully target specific user groups and scale rapidly without having to reach users one by one.

OECD, Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-use across Societies (2019)

  • This background paper to a 2017 expert workshop on risks and benefits of data reuse considers data portability as one strategy within a data openness continuum that also includes open data, market-based B2B contractual agreements, and restricted data-sharing agreements within research and data for social good applications.
  • It considers four rationales offered for data portability. These include empowering individuals towards the “informational self-determination” aspired to by GDPR, increased competition within digital and other markets through reductions in information asymmetries between individuals and providers, switching costs, and barriers to market entry; and facilitating increased data flows.
  • The report highlights the need for both syntactic and semantic interoperability standards to ensure data can be reused across systems, both of which may be fostered by increased rights to data portability. Data intermediaries have an important role to play in the development of these standards, through initiatives like the Data Transfer Project, a collaboration which brought together Facebook, Google, Microsoft, and Twitter to create an open-source data portability platform.

Personal Data Protection Commission Singapore Response to Feedback on the Public Consultation on Proposed Data Portability and Data Innovation Provisions (2020)

  • The report summarizes the findings of the 2019 PDPC public consultation on proposals to introduce provisions on data portability and data innovation in Singapore’s Personal Data Protection Act.
  • The proposed provision would oblige organizations to transmit an individual’s data to another organization in a commonly used machine-readable format, upon the individual’s request. The obligation does not extend to data intermediaries or organizations that do not have a presence in Singapore, although data holders may choose to honor those requests.
  • The obligation would apply to electronic data that is either provided by the individual or generated by the individual’s activities in using the organization’s service or product, but not derived data created by the processing of other data by the data holder. Respondents were concerned that including derived data could harm organizations’ competitiveness.
  • Respondents were concerned about how to honour data portability requests where the data of third parties was involved, as in the case of a joint account holder, for example. The PDPC opted for a “balanced, reasonable, and pragmatic approach,” allowing data involving third parties to be ported where it was under the requesting individual’s control, was to be used for domestic and personal purposes, and related only to the organization’s product or service.

Quinn, Paul Is the GDPR and Its Right to Data Portability a Major Enabler of Citizen Science? (2018)

  • This article explores the potential of data portability to advance citizen science by enabling participants to port their personal data from one research project to another. Citizen science — the collection and contribution of large amounts of data by private individuals for scientific research — has grown rapidly in response to the development of new digital means to capture, store, organize, analyze and share data.
  • The GDPR right to data portability aids citizen science by requiring transfer of data in machine-readable format and allowing data subjects to request its transfer to another data controller. This requirement of interoperability does not amount to compatibility, however, and data thus transferred would probably still require cleaning to be usable, acting as a disincentive to reuse.
  • The GDPR’s limitation of transferability to personal data provided by the data subject excludes some forms of data that might possess significant scientific potential, such as secondary personal data derived from further processing or analysis.
  • The GDPR right to data portability also potentially limits citizen science by restricting the grounds for processing data to which the right applies to data obtained through a subject’s express consent or through the performance of a contract. This limitation excludes other forms of data processing described in the GDPR, such as data processing for preventive or occupational medicine, scientific research, or archiving for reasons of public or scientific interest. It is also not clear whether the GDPR compels data controllers to transfer data outside the European Union.

Wong, Janis and Tristan Henderson, How Portable is Portable? Exercising the GDPR’s Right to Data Portability (2018)

  • This paper presents the results of 230 real-world requests for data portability in order to assess how — and how well — the GDPR right to data portability is being implemented. The authors were interested in establishing the kinds of file formats that were returned in response to requests, and to identify practical difficulties encountered in making and interpreting requests, over a three month period beginning on the day the GDPR came into effect.
  • The findings revealed continuing problems around ensuring portability for both data controllers and data subjects. Of the 230 requests, only 163 were successfully completed.
  • Data controllers frequently had difficulty understanding the requirements of GDPR, providing data in incomplete or inappropriate formats: only 40 percent of the files supplied were in a fully compliant format. Additionally, some data controllers were confused between the right to data portability and other rights conferred by the GDPR, such as the right to access or erasure.

Selected Readings on AI for Development


By Dominik Baumann, Jeremy Pesner, Alexandra Shaw, Stefaan Verhulst, Michelle Winowatan, Andrew Young, Andrew J. Zahuranec

As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology. 

In this edition, we explore selected literature on AI and Development. This piece was developed in the context of The GovLab’s collaboration with Agence Française de Développement (AFD) on the use of emerging technology for development. To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Context: In recent years, public discourse on artificial intelligence (AI) has focused on its potential for improving the way businesses, governments, and societies make (automated) decisions. Simultaneously, several AI initiatives have raised concerns about human rights, including the possibility of discrimination and privacy breaches. Between these two opposing perspectives is a discussion on how stakeholders can maximize the benefits of AI for society while minimizing the risks that might arise from the use of this technology.

While the majority of AI initiatives today come from the private sector, international development actors increasingly experiment with AI-enabled programs. These initiatives focus on, for example, climate modelling, urban mobility, and disease transmission. These early efforts demonstrate the promise of AI for supporting more efficient, targeted, and impactful development efforts. Yet, the intersection of AI and development remains nascent, and questions remain regarding how this emerging technology can deliver on its promise while mitigating risks to intended beneficiaries.

Readings are listed in alphabetical order.

2030Vision. AI and the Sustainable Development Goals: the State of Play

  • In broad language, this document for 2030Vision assesses AI research and initiatives and the Sustainable Development Goals (SDGs) to determine gaps and potential that can be further explored or scaled. 
  • It specifically reviews the current applications of AI in two SDG sectors, food/agriculture and healthcare.
  • The paper recommends enhancing multi-sector collaboration among businesses, governments, civil society, academia and others to ensure technology can best address the world’s most pressing challenges.

Andersen, Lindsey. Artificial Intelligence in International Development: Avoiding Ethical Pitfalls. Journal of Public & International Affairs (2019). 

  • Investigating the ethical implications of AI in the international development sector, the author argues that the involvement of many different stakeholders and AI-technology providers results in ethical issues concerning fairness and inclusion, transparency, explainability and accountability, data limitations, and privacy and security.
  • The author recommends the information communication technology for development (ICT4D) community adopt the Principles for Digital Development to ensure the ethical implementation of AI in international development projects.
  • The Principles of Digital Development include: 1) design with the user; 2) understand the ecosystem; 3) design for scale; 4) build for sustainability; 5) be data driven; 6) use open standards, open data, open source, and open innovation; and 7) reuse and improve.

Arun, Chinmayi. AI and the Global South: Designing for Other Worlds in Markus D. Dubber, Frank Pasquale, and Sunit Das (eds.), The Oxford Handbook of Ethics of AI, Oxford University Press, Forthcoming (2019).

  • This chapter interrogates the impact of AI’s application in the Global South and raises concerns about such initiatives.
  • Arun argues AI’s deployment in the Global South may result in discrimination, bias, oppression, exclusion, and bad design. She further argues it can be especially harmful to vulnerable communities in places that do not have strong respect for human rights.
  • The paper concludes by outlining the international human rights laws that can mitigate these risks. It stresses the importance of a human rights-centric, inclusive, empowering context-driven approach in the use of AI in the Global South.

Best, Michael. Artificial Intelligence (AI) for Development Series: Module on AI, Ethics and Society. International Telecommunications Union (2018). 

  • This working paper is intended to help ICT policymakers or regulators consider the ethical challenges that emerge within AI applications.
  • The author identifies a four-pronged framework of analysis (risks, rewards, connections, and key questions to consider) that can guide policymaking in the fields of: 1) livelihood and work; 2) diversity, non-discrimination and freedoms from bias; 3) data privacy and minimization; and 4) peace and security.
  • The paper also includes a table of policies and initiatives undertaken by national governments and tech companies around AI, along with the set of values (mentioned above) explicitly considered.

International Development Innovation Alliance (2019). Artificial Intelligence and International Development: An Introduction

  • Results for Development, a nonprofit organization working in the international development sector, developed a report in collaboration with the AI and Development Working Group within the International Development Innovation Alliance (IDIA). The report provides a brief overview of AI and how this technology may impact the international development sector.
  • The report provides examples of AI-powered applications and initiatives that support the SDGs, including eradicating hunger, promoting gender equality, and encouraging climate action.
  • It also provides a collection of supporting resources and case studies for development practitioners interested in using AI.

Paul, Amy, Craig Jolley, and Aubra Anthony. Reflecting the Past, Shaping the Future: Making AI Work for International Development. United States Agency for International Development (2018). 

  • This report outlines the potential of machine learning (ML) and artificial intelligence in supporting development strategy. It also details some of the common risks that can arise from the use of these technologies.
  • The document contains examples of ML and AI applications to support the development sector and recommends good practices in handling such technologies. 
  • It concludes by recommending broad, shared governance, using fair and balanced data, and ensuring local population and development practitioners remain involved in it.

Pincet, Arnaud, Shu Okabe, and Martin Pawelczyk. Linking Aid to the Sustainable Development Goals – a machine learning approach. OECD Development Co-operation Working Papers (2019). 

  • The authors apply ML and semantic analysis to data sourced from the OECD’s Creditor Reporting System to map aid funding to particular SDGs.
  • The researchers find “Good Health and Well-Being” as the most targeted SDG, what the researchers call the “SDG darling.”
  • The authors find that mapping relationships between the system and SDGs can help to ensure equitable funding across different goals.

Quinn, John, Vanessa Frias-Martinez, and Lakshminarayan Subramanian. Computational Sustainability and Artificial Intelligence in the Developing World. Association for the Advancement of Artificial Intelligence (2014). 

  • These researchers suggest three different areas—health, food security, and transportation—in which AI applications can uniquely benefit the developing world. The researchers argue the lack of technological infrastructure in these regions make AI especially useful and valuable, as it can efficiently analyze data and provide solutions.
  • It provides some examples of application within the three themes, including disease surveillance, identification of drought and agricultural trends, modeling of commuting patterns, and traffic congestion monitoring.

Smith, Matthew and Sujaya Neupane. Artificial intelligence and human development: toward a research agenda (2018).

  • The authors highlight potential beneficial applications for AI in a development context, including healthcare, agriculture, governance, education, and economic productivity.
  • They also discuss the risks and downsides of AI, which include the “black boxing” of algorithms, bias in decision making, potential for extreme surveillance, undermining democracy, potential for job and tax revenue loss, vulnerability to cybercrime, and unequal wealth gains towards the already-rich.
  • They recommend further research projects on these topics that are interdisciplinary, locally conducted, and designed to support practice and policy.

Tomašev, Nenad, et al. AI for social good: unlocking the opportunity for positive impact. Nature Communications (2020).

  • This paper takes stock of what the authors term the AI for Social Good movement (AI4SG), which “aims to establish interdisciplinary partnerships centred around AI applications towards SDGs.”  
  • Developed at a multidisciplinary expert seminar on the topic, the authors present 10 recommendations for creating successful AI4SG collaborations: “1) Expectations of what is possible with AI need to be well grounded. 2) There is value in simple solutions. 3) Applications of AI need to be inclusive and accessible, and reviewed at every stage for ethics and human rights compliance. 4) Goals and use cases should be clear and well-defined. 5) Deep, long-term partnerships are required to solve large problem successfully. 6) Planning needs to align incentives, and factor in the limitations of both communities. 7) Establishing and maintaining trust is key to overcoming organisational barriers. 8) Options for reducing the development cost of AI solutions should be explored. 9) Improving data readiness is key. 10) Data must be processed securely, with utmost respect for human rights and privacy.”

Vinuesa, Ricardo, et al. The role of artificial intelligence in achieving the Sustainable Development Goals. 

  • This report analyzes how AI can meet both the demands of some SDGs and also inhibit progress toward others. It highlights a critical research gap about the extent to which AI impacts sustainable development in the medium and long term. 
  • Through his analysis, Vinuesa claims AI has the potential to positively impact the environment, society, and the economy. However, AI can hinder these groups.
  • The authors recognize that although AI enables efficiency and productivity, it can also increase inequality and hinder achievements of the 2030 Agenda. Vinuesa and his co-authors suggest adequate policy formation and regulation are needed to ensure fast and equitable development of AI technologies that can address the SDGs. 

United Nations Education, Scientific and Cultural Organization (UNESCO) (2019). Artificial intelligence for Sustainable Development: Synthesis Report, Mobile Learning Week 2019

  • In this report, UNESCO assesses the findings from Mobile Learning Week (MLW) 2019. The three main conclusions were: 1) the world is facing a learning crisis; 2) education drives sustainable development; and 3) sustainable development can only be achieved if we harness the potential of AI. 
  • Questions around four major themes dominated the MLW 2019 sessions: 1) how to guarantee inclusive and equitable use of AI in education; 2) how to harness AI to improve learning; 3) how to increase skills development; and 4) how to ensure transparent and auditable use of education data. 
  • To move forward, UNESCO advocates for more international cooperation and stakeholder involvement, creation of education and AI standards, and development of national policies to address educational gaps and risks. 

The Data Storytelling Workbook


Book by Anna Feigenbaum and Aria Alamalhodaei: “From tracking down information to symbolising human experiences, this book is your guide to telling more effective, empathetic and evidence-based data stories.

Drawing on cross-disciplinary research and first-hand accounts of projects ranging from public health to housing justice, The Data Storytelling Workbook introduces key concepts, challenges and problem-solving strategies in the emerging field of data storytelling. Filled with practical exercises and activities, the workbook offers interactive training materials that can be used for teaching and professional development. By approaching both ‘data’ and ‘storytelling’ in a broad sense, the book combines theory and practice around real-world data storytelling scenarios, offering critical reflection alongside practical and creative solutions to challenges in the data storytelling process, from tracking down hard to find information, to the ethics of visualising difficult subjects like death and human rights….(More)”.

Big data in official statistics


Paper by Barteld Braaksma and Kees Zeelenberg: “In this paper, we describe and discuss opportunities for big data in official statistics. Big data come in high volume, high velocity and high variety. Their high volume may lead to better accuracy and more details, their high velocity may lead to more frequent and more timely statistical estimates, and their high variety may give opportunities for statistics in new areas. But there are also many challenges: there are uncontrolled changes in sources that threaten continuity and comparability, and data that refer only indirectly to phenomena of statistical interest.

Furthermore, big data may be highly volatile and selective: the coverage of the population to which they refer may change from day to day, leading to inexplicable jumps in time-series. And very often, the individual observations in these big data sets lack variables that allow them to be linked to other datasets or population frames. This severely limits the possibilities for correction of selectivity and volatility. Also, with the advance of big data and open data, there is much more scope for disclosure of individual data, and this poses new problems for statistical institutes. So, big data may be regarded as so-called nonprobability samples. The use of such sources in official statistics requires other approaches than the traditional one based on surveys and censuses.

A first approach is to accept the big data just for what they are: an imperfect, yet very timely, indicator of developments in society. In a sense, this is what national statistical institutes (NSIs) often do: we collect data that have been assembled by the respondents and the reason why, and even just the fact that they have been assembled is very much the same reason why they are interesting for society and thus for an NSI to collect. In short, we might argue: these data exist and that’s why they are interesting.

A second approach is to use formal models and extract information from these data. In recent years, many new methods for dealing with big data have been developed by mathematical and applied statisticians. New methods like machine-learning techniques can be considered alongside more traditional methods like Bayesian techniques. National statistical institutes have always been reluctant to use models, apart from specific cases like small-area estimates. Based on experience at Statistics Netherlands, we argue that NSIs should not be afraid to use models, provided that their use is documented and made transparent to users. On the other hand, in official statistics, models should not be used for all kinds of purposes….(More)”.

Index: Secondary Uses of Personal Data


By Alexandra Shaw, Andrew Zahuranec, Andrew Young, Stefaan Verhulst

The Living Library Index–inspired by the Harper’s Index–provides important statistics and highlights global trends in governance innovation. This installment focuses on public perceptions regarding secondary uses of personal data (or the re-use of data initially collected for a different purpose). It provides a summary of societal perspectives toward personal data usage, sharing, and control. It is not meant to be comprehensive–rather, it intends to illustrate conflicting, and often confusing, attitudes toward the re-use of personal data. 

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at info@thelivinglib.org

Data ownership and control 

  • Percentage of Americans who say it is “very important” they control information collected about them: 74% – 2016
  • Americans who think that today’s privacy laws are not good enough at protecting people’s privacy online: 68% – 2016
  • Americans who say they have “a lot” of control over how companies collect and use their information: 9% – 2015
  • In a survey of 507 online shoppers, the number of respondents who indicated they don’t want brands tracking their location: 62% – 2015
  • In a survey of 507 online shoppers, the amount who “prefer offers that are targeted to where they are and what they are doing:” 60% – 2015 
  • Number of surveyed American consumers willing to provide data to corporations under the following conditions: 
    • “Data about my social concerns to better connect me with non-profit organizations that advance those causes:” 19% – 2018
    • “Data about my DNA to help me uncover any hereditary illnesses:” 21% – 2018
    • “Data about my interests and hobbies to receive relevant information and offers from online sellers:” 32% – 2018
    • “Data about my location to help me find the fastest route to my destination:” 40% – 2018
    • “My email address to receive exclusive offers from my favorite brands:”  56% – 2018  

Consumer Attitudes 

  • Academic study participants willing to donate personal data to research if it could lead to public good: 60% – 2014
  • Academic study participants willing to share personal data for research purposes in the interest of public good: 25% – 2014
  • Percentage who expect companies to “treat [them] like an individual, not as a member of some segment like ‘millennials’ or ‘suburban mothers:’” 74% – 2018 
    • Percentage who believe that brands should understand a “consumer’s individual situation (e.g. marital status, age, location, etc.)” when they’re being marketed to: 70% – 2018 Number who are “more annoyed” by companies now compared to 5 years ago: 40% – 2018Percentage worried their data is shared across companies without their permission: 88% – 2018Amount worried about a brand’s ability to track their behavior while on the brand’s website, app, or neither: 75% – 2018 
  • Consumers globally who expect brands to anticipate needs before they arise: 33%  – 2018 
  • Surveyed residents of the United Kingdom who identify as:
    • “Data pragmatists” willing to share personal data “under the right circumstances:” 58% – 2017
    • “Fundamentalists,” who would not share personal data for better services: 24% – 2017
    • Respondents who think data sharing is part of participating in the modern economy: 62% – 2018
    • Respondents who believe that data sharing benefits enterprises more than consumers: 75% – 2018
    • People who want more control over their data that enterprises collect: 84% – 2018
    • Percentage “unconcerned” about personal data protection: 18% – 2018
  • Percentage of Americans who think that government should do more to regulate large technology companies: 55% – 2018
  • Registered American voters who trust broadband companies with personal data “a great deal” or “a fair amount”: 43% – 2017
  • Americans who report experiencing a major data breach: 64% – 2017
  • Number of Americans who believe that their personal data is less secure than it was 5 years ago: 49% – 2019
  • Amount of surveyed American citizens who consider trust in a company an important factor for sharing data: 54% – 2018

Convenience

Microsoft’s 2015 Consumer Data Value Exchange Report attempts to understand consumer attitudes on the exchange of personal data across the global markets of Australia, Brazil, Canada, Colombia, Egypt, Germany, Kenya, Mexico, Nigeria, Spain, South Africa, United Kingdom and the United States. From their survey of 16,500 users, they find:

  • The most popular incentives for sharing data are: 
    • Cash rewards: 64% – 2015
    • Significant discounts: 49% – 2015
    • Streamlined processes: 29% – 2015
    • New ideas: 28% – 2015
  • Respondents who would prefer to see more ads to get new services: 34% – 2015
  • Respondents willing to share search terms for a service that enabled fewer steps to get things done: 70% – 2015 
  • Respondents willing to share activity data for such an improvement: 82% – 2015
  • Respondents willing to share their gender for “a service that inspires something new based on others like them:” 79% – 2015

A 2015 Pew Research Center survey presented Americans with several data-sharing scenarios related to convenience. Participants could respond: “acceptable,” “it depends,” or “not acceptable” to the following scenarios: 

  • Share health information to get access to personal health records and arrange appointments more easily:
    • Acceptable: 52% – 2015
    • It depends: 20% – 2015
    • Not acceptable: 26% – 2015
  • Share data for discounted auto insurance rates: 
    • Acceptable: 37% – 2015
    • It depends: 16% – 2015
    • Not acceptable: 45% – 2015
  • Share data for free social media services: 
    • Acceptable: 33% – 2015
    • It depends: 15% – 2015
    • Not acceptable: 51% – 2015
  • Share data on smart thermostats for cheaper energy bills: 
    • Acceptable: 33% – 2015
    • It depends: 15% – 2015
    • Not acceptable: 51% – 2015

Other Studies

  • Surveyed banking and insurance customers who would exchange personal data for:
    • Targeted auto insurance premiums: 64% – 2019
    • Better life insurance premiums for healthy lifestyle choices: 52% – 2019 
  • Surveyed banking and insurance customers willing to share data specifically related to income, location and lifestyle habits to: 
    • Secure faster loan approvals: 81.3% – 2019
    • Lower the chances of injury or loss: 79.7% – 2019 
    • Receive discounts on non-insurance products or services: 74.6% – 2019
    • Receive text alerts related to banking account activity: 59.8% – 2019 
    • Get saving advice based on spending patterns: 56.6% – 2019
  • In a survey of over 7,000 members of the public around the globe, respondents indicated:
    • They thought “smartphone and tablet apps used for navigation, chat, and news that can access your contacts, photos, and browsing history” is “creepy;” 16% – 2016
    • Emailing a friend about a trip to Paris and receiving advertisements for hotels, restaurants and excursions in Paris is “creepy:” 32% – 2016
    • A free fitness-tracking device that monitors your well-being and sends a monthly report to you and your employer is “creepy:” 45% – 2016
    • A telematics device that allows emergency services to track your vehicle is “creepy:” 78% – 2016
  • The number of British residents who do not want to work with virtual agents of any kind: 48% – 2017
  • Americans who disagree that “if companies give me a discount, it is a fair exchange for them to collect information about me without my knowing”: 91% – 2015

Data Brokers, Intermediaries, and Third Parties 

  • Americans who consider it acceptable for a grocery store to offer a free loyalty card in exchange for selling their shopping data to third parties: 47% – 2016
  • Number of people who know that “searches, site visits and purchases” are reviewed without consent:  55% – 2015
  • The number of people in 1991 who wanted companies to ask them for permission first before collecting their personal information and selling that data to intermediaries: 93% – 1991
    • Number of Americans who “would be very concerned if the company at which their data were stored sold it to another party:” 90% – 2008
    • Percentage of Americans who think it’s unacceptable for their grocery store to share their shopping data with third parties in exchange for a free loyalty card: 32% – 2016
  • Percentage of Americans who think that government needs to do more to regulate advertisers: 64% – 2016
    • Number of Americans who “want to have control over what marketers can learn about” them online: 84% – 2015
    • Percentage of Americans who think they have no power over marketers to figure out what they’re learning about them: 58% – 2015
  • Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites using personal data to recommend stories, articles, or videos:  56% – 2017
  • Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites selling their personal information to third parties for advertising purposes: 64% – 2017

Personal Health Data

The Robert Wood Johnson Foundation’s 2014 Health Data Exploration Project Report analyzes attitudes about personal health data (PHD). PHD is self-tracking data related to health that is traceable through wearable devices and sensors. The three major stakeholder groups involved in using PHD for public good are users, companies that track the users’ data, and researchers. 

  • Overall Respondents:
    • Percentage who believe anonymity is “very” or “extremely” important: 67% – 2014
    • Percentage who “probably would” or “definitely would” share their personal data with researchers: 78% – 2014
    • Percentage who believe that they own—or should own—all the data about them, even when it is indirectly collected: 54% – 2014
    • Percentage who think they share or ought to share ownership with the company: 30% – 2014
    • Percentage who think companies alone own or should own all the data about them: 4% – 2014
    • Percentage for whom data ownership “is not something I care about”: 13% – 2014
    • Percentage who indicated they wanted to own their data: 75% – 2014 
    • Percentage who would share data only if “privacy were assured:” 68% – 2014
    • People who would supply data regardless of privacy or compensation: 27% – 2014
      • Percentage of participants who mentioned privacy, anonymity, or confidentiality when asked under what conditions they would share their data:  63% – 2014
      • Percentage who would be “more” or “much more” likely to share data for compensation: 56% – 2014
      • Percentage who indicated compensation would make no difference: 38% – 2014
      • Amount opposed to commercial  or profit-making use of their data: 13% – 2014
    • Percentage of people who would only share personal health data with a guarantee of:
      • Privacy: 57% – 2014
      • Anonymization: 90% – 2014
  • Surveyed Researchers: 
    • Percentage who agree or strongly agree that self-tracking data would help provide more insights in their research: 89% – 2014
    • Percentage who say PHD could answer questions that other data sources could not: 95% – 2014
    • Percentage who have used public datasets: 57% – 2014
    • Percentage who have paid for data for research: 19% – 2014
    • Percentage who have used self-tracking data before for research purposes: 46% – 2014
    • Percentage who have worked with application, device, or social media companies: 23% – 2014
    • Percentage who “somewhat disagree” or “strongly disagree” there are barriers that cannot be overcome to using self-tracking data in their research: 82% – 2014 

SOURCES: 

“2019 Accenture Global Financial Services Consumer Study: Discover the Patterns in Personality”, Accenture, 2019. 

“Americans’ Views About Data Collection and Security”, Pew Research Center, 2015. 

“Data Donation: Sharing Personal Data for Public Good?”, ResearchGate, 2014.

Data privacy: What the consumer really thinks,” Acxiom, 2018.

“Exclusive: Public wants Big Tech regulated”, Axios, 2018.

Consumer data value exchange,” Microsoft, 2015.

Crossing the Line: Staying on the right side of consumer privacy,” KPMG International Cooperative, 2016.

“How do you feel about the government sharing our personal data? – livechat”, The Guardian, 2017. 

“Personal data for public good: using health information in medical research”, The Academy of Medical Sciences, 2006. 

“Personal Data for the Public Good: New Opportunities to Enrich Understanding of Individual and Population Health”, Robert Wood Johnson Foundation, Health Data Exploration Project, Calit2, UC Irvine and UC San Diego, 2014. 

“Pew Internet and American Life Project: Cloud Computing Raises Privacy Concerns”, Pew Research Center, 2008. 

“Poll: Little Trust That Tech Giants Will Keep Personal Data Private”, Morning Consult & Politico, 2017. 

“Privacy and Information Sharing”, Pew Research Center, 2016. 

“Privacy, Data and the Consumer: What US Thinks About Sharing Data”, MarTech Advisor, 2018. 

“Public Opinion on Privacy”, Electronic Privacy Information Center, 2019. 

“Selligent Marketing Cloud Study Finds Consumer Expectations and Marketer Challenges are Rising in Tandem”, Selligent Marketing Cloud, 2018. 

The Data-Sharing Disconnect: The Impact of Context, Consumer Trust, and Relevance in Retail Marketing,” Boxever, 2015. 

Microsoft Research reveals understanding gap in the brand-consumer data exchange,” Microsoft Research, 2015.

“Survey: 58% will share personal data under the right circumstances”, Marketing Land: Third Door Media, 2019. 

“The state of privacy in post-Snowden America”, Pew Research Center, 2016. 

The Tradeoff Fallacy: How Marketers Are Misrepresenting American Consumers And Opening Them Up to Exploitation”, University of Pennsylvania, 2015.

Index: The Data Universe 2019


By Michelle Winowatan, Andrew J. Zahuranec, Andrew Young, Stefaan Verhulst, Max Jun Kim

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe.

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at info@thelivinglib.org

Internet Traffic:

  • Percentage of the world’s population that uses the internet: 51.2% (3.9 billion people) – 2018
  • Number of search processed worldwide by Google every year: at least 2 trillion – 2016
  • Website traffic worldwide generated through mobile phones: 52.2% – 2018
  • The total number of mobile subscriptions in the first quarter of 2019: 7.9 billion (addition of 44 million in quarter) – 2019
  • Amount of mobile data traffic worldwide: nearly 30 billion GB – 2018
  • Data category with highest traffic worldwide: video (60%) – 2018
  • Global average of data traffic per smartphone per month: 5.6 GB – 2018
    • North America: 7 GB – 2018
    • Latin America: 3.1 GB – 2018
    • Western Europe: 6.7 GB – 2018
    • Central and Eastern Europe: 4.5 GB – 2018
    • North East Asia: 7.1 GB – 2018
    • Southeast Asia and Oceania: 3.6 GB – 2018
    • India, Nepal, and Bhutan: 9.8 GB – 2018
    • Middle East and Africa: 3.0 GB – 2018
  • Time between the creation of each new bitcoin block: 9.27 minutes – 2019

Streaming Services:

  • Total hours of video streamed by Netflix users every minute: 97,222 – 2017
  • Hours of YouTube watched per day: over 1 billion – 2018
  • Number of tracks uploaded to Spotify every day: Over 20,000 – 2019
  • Number of Spotify’s monthly active users: 232 million – 2019
  • Spotify’s total subscribers: 108 million – 2019
  • Spotify’s hours of content listened: 17 billion – 2019
  • Total number of songs on Spotify’s catalog: over 30 million – 2019
  • Apple Music’s total subscribers: 60 million – 2019
  • Total number of songs on Apple Music’s catalog: 45 million – 2019

Social Media:

Calls and Messaging:

Retail/Financial Transaction:

  • Number of packages shipped by Amazon in a year: 5 billion – 2017
  • Total value of payments processed by Venmo in a year: USD 62 billion – 2019
  • Based on an independent analysis of public transactions on Venmo in 2017:
  • Based on a non-representative survey of 2,436 US consumers between the ages of 21 and 72 on P2P platforms:
    • The average volume of transactions handled by Venmo: USD 64.2 billion – 2019
    • The average volume of transactions handled by Zelle: USD 122.0 billion – 2019
    • The average volume of transactions handled by PayPal: USD 141.8 billion – 2019 
    • Platform with the highest percent adoption among all consumers: PayPal (48%) – 2019 

Internet of Things:

Sources:

Government wants access to personal data while it pushes privacy


Sara Fischer and Scott Rosenberg at Axios: “Over the past two years, the U.S. government has tried to rein in how major tech companies use the personal data they’ve gathered on their customers. At the same time, government agencies are themselves seeking to harness those troves of data.

Why it matters: Tech platforms use personal information to target ads, whereas the government can use it to prevent and solve crimes, deliver benefits to citizens — or (illegally) target political dissent.

Driving the news: A new report from the Wall Street Journal details the ways in which family DNA testing sites like FamilyTreeDNA are pressured by the FBI to hand over customer data to help solve criminal cases using DNA.

  • The trend has privacy experts worried about the potential implications of the government having access to large pools of genetic data, even though many people whose data is included never agreed to its use for that purpose.

The FBI has particular interest in data from genetic and social media sites, because it could help solve crimes and protect the public.

  • For example, the FBI is “soliciting proposals from outside vendors for a contract to pull vast quantities of public data” from Facebook, Twitter Inc. and other social media companies,“ the Wall Street Journal reports.
  • The request is meant to help the agency surveil social behavior to “mitigate multifaceted threats, while ensuring all privacy and civil liberties compliance requirements are met.”
  • Meanwhile, the Trump administration has also urged social media platforms to cooperate with the governmentin efforts to flag individual users as potential mass shooters.

Other agencies have their eyes on big data troves as well.

  • Earlier this year, settlement talks between Facebook and the Department of Housing and Urban Development broke down over an advertising discrimination lawsuit when, according to a Facebook spokesperson, HUD “insisted on access to sensitive information — like user data — without adequate safeguards.”
  • HUD presumably wanted access to the data to ensure advertising discrimination wasn’t occurring on the platform, but it’s unclear whether the agency needed user data to be able to support that investigation….(More)”.

Future Studies and Counterfactual Analysis


Book by Theodore J. Gordon and Mariana Todorova: “In this volume, the authors contribute to futures research by placing the counterfactual question in the future tense. They explore the possible outcomes of future, and consider how future decisions are turning points that may produce different global outcomes. This book focuses on a dozen or so intractable issues that span politics, religion, and technology, each addressed in individual chapters. Until now, most scenarios written by futurists have been built on cause and effect narratives or depended on numerical models derived from historical relationships. In contrast, many of the scenarios written for this book are point descriptions of future discontinuities, a form allows more thought-provoking presentations. Ultimately, this book demonstrates that counterfactual thinking and point scenarios of discontinuities are new, groundbreaking tools for futurists….(More)”.

Index: Open Data


By Alexandra Shaw, Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on open data and was originally published in 2018.

Value and Impact

  • The projected year at which all 28+ EU member countries will have a fully operating open data portal: 2020

  • Between 2016 and 2020, the market size of open data in Europe is expected to increase by 36.9%, and reach this value by 2020: EUR 75.7 billion

Public Views on and Use of Open Government Data

  • Number of Americans who do not trust the federal government or social media sites to protect their data: Approximately 50%

  • Key findings from The Economist Intelligence Unit report on Open Government Data Demand:

    • Percentage of respondents who say the key reason why governments open up their data is to create greater trust between the government and citizens: 70%

    • Percentage of respondents who say OGD plays an important role in improving lives of citizens: 78%

    • Percentage of respondents who say OGD helps with daily decision making especially for transportation, education, environment: 53%

    • Percentage of respondents who cite lack of awareness about OGD and its potential use and benefits as the greatest barrier to usage: 50%

    • Percentage of respondents who say they lack access to usable and relevant data: 31%

    • Percentage of respondents who think they don’t have sufficient technical skills to use open government data: 25%

    • Percentage of respondents who feel the number of OGD apps available is insufficient, indicating an opportunity for app developers: 20%

    • Percentage of respondents who say OGD has the potential to generate economic value and new business opportunity: 61%

    • Percentage of respondents who say they don’t trust governments to keep data safe, protected, and anonymized: 19%

Efforts and Involvement

  • Time that’s passed since open government advocates convened to create a set of principles for open government data – the instance that started the open data government movement: 10 years

  • Countries participating in the Open Government Partnership today: 79 OGP participating countries and 20 subnational governments

  • Percentage of “open data readiness” in Europe according to European Data Portal: 72%

    • Open data readiness consists of four indicators which are presence of policy, national coordination, licensing norms, and use of data.

  • Number of U.S. cities with Open Data portals: 27

  • Number of governments who have adopted the International Open Data Charter: 62

  • Number of non-state organizations endorsing the International Open Data Charter: 57

  • Number of countries analyzed by the Open Data Index: 94

  • Number of Latin American countries that do not have open data portals as of 2017: 4 total – Belize, Guatemala, Honduras and Nicaragua

  • Number of cities participating in the Open Data Census: 39

Demand for Open Data

  • Open data demand measured by frequency of open government data use according to The Economist Intelligence Unit report:

    • Australia

      • Monthly: 15% of respondents

      • Quarterly: 22% of respondents

      • Annually: 10% of respondents

    • Finland

      • Monthly: 28% of respondents

      • Quarterly: 18% of respondents

      • Annually: 20% of respondents

    •  France

      • Monthly: 27% of respondents

      • Quarterly: 17% of respondents

      • Annually: 19% of respondents

        •  
    • India

      • Monthly: 29% of respondents

      • Quarterly: 20% of respondents

      • Annually: 10% of respondents

    • Singapore

      • Monthly: 28% of respondents

      • Quarterly: 15% of respondents

      • Annually: 17% of respondents 

    • UK

      • Monthly: 23% of respondents

      • Quarterly: 21% of respondents

      • Annually: 15% of respondents

    • US

      • Monthly: 16% of respondents

      • Quarterly: 15% of respondents

      • Annually: 20% of respondents

  • Number of FOIA requests received in the US for fiscal year 2017: 818,271

  • Number of FOIA request processed in the US for fiscal year 2017: 823,222

  • Distribution of FOIA requests in 2017 among top 5 agencies with highest number of request:

    • DHS: 45%

    • DOJ: 10%

    • NARA: 7%

    • DOD: 7%

    • HHS: 4%

Examining Datasets

  • Country with highest index score according to ODB Leaders Edition: Canada (76 out of 100)

  • Country with lowest index score according to ODB Leaders Edition: Sierra Leone (22 out of 100)

  • Number of datasets open in the top 30 governments according to ODB Leaders Edition: Fewer than 1 in 5

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition: 19%

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition by sector/subject:

    • Budget: 30%

    • Companies: 13%

    • Contracts: 27%

    • Crime: 17%

    • Education: 13%

    • Elections: 17%

    • Environment: 20%

    • Health: 17%

    • Land: 7%

    • Legislation: 13%

    • Maps: 20%

    • Spending: 13%

    • Statistics: 27%

    • Trade: 23%

    • Transport: 30%

  • Percentage of countries that release data on government spending according to ODB Leaders Edition: 13%

  • Percentage of government data that is updated at regular intervals according to ODB Leaders Edition: 74%

  • Number of datasets available through:

  • Number of datasets classed as “open” in 94 places worldwide analyzed by the Open Data Index: 11%

  • Percentage of open datasets in the Caribbean, according to Open Data Census: 7%

  • Number of companies whose data is available through OpenCorporates: 158,589,950

City Open Data

  • New York City

  • Singapore

    • Number of datasets published in Singapore: 1,480

    • Percentage of datasets with standardized format: 35%

    • Percentage of datasets made as raw as possible: 25%

  • Barcelona

    • Number of datasets published in Barcelona: 443

    • Open data demand in Barcelona measured by:

      • Number of unique sessions in the month of September 2018: 5,401

    • Quality of datasets published in Barcelona according to Tim Berners Lee 5-star Open Data: 3 stars

  • London

    • Number of datasets published in London: 762

    • Number of data requests since October 2014: 325

  • Bandung

    • Number of datasets published in Bandung: 1,417

  • Buenos Aires

    • Number of datasets published in Buenos Aires: 216

  • Dubai

    • Number of datasets published in Dubai: 267

  • Melbourne

    • Number of datasets published in Melbourne: 199

Sources

  • About OGP, Open Government Partnership. 2018.  

Selected Readings on Data Responsibility, Refugees and Migration


By Kezia Paladina, Alexandra Shaw, Michelle Winowatan, Stefaan Verhulst, and Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of Data Collaboration for Migration was originally published in 2018.

Special thanks to Paul Currion whose data responsibility literature review gave us a headstart when developing the below. (Check out his article listed below on Refugee Identity)

The collection below is also meant to complement our article in the Stanford Social Innovation Review on Data Collaboration for Migration where we emphasize the need for a Data Responsibility Framework moving forward.

From climate change to politics to finance, there is growing recognition that some of the most intractable problems of our era are information problems. In recent years, the ongoing refugee crisis has increased the call for new data-driven approaches to address the many challenges and opportunities arising from migration. While data – including data from the private sector – holds significant potential value for informing analysis and targeted international and humanitarian response to (forced) migration, decision-makers often lack an actionable understanding of if, when and how data could be collected, processed, stored, analyzed, used, and shared in a responsible manner.

Data responsibility – including the responsibility to protect data and shield its subjects from harms, and the responsibility to leverage and share data when it can provide public value – is an emerging field seeking to go beyond just privacy concerns. The forced migration arena has a number of particularly important issues impacting responsible data approaches, including the risks of leveraging data regarding individuals fleeing a hostile or repressive government.

In this edition of the GovLab’s Selected Readings series, we examine the emerging literature on the data responsibility approaches in the refugee and forced migration space – part of an ongoing series focused on Data Responsibiltiy. The below reading list features annotated readings related to the Policy and Practice of data responsibility for refugees, and the specific responsibility challenges regarding Identity and Biometrics.

Data Responsibility and Refugees – Policy and Practice

International Organization for Migration (IOM) (2010) IOM Data Protection Manual. Geneva: IOM.

  • This IOM manual includes 13 data protection principles related to the following activities: lawful and fair collection, specified and legitimate purpose, data quality, consent, transfer to third parties, confidentiality, access and transparency, data security, retention and personal data, application of the principles, ownership of personal data, oversight, compliance and internal remedies (and exceptions).
  • For each principle, the IOM manual features targeted data protection guidelines, and templates and checklists are included to help foster practical application.

Norwegian Refugee Council (NRC) Internal Displacement Monitoring Centre / OCHA (eds.) (2008) Guidance on Profiling Internally Displaced Persons. Geneva: Inter-Agency Standing Committee.

  • This NRC document contains guidelines on gathering better data on Internally Displaced Persons (IDPs), based on country context.
  • IDP profile is defined as number of displaced persons, location, causes of displacement, patterns of displacement, and humanitarian needs among others.
  • It further states that collecting IDPs data is challenging and the current condition of IDPs data are hampering assistance programs.
  • Chapter I of the document explores the rationale for IDP profiling. Chapter II describes the who aspect of profiling: who IDPs are and common pitfalls in distinguishing them from other population groups. Chapter III describes the different methodologies that can be used in different contexts and suggesting some of the advantages and disadvantages of each, what kind of information is needed and when it is appropriate to profile.

United Nations High Commissioner for Refugees (UNHCR). Model agreement on the sharing of personal data with Governments in the context of hand-over of the refugee status determination process. Geneva: UNHCR.

  • This document from UNHCR provides a template of agreement guiding the sharing of data between a national government and UNHCR. The model agreement’s guidance is aimed at protecting the privacy and confidentiality of individual data while promoting improvements to service delivery for refugees.

United Nations High Commissioner for Refugees (UNHCR) (2015). Policy on the Protection of Personal Data of Persons of Concern to UNHCR. Geneva: UNHCR.

  • This policy outlines the rules and principles regarding the processing of personal data of persons engaged by UNHCR with the purpose of ensuring that the practice is consistent with UNGA’s regulation of computerized personal data files that was established to protect individuals’ data and privacy.
  • UNHCR require its personnel to apply the following principles when processing personal data: (i) Legitimate and fair processing (ii) Purpose specification (iii) Necessity and proportionality (iv) Accuracy (v) Respect for the rights of the data subject (vi) Confidentiality (vii) Security (viii) Accountability and supervision.

United Nations High Commissioner for Refugees (UNHCR) (2015) Privacy Impact Assessment of UNHCR Cash Based Interventions.

  • This impact assessment focuses on privacy issues related to financial assistance for refugees in the form of cash transfers. For international organizations like UNHCR to determine eligibility for cash assistance, data “aggregation, profiling, and social sorting techniques,” are often needed, leading a need for a responsible data approach.
  • This Privacy Impact Assessment (PIA) aims to identify the privacy risks posed by their program and seek to enhance safeguards that can mitigate those risks.
  • Key issues raised in the PIA involves the challenge of ensuring that individuals’ data will not be used for purposes other than those initially specified.

Data Responsibility in Identity and Biometrics

Bohlin, A. (2008) “Protection at the Cost of Privacy? A Study of the Biometric Registration of Refugees.” Lund: Faculty of Law of the University of Lund.

  • This 2008 study focuses on the systematic biometric registration of refugees conducted by UNHCR in refugee camps around the world, to understand whether enhancing the registration mechanism of refugees contributes to their protection and guarantee of human rights, or whether refugee registration exposes people to invasions of privacy.
  • Bohlin found that, at the time, UNHCR failed to put a proper safeguards in the case of data dissemination, exposing the refugees data to the risk of being misused. She goes on to suggest data protection regulations that could be put in place in order to protect refugees’ privacy.

Currion, Paul. (2018) “The Refugee Identity.” Medium.

  • Developed as part of a DFID-funded initiative, this essay considers Data Requirements for Service Delivery within Refugee Camps, with a particular focus on refugee identity.
  • Among other findings, Currion finds that since “the digitisation of aid has already begun…aid agencies must therefore pay more attention to the way in which identity systems affect the lives and livelihoods of the forcibly displaced, both positively and negatively.”
  • Currion argues that a Responsible Data approach, as opposed to a process defined by a Data Minimization principle, provides “useful guidelines,” but notes that data responsibility “still needs to be translated into organisational policy, then into institutional processes, and finally into operational practice.”

Farraj, A. (2010) “Refugees and the Biometric Future: The Impact of Biometrics on Refugees and Asylum Seekers.” Colum. Hum. Rts. L. Rev. 42 (2010): 891.

  • This article argues that biometrics help refugees and asylum seekers establish their identity, which is important for ensuring the protection of their rights and service delivery.
  • However, Farraj also describes several risks related to biometrics, such as, misidentification and misuse of data, leading to a need for proper approaches for the collection, storage, and utilization of the biometric information by government, international organizations, or other parties.  

GSMA (2017) Landscape Report: Mobile Money, Humanitarian Cash Transfers and Displaced Populations. London: GSMA.

  • This paper from GSMA seeks to evaluate how mobile technology can be helpful in refugee registration, cross-organizational data sharing, and service delivery processes.
  • One of its assessments is that the use of mobile money in a humanitarian context depends on the supporting regulatory environment that contributes to unlocking the true potential of mobile money. The examples include extension of SIM dormancy period to anticipate infrequent cash disbursements, ensuring that persons without identification are able to use the mobile money services, and so on.
  • Additionally, GMSA argues that mobile money will be most successful when there is an ecosystem to support other financial services such as remittances, airtime top-ups, savings, and bill payments. These services will be especially helpful in including displaced populations in development.

GSMA (2017) Refugees and Identity: Considerations for mobile-enabled registration and aid delivery. London: GSMA.

  • This paper emphasizes the importance of registration in the context of humanitarian emergency, because being registered and having a document that proves this registration is key in acquiring services and assistance.
  • Studying cases of Kenya and Iraq, the report concludes by providing three recommendations to improve mobile data collection and registration processes: 1) establish more flexible KYC for mobile money because where refugees are not able to meet existing requirements; 2) encourage interoperability and data sharing to avoid fragmented and duplicative registration management; and 3) build partnership and collaboration among governments, humanitarian organizations, and multinational corporations.

Jacobsen, Katja Lindskov (2015) “Experimentation in Humanitarian Locations: UNHCR and Biometric Registration of Afghan Refugees.” Security Dialogue, Vol 46 No. 2: 144–164.

  • In this article, Jacobsen studies the biometric registration of Afghan refugees, and considers how “humanitarian refugee biometrics produces digital refugees at risk of exposure to new forms of intrusion and insecurity.”

Jacobsen, Katja Lindskov (2017) “On Humanitarian Refugee Biometrics and New Forms of Intervention.” Journal of Intervention and Statebuilding, 1–23.

  • This article traces the evolution of the use of biometrics at the Office of the United Nations High Commissioner for Refugees (UNHCR) – moving from a few early pilot projects (in the early-to-mid-2000s) to the emergence of a policy in which biometric registration is considered a ‘strategic decision’.

Manby, Bronwen (2016) “Identification in the Context of Forced Displacement.” Washington DC: World Bank Group. Accessed August 21, 2017.

  • In this paper, Bronwen describes the consequences of not having an identity in a situation of forced displacement. It prevents displaced population from getting various services and creates higher chance of exploitation. It also lowers the effectiveness of humanitarian actions, as lacking identity prevents humanitarian organizations from delivering their services to the displaced populations.
  • Lack of identity can be both the consequence and and cause of forced displacement. People who have no identity can be considered illegal and risk being deported. At the same time, conflicts that lead to displacement can also result in loss of ID during travel.
  • The paper identifies different stakeholders and their interest in the case of identity and forced displacement, and finds that the biggest challenge for providing identity to refugees is the politics of identification and nationality.
  • Manby concludes that in order to address this challenge, there needs to be more effective coordination among governments, international organizations, and the private sector to come up with an alternative of providing identification and services to the displaced persons. She also argues that it is essential to ensure that national identification becomes a universal practice for states.

McClure, D. and Menchi, B. (2015). Challenges and the State of Play of Interoperability in Cash Transfer Programming. Geneva: UNHCR/World Vision International.

  • This report reviews the elements that contribute to the interoperability design for Cash Transfer Programming (CTP). The design framework offered here maps out these various features and also looks at the state of the problem and the state of play through a variety of use cases.
  • The study considers the current state of play and provides insights about the ways to address the multi-dimensionality of interoperability measures in increasingly complex ecosystems.     

NRC / International Human Rights Clinic (2016). Securing Status: Syrian refugees and the documentation of legal status, identity, and family relationships in Jordan.

  • This report examines Syrian refugees’ attempts to obtain identity cards and other forms of legally recognized documentation (mainly, Ministry of Interior Service Cards, or “new MoI cards”) in Jordan through the state’s Urban Verification Exercise (“UVE”). These MoI cards are significant because they allow Syrians to live outside of refugee camps and move freely about Jordan.
  • The text reviews the acquirement processes and the subsequent challenges and consequences that refugees face when unable to obtain documentation. Refugees can encounter issues ranging from lack of access to basic services to arrest, detention, forced relocation to camps and refoulement.  
  • Seventy-two Syrian refugee families in Jordan were interviewed in 2016 for this report and their experiences with obtaining MoI cards varied widely.

Office of Internal Oversight Services (2015). Audit of the operations in Jordan for the Office of the United Nations High Commissioner for Refugees. Report 2015/049. New York: UN.

  • This report documents the January 1, 2012 – March 31, 2014 audit of Jordanian operations, which is intended to ensure the effectiveness of the UNHCR Representation in the state.
  • The main goals of the Regional Response Plan for Syrian refugees included relieving the pressure on Jordanian services and resources while still maintaining protection for refugees.
  • The audit results concluded that the Representation was initially unsatisfactory, and the OIOS suggested several recommendations according to the two key controls which the Representation acknowledged. Those recommendations included:
    • Project management:
      • Providing training to staff involved in financial verification of partners supervise management
      • Revising standard operating procedure on cash based interventions
      • Establishing ways to ensure that appropriate criteria for payment of all types of costs to partners’ staff are included in partnership agreements
    • Regulatory framework:
      • Preparing annual need-based procurement plan and establishing adequate management oversight processes
      • Creating procedures for the assessment of renovation work in progress and issuing written change orders
      • Protecting data and ensuring timely consultation with the UNHCR Division of Financial and Administrative Management

UNHCR/WFP (2015). Joint Inspection of the Biometrics Identification System for Food Distribution in Kenya. Geneva: UNHCR/WFP.

  • This report outlines the partnership between the WFP and UNHCR in its effort to promote its biometric identification checking system to support food distribution in the Dadaab and Kakuma refugee camps in Kenya.
  • Both entities conducted a joint inspection mission in March 2015 and was considered an effective tool and a model for other country operations.
  • Still, 11 recommendations are proposed and responded to in this text to further improve the efficiency of the biometric system, including real-time evaluation of impact, need for automatic alerts, documentation of best practices, among others.