Selected Readings on the LGBTQ+ Community and Data


By Uma Kalkar, Salwa Mansuri, Marine Ragnet and Andrew J. Zahuranec

As part of an ongoing effort to contribute to current topics in data, technology, and governance, The GovLab’s Selected Readings series provides an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

Around the world, LGBTQ+ people face exclusion and discrimination that undermine their capacity to live their lives and succeed. Together with allies, many LGBTQ+ people are fighting to exercise their rights and achieve full equality. However, this struggle has been hampered by a lack of specific, quantifiable information on the challenges they face.

When collected and managed responsibly, data about sexual and gender minorities can be used to protect and empower LGBTQ+ people through informed policy and advocacy work. To this end, this Selected Reading investigates what data is (and is not) collected about LGBTQ+ individuals in the areas of healthcare, education, economics, and public policy, and the ramifications of these practices. It offers a perspective on some of the existing gaps in LGBTQ+ data collection. It also examines, through a data lens, the various challenges that LGBTQ+ groups have had to overcome. While activism and advocacy have increased the visibility and acceptance of sexual and gender minorities and allowed them to better exercise their rights in society, significant inequities remain. Our literature review puts forward some of these recent efforts.

Most of the papers included in this review, however, reach similar conclusions: data about LGBTQ+ communities is still lacking, and as a result, research on the topic often lags behind. This is particularly problematic because, as detailed in some of our readings, LGBTQ+ populations are frequent targets of discrimination and still face disparate health vulnerabilities. The LGBTQI+ Data Inclusion Act, which recently passed the US House of Representatives and would require over 100 federal agencies to improve data collection and surveying of LGBTQ communities, seeks to address this gap.

We hope this selection of readings can provide some clarity on current data-driven research for and about LGBTQ+ individuals. The readings are presented in alphabetical order.

***

Annotated Selected Reading List (in alphabetical order):

Bowleg, Lisa, and Stewart Landers. “The need for COVID-19 LGBTQ-specific data.” American Journal of Public Health 111, no. 9 (2021): 1604–1605. https://pubmed.ncbi.nlm.nih.gov/34436923/.

  • The adage “no data, no problem” has been magnified during the pandemic, highlighting gaps in data collection for LGBTQ communities, which often intersect with other communities that are disproportionately at risk for COVID-19, such as minority populations in the service industry and those who smoke.
  • Despite concerns about the stigma facing LGBTQ communities, data collection from these demographics has been relatively feasible, with federal governments drastically increasing their data collection from LGBTQ communities.
  • However, the lack of direction and guidance at the federal level on collecting sexual and gender minority data has stunted information about how this demographic has experienced COVID-19 compared to cisgender, heterosexual groups. The authors stress the need for data collection from LGBTQ communities and for advocacy to encourage these practices to help address the pandemic.

D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. MIT Press, 2020. https://mitpress.mit.edu/books/data-feminism.

  • D’Ignazio and Klein investigate how data has historically been used to maintain particular social status quos. To overcome this challenge, they approach data collection and use through an intersectional, feminist lens that identifies issues in current data handling systems and looks toward solutions for more inclusive data applications.
  • The authors define data feminism in terms of “power, about who has it and who doesn’t, and about how those differentials of power can be challenged and changed using data.” The book centers on seven principles that identify and challenge existing power structures around data and seek pluralist, context-based data processes that illuminate hidden and missed data.

Giblon, Rachel, and Greta R. Bauer. “Health care availability, quality, and unmet need: a comparison of transgender and cisgender residents of Ontario, Canada.” BMC Health Services Research 17, no. 1 (2017): 1–10. https://bmchealthservres.biomedcentral.com/articles/10.1186/s12913-017-2226-z.

  • Canada boasts a universal healthcare and insurance system, yet disparities exist in treatment quality, services, and provider knowledge for transgender patients.
  • Canadian health surveys do not collect data on transgender, non-binary, and intersex individuals, making it difficult to compare the healthcare provided to transgender people with that provided to cisgender people. Moreover, a lack of physician knowledge about trans needs and/or refusal to provide hormone therapy and gender-affirming procedures lead trans individuals to explicitly avoid medical services. The lack of services, comfort, and data about transgender people in Canada demonstrates their severely “unmet health care need.”
  • Using data about Ontario residents from the Canadian Community Health Survey and the Trans PULSE survey, the researchers find that 33% of transgender Ontarians had an unmet healthcare need that they likely would not have had were they cisgender. Transgender men and women also rated the quality of healthcare in their community more poorly than cisgender individuals did. Twenty-one percent of transgender people avoided going to emergency rooms because of their gender identity.

Marshall, Zack, Vivian Welch, Alexa Minichiello, Michelle Swab, Fern Brunger, and Chris Kaposy. “Documenting research with transgender, nonbinary, and other gender diverse (trans) individuals and communities: introducing the global trans research evidence map.” Transgender Health 4, no. 1 (2019): 68–80. https://www.liebertpub.com/doi/10.1089/trgh.2018.0020.

  • Marshall and colleagues searched 15 academic databases to assemble a dataset describing 690 trans-focused articles. They then map where and how transgender people “have been studied and represented within and across multiple fields of research” to understand the landscape of existing research on transgender people. They find that research on the trans community focused on physical and mental healthcare services and marginalization, and was primarily observational.
  • The authors found that social determinants of health for transgender people were the least studied, along with ethnicity, culture, and race, violence, early life experiences, activism, and education.
  • With this evidence map, researchers have a strong starting point to further explore issues through an LGBTQ lens and better engage with trans people and perspectives when looking at social problems.

Medina, Caroline, and Lindsay Mahowald. “Collecting Data about LGBTQI+ and Other Sexual and Gender-Diverse Communities.” Center for American Progress, May 26, 2022. https://www.americanprogress.org/article/collecting-data-about-lgbtqi-and-other-sexual-and-gender-diverse-communities.

  • The paper argues that, despite advances, “a persistent lack of routine data collection on sexual orientation, gender identity, and variations in sex characteristics (SOGISC) is still a substantial roadblock for policymakers, researchers, service providers, and advocates seeking to improve the health and well-being of LGBTQI+ people.”
  • Even though various types of data are integral to the experiences of LGBTQI+ people, the report narrows its focus to two forms of data collection: general population surveys and surveys specifically about LGBTQI+ people. Specific population surveys such as the latter provide significant advantages for capturing specific and sensitive data.
  • It argues that a range of precautions can be adopted from a research design perspective to ensure that personal data and information are handled with care and meet ethical standards, as outlined in the Data Ethics Framework of the Federal Data Strategy, ranging from privacy and confidentiality to honesty and transparency.

Miner, Michael H., Walter O. Bockting, Rebecca Swinburne Romine, and Sivakumaran Raman. “Conducting internet research with the transgender population: Reaching broad samples and collecting valid data.” Social Science Computer Review 30, no. 2 (2012): 202–211. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3769415/.

  • The internet offers the potential to collect information in a diverse and representative manner from transgender people, who are “a hard-to-reach, relatively small, and geographically dispersed population.”
  • To study HIV risk behaviors of transgender individuals in the U.S., Miner et al. developed an online tool that recruited individuals from websites that are important to the transgender community and used quantitative and qualitative methods to learn more about these individuals. They conclude that while it can be difficult to ensure internal validity in online data collection, careful testing and methods can overcome these issues and improve data quality on transgender people.

Pega, Frank, Sari L. Reisner, Randall L. Sell, and Jaimie F. Veale. “Transgender health: New Zealand’s innovative statistical standard for gender identity.” American Journal of Public Health 107, no. 2 (2017): 217–221. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5227923/.

  • Pega et al. discuss New Zealand’s national statistical standard for gender identity data collection, the first of its kind. Governments in Australia and the United States are now following suit to address the health access and information disparities that transgender people face.
  • Data about transgender people has advanced progressive policy action in New Zealand, and the authors celebrate this statistical standard as a way to collect high quality data for data-driven policies to support these groups.
  • While this move will help uncover LGBTQ individuals currently hidden in data, the authors critique the standard because it does not “promote the two-question method, risking misclassification and undercounts; does promote the use of the ambiguous response category ‘gender diverse’ in standard questions; and is not intersex inclusive.”

Ruberg, Bonnie, and Spencer Ruelos. “Data for Queer Lives: How LGBTQ Gender and Sexuality Identities Challenge Norms of Demographics.” Big Data & Society 7, no. 1 (June 18, 2020): 205395172093328. https://journals.sagepub.com/doi/full/10.1177/2053951720933286.

  • Drawing from the responses of 178 people who identified as non-heterosexual or non-cisgender in a survey, this paper argues that “dominant notions of demographic data, […] that seeks to accurately categorize and “capture” identity do not sufficiently account for the complexities of LGBTQ lives.”
  • Demographic data commonly imagines identity as fixed, singular, and discrete. However, the researchers’ findings suggest that, for LGBTQ people, gender and sexual identities are often multiple and in flux. Most respondents reported that their understanding of their identity shifted over time. For many, “gender identity was made up of overlapping factors, including the relationship between gender and transgender identities. These findings challenge researchers to reconsider how identity is understood as and through data.” They argue that treating identities as fixed and discrete is not only exclusionary but also fails to represent the dynamic and fluid nature of gender identities.
  • The piece offers several recommendations to address this challenge. First, the researchers argue for removing data discreteness, enabling users to select multiple identities rather than choosing one from a drop-down list. Second, they recommend creating communication and feedback channels for LGBTQ+ people to express whether surveys and other data collection methods are sufficiently inclusive and gender-sensitive.

Sell, Randall L. “LGBTQ health surveillance: data = power.” American Journal of Public Health 107, no. 6 (2017): 843–844. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5425894/.

  • Sell recounts his mottos, ‘data = power’ and ‘silence = death,’ and how LGBTQ people have suffered from this dynamic. He argues that health research and surveillance have systemically ignored sexual and gender minorities, leading to gaps in administrative understanding and policies for the LGBTQ population.
  • He laments that very few surveys on American health collect sexual orientation and gender identity data, and the lack of standardization in this data collection muddies researchers’ ability to collate and utilize the information meaningfully.
  • He calls for legislation mandating that the National Institutes of Health include sexual and gender minorities in all publicly funded research, similar to the specific inclusion requirement for women and racial and ethnic minorities in studies. Despite concerns about surveillance and targeting of LGBTQ minorities, Sell argues that data collection is now imperative for a long-term understanding of the community’s needs, one that transcends political terms.

Snapp, Shannon D., Stephen T. Russell, Mariella Arredondo, and Russell Skiba. “A right to disclose: LGBTQ youth representation in data, science, and policy.” Advances in Child Development and Behavior 50 (2016): 135–159. https://pubmed.ncbi.nlm.nih.gov/26956072/.

  • Despite significant and positive reforms in the United States, such as the legalization of same-sex marriage and protection from intersectional sexual harassment (Webb, 2011), there is a striking gap in the literature on evidence-based practices that support LGBTQ+ youth (Kosciw & Pizmony-Levy, 2013; Mustanski, 2011). The lack of data-driven solutions stifles the creation of inclusive environments where members of the LGBTQI+ community feel heard and seen.
  • At present, federal and state data sets do not include SOGI (sexual orientation and gender identity) in demographic questions. Data sets that do have spaces to disclose SOGI are largely in health-related settings, such as those of the Centers for Disease Control and Prevention or the Youth Risk Behavior Survey. As such, learning and education disparities and outcomes are not accurately measured.
  • Missing systematic SOGI data renders members of the LGBTQ+ community invisible and sidelined. Several members of civil society have therefore called for gathering SOGI data in the Departments of Health, Education, and Justice. Such data is central to holistically capturing the discriminatory experiences LGBTQ+ youth face in educational settings, which are integral to well-being and development. Scholars and research teams have thus far overcome barriers of data reliability and validity (see Ridolfo, Miller, & Maitland, 2012) by collating the most effective methods for data collection (Sexual Minority Assessment Research Team, 2009).

Wimberly, George L. “Chapter 10: Use of large-scale data sets and LGBTQ education.” LGBTQ issues in education: Advancing a research agenda (2015): 175–218. https://ebooks.aera.net/LGBTQCH10.

  • This book chapter highlights the importance of large-scale data sets for understanding LGBTQ students, their school experiences, and academic achievement.
  • Young people who identify as LGBTQ tend to be generalized, and the ways surveys ask LGBTQ identification questions change across years, making it important to disaggregate large-scale data for more granular knowledge about LGBTQ people in education.
  • Wimberly provides information about multiple datasets that collect this information, how they ask questions on LGBTQ identity, and ways in which the datasets have been used or have the potential to be leveraged for a more comprehensive understanding of students. He also points out the limitations of existing data sets, namely that they tend to be retrospective of the LGBTQ adolescent experience and collected from convenience samples, such as college students. This limitation also impacts the external validity of the data, especially with regard to rural, racialized, and lower-income LGBTQ students.

Selected Readings on Digital Self-Determination for Migrants


By Uma Kalkar, Marine Ragnet, and Stefaan Verhulst

Digital self-determination (DSD) is a multidisciplinary concept that extends self-determination to the digital sphere. Self-determination places humans (and their ability to make ‘moral’ decisions) at the center of decision-making. While self-determination is considered a jus cogens rule (i.e., a global norm), the concept of digital self-determination only came to light in the early 2010s as a result of the increasing digitization of most aspects of society.

While digitalization has opened up new opportunities for self-expression and communication for individuals across the globe, its reach and benefits have not been evenly distributed. For instance, migrants and refugees are particularly vulnerable to the deepening inequalities and power structures brought on by increased digitization and the subsequent datafication. Further, non-traditional data, such as social media and telecom data, have great potential to improve our understanding of the migration experience and patterns of mobility, which can inform more targeted migration policies and services. Yet they have also brought new concerns about migrants’ lack of agency in determining how their data are used and who shapes the migration narrative.

These selected readings look at DSD in light of the growing ubiquity of technology applications and specifically focus on their impacts on migrants. They were produced to inform the first studio on DSD and migration co-hosted by the Big Data for Migration Alliance and the International Digital Self Determination Network. The readings are listed in alphabetical order.

These readings serve as a primer offering base perspectives on DSD and its manifestations, as well as a better understanding of how migration data is managed today to advance or hinder life for those on the move. Please alert us to any other publications we should include moving forward.

Berens, Jos, Nataniel Raymond, Gideon Shimshon, Stefaan Verhulst, and Lucy Bernholz. “The Humanitarian Data Ecosystem: the Case for Collective Responsibility.” Stanford Center for Philanthropy and Civil Society, 2017.

  • The authors explore the challenges to, and potential solutions for, the responsible use of digital data in the context of international humanitarian action. Data governance is related to DSD because it oversees how the information extracted from an individual—understood by DSD as an extension of oneself in the digital sphere—is handled.
  • They argue that in the digital age, the basic service provision activities of NGOs and aid organizations have become data collection processes. However, the ecosystem of actors is “uncoordinated,” creating inefficiencies and vulnerabilities in the humanitarian space.
  • The paper presents a new framework for responsible data use in the humanitarian domain. The authors advocate for data users to follow three steps: 
  1. “[L]ook beyond the role they take up in the ‘data-lifecycle’ and consider previous and following steps and roles;
  2. Develop sound data responsibility strategies not only to prevent harm to their own operations but also to other organizations in the ‘data-lifecycle;’ and, 
  3. Collaborate with and learn from other organizations, both in the humanitarian field and beyond, to establish broadly supported guidelines and standards for humanitarian data use.”

Currion, Paul. “The Refugee Identity.” Caribou Digital (via Medium), March 13, 2018.

  • Developed as part of a DFID-funded initiative, this essay outlines the Data Requirements for Service Delivery within Refugee Camps project that investigated current data standards and design of refugee identity systems.
  • Currion finds that since “the digitisation of aid has already begun…aid agencies must therefore pay more attention to the way in which identity systems affect the lives and livelihoods of the forcibly displaced, both positively and negatively.” He argues that an interoperable digital identity for refugees is essential to access financial, social, and material resources while on the move but also to tap into IoT services.
  • However, many refugees are wary of digital tracking and data collection services that could further marginalize them as they search for safety. At present, there are no sector-level data standards around refugee identity data collection, combination, and centralization. How can regulators balance data protection with government and NGO requirements to serve refugees in the ways they want to uphold their DSD?
  • Currion argues that a Responsible Data approach, as opposed to a process defined by a Data Minimization principle, provides “useful guidelines” but notes that data responsibility “still needs to be translated into organizational policy, then into institutional processes, and finally into operational practice.” He further adds that “the digitization of aid, if approached from a position that empowers the individual as much as the institution, offers a chance to give refugees back their voices.”

Decker, Rianne, Paul Koot, S. Ilker Birbil, and Mark van Embden Andres. “Co-designing algorithms for governance: Ensuring responsible and accountable algorithmic management of refugee camp supplies.” Big Data & Society, April 2022.

  • While recent literature has looked at the negative impacts of big data and algorithms in public governance, claiming they may reinforce existing biases and defy scrutiny by public officials, this paper argues that designing algorithms with relevant government and society stakeholders might be a way to make them more accountable and transparent. 
  • It presents a case study of the development of an algorithmic tool to estimate the populations of refugee camps to manage the delivery of emergency supplies. The algorithms included in this tool were co-designed with relevant stakeholders. 
  • This may provide a way to uphold DSD by contributing to the “accountability of the algorithm by making the estimations transparent and explicable to its users.”
  • The authors found that the co-design process enabled better accuracy and responsibility and fostered collaboration between partners, creating a suitable purpose for the tool and making the algorithm understandable to its users. This enabled algorithmic accountability. 
  • The authors note, however, that the beneficiaries of the tools were not included in the design process, limiting the legitimacy of the initiative. 

European Migration Network. “The Use of Digitalisation and Artificial Intelligence in Migration Management.” EMN-OECD Inform Series, February 2022.

  • This paper explores the role of new digital technologies in the management of migration and asylum, focusing specifically on where digital technologies, such as online portals, blockchain, and AI-powered speech and facial recognition systems, are being used across Europe to navigate the processes of obtaining visas, claiming asylum, gaining citizenship, and managing border control.
  • Further, it points to friction between the GDPR and new technologies like blockchain, which by design does not allow for the right to be forgotten, and potential workarounds, such as two-step pseudonymisation.
  • As well, it highlights steps taken to oversee and open up data protection processes for immigration. Austria, Belgium, and France have begun to conduct Data Protection Impact Assessments; France has a portal that allows one to request the right to be forgotten; Ireland informs online service users on how data can be shared or used with third-party agencies; and Spain outlines which personal data are used in immigration as per the Registry Public Treatment Activities.
  • Lastly, the paper points out next steps for policy development that upholds DSD, including universal access and digital literacy, trust in digital systems, willingness for government digital transformations, and bias and risk reduction.

Martin, Aaron, Gargi Sharma, Siddharth Peter de Souza, Linnet Taylor, Boudewijn van Eerd, Sean Martin McDonald, Massimo Marelli, Margie Cheesman, Stephan Scheel, and Huub Dijstelbloem. “Digitisation and Sovereignty in Humanitarian Space: Technologies, Territories and Tensions.” Geopolitics (2022): 1-36.

  • This paper explores how digitisation and datafication are reshaping sovereign authority, power, and control in humanitarian spaces.
  • Building on the notion that technology is political, Martin et al. discuss three cases where digital tools powered by partnerships between international organizations and NGOs and private firms such as Palantir and Facebook have raised concerns that data could be “repurposed” to undermine national sovereignty and distort humanitarian aims with for-profit motivations.
  • The authors draw attention to how cyber dependencies threaten international humanitarian organizations’ purported digital sovereignty. They touch on the tensions between national and digital sovereignty and self-governance.
  • The paper further argues that the rise of digital technologies in the governance of international mobility and migration policies “has all kinds of humanitarian and security consequences,” including (but not limited to) surveillance, privacy infringement, profiling, selection, inclusion/exclusion, and access barriers. Specifically, Scheel introduces the notion of function creep—the use of digital data beyond initially defined purposes—and emphasizes its common use in the context of migration as part “of the modus operandi of sovereign power.”

McAuliffe, Marie, Jenna Blower, and Ana Beduschi. “Digitalization and Artificial Intelligence in Migration and Mobility: Transnational Implications of the COVID-19 Pandemic.” Societies 11, no. 135 (2021): 1-13.

  • This paper critically examines the implications of intensifying digitalization and AI for migration and mobility systems in a post-COVID transnational context.
  • The authors first situate digitalization and AI in migration by analyzing their uptake throughout the Migration Cycle, i.e., to verify identities and visas, “enable ‘smart’ border processing,” and understand travelers’ adherence to legal frameworks. They then evaluate the current challenges and opportunities for migrants and migration systems brought about by deepening digitalization due to COVID-19. For example, contact tracing, infection screening, and quarantining procedures generate increased data about an individual and are meant, by design, to track and trace people, which raises concerns about migrants’ safety, privacy, and autonomy.
  • The essay argues that recent changes show the need for further computational advances that incorporate human rights throughout the design and development stages, “to mitigate potential risks to migrants’ human rights.” AI is severely flawed when it comes to decision-making about minority groups because of biased training data and could further marginalize vulnerable populations, while intrusive data collection for public health could erode one’s universal right to privacy. Leaving migrants at the mercy of black-box AI systems fails to uphold their right to DSD because it forces them to relinquish their agency and power to an opaque system.

Ponzanesi, Sandra. “Migration and Mobility in a Digital Age: (Re)Mapping Connectivity and Belonging.” Television & New Media 20, no. 6 (2019): 547-557.

  • This article explores the role of new media technologies in rethinking the dynamics of migration and globalization by focusing on the role of migrant users as “connected” and active participants, as well as “screened” and subject to biometric datafication, visualization, and surveillance.
  • Elaborating on concepts such as “migration” and “mobility,” the article analyzes the paradoxes of intermittent connectivity and troubled belonging, which are seen as relational definitions that are always fluid, negotiable, and porous.
  • It states that a city’s digital infrastructures are “complex sociotechnical systems” that have a functional side related to access and connectivity and a performative side where people engage with technology. Digital access and action represent areas of individual and collective manifestations of DSD. For migrants, gaining digital access and skills and “enacting citizenship” are important for resettlement. Ponzanesi advocates for further research conducted both bottom-up, leaning on migrant experiences with technology to resettle and remain in contact with their homeland, and top-down, looking at datafication, surveillance, and digital/e-governance as part of the larger technology application ecosystem to understand contemporary processes and problems of migration.

Remolina, Nydia, and Mark James Findlay. “The Paths to Digital Self-Determination — A Foundational Theoretical Framework.” SMU Centre for AI & Data Governance Research Paper No. 03 (2021): 1-34.

  • Remolina and Findlay stress that self-determination is the vehicle by which people “decide their own destiny in the international order.” Decision-making ability empowers people to control their own lives and motivates them to pursue a chosen set of actions. Collective action, or the ability to make decisions as part of a group, be it based on ethnicity, nationality, shared viewpoints, etc., further motivates individuals.
  • The authors discuss how the European Union and European Court of Human Rights’ “principle of subsidiarity” aligns with self-determination because it advocates for power to be placed at the lowest level possible to preserve bottom-up agency with a “reasonable level of efficiency.” In practice, the results of subsidiarity have been disappointing.
  • The paper provides examples of indigenous populations’ fight for self-determination, offline and online. Here, digital self-determination refers to the challenges indigenous peoples face in accessing growing government uses of technology for unlocking innovative solutions because of a lack of physical infrastructure due to structural and social inequities between settler and indigenous communities.
  • Understanding self-determination, and by extension digital self-determination, as a human right, the report investigates how autonomy, sovereignty, the legal definition of a ‘right,’ inclusion, agency, data governance, data ownership, data control, and data quality shape this right.
  • Lastly, the paper presents a foundational theoretical framework that goes beyond just protecting personal data and privacy. Understanding that DSD “cannot be detached from duties for responsible data use,” the authors present a collective and individual dimension to DSD. They extend the individual dimension of DSD to include both my data and data about me that can be used to influence a person’s actions through micro-targeting and nudge techniques. They update the collective dimension of DSD to include the views and influences of organizations, businesses, and communities online and call for a better way of visualizing the ‘social self’ and its control over data.

Ziebart, Astrid, and Jessica Bither. “AI, Digital Identities, Biometrics, Blockchain: A Primer on the Use of Technology in Migration Management.” Migration Strategy Group on International Cooperation and Development, June 2020.

  • Ziebart and Bither note the implications of increasingly sophisticated use of technology and data collection by governments with respect to their citizens. They note that migrants and refugees “often are exposed to particular vulnerabilities” during these processes and underscore the need to bring migrants into data gathering and use policy conversations.  
  • The authors discuss the promise of technology—i.e., to predict migration through AI-powered analyses, employ technologies to reduce friction in the asylum-seeking processes, and the power of digital identities for those on the move. However, they stress the need to combine these tools with informational self-determination that allows migrants to own and control what data they share and how and where the data are used.
  • The migration and refugee policy space faces issues of “tech evangelism,” where technologies are being employed just because they exist, rather than because they serve an actual policy need or provide an answer to a particular policy question. This supply-driven policy implementation signals the need for more migrant voices to inform policymakers on what tools are actually useful for the migratory experience. In order to advance the digital agency of migrants, the paper offers recommendations for some of the ethical challenges these technologies might pose and ultimately advocates for greater participation of migrants and refugees in devising technology-driven policy instruments for migration issues.

On-the-go interesting resources 

  • Empowering Digital Self-Determination, mediaX at Stanford University: This short video presents definitions of DSD, digital personhood, identity, and privacy, and gives an overview of their applications across ethics, law, and the private sector.
  • Digital Self-Determination — A Living Syllabus: This syllabus and assorted materials were created and curated from the 2021 Research Sprint run by the Digital Asia Hub and the Berkman Klein Center for Internet & Society at Harvard University. It introduces learners to the fundamentals of DSD across a variety of industries to enrich understanding of its existing and potential applications.
  • Digital Self-Determination Wikipedia Page: This Wikipedia page was developed by the students who took part in the Berkman Klein Center research sprint on digital self-determination. It provides a comprehensive overview of DSD definitions and its key elements, which include human-centered design, robust privacy mandates and data governance, and control over data use to give data subjects the ability to choose how algorithms manipulate their data for autonomous decision-making.
  • Roger Dubach on Digital Self-Determination: This short video presents DSD in the public sector, arguing that the goal is not to create a ‘data-protected’ world but rather to understand how governments can efficiently use data while protecting privacy. Note: this video is part of the Living Syllabus course materials (Digital Self-Determination/Module 1: Beginning Inquiries).

Selected Readings on the Use of Artificial Intelligence in the Public Sector


By Kateryna Gazaryan and Uma Kalkar

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works focuses on algorithms and artificial intelligence in the public sector.

As artificial intelligence becomes more sophisticated, governments have turned to it to improve the speed and quality of public sector service delivery, among other objectives. Below, we provide a selection of recent literature that examines how the public sector has adopted AI to serve constituents and solve public problems. While the use of AI in government can cut costs and administrative work, these technologies are often early in development and difficult for organizations to understand and control, with potentially harmful effects as a result. As such, this selected reading explores not only the use of artificial intelligence in governance but also its benefits and consequences.

Readings are listed in alphabetical order.

Berryhill, Jamie, Kévin Kok Heang, Rob Clogher, and Keegan McBride. “Hello, World: Artificial intelligence and its use in the public sector.” OECD Working Papers on Public Governance no. 36 (2019): https://doi.org/10.1787/726fd39d-en.

This working paper emphasizes the importance of defining AI for the public sector and outlining use cases of AI within governments. It provides a map of 50 countries that have implemented or set in motion the development of AI strategies and highlights where and how these initiatives are cross-cutting, innovative, and dynamic. Additionally, the piece provides policy recommendations governments should consider when exploring public AI strategies to adopt holistic and humanistic approaches.

Kuziemski, Maciej, and Gianluca Misuraca. “AI Governance in the Public Sector: Three Tales from the Frontiers of Automated Decision-Making in Democratic Settings.” Telecommunications Policy 44, no. 6 (2020): 101976. 

Kuziemski and Misuraca explore how the use of artificial intelligence in the public sector can exacerbate existing power imbalances between the public and the government. They consider the European Union’s artificial intelligence “governance and regulatory frameworks” and compare these policies with those of Canada, Finland, and Poland. Drawing on previous scholarship, the authors outline the goals, drivers, barriers, and risks of incorporating artificial intelligence into public services and assess existing regulations against these factors. Ultimately, they find that the “current AI policy debate is heavily skewed towards voluntary standards and self-governance” while minimizing the influence of power dynamics between governments and constituents. 

Misuraca, Gianluca, and Colin van Noordt. “AI Watch, Artificial Intelligence in Public Services: Overview of the Use and Impact of AI in Public Services in the EU.” 30255 (2020).

This study provides “evidence-based scientific support” for the European Commission as it navigates AI regulation via an overview of ways in which European Union member-states use AI to enhance their public sector operations. While AI has the potential to positively disrupt existing policies and functionalities, this report finds gaps in how AI gets applied by governments. It suggests the need for further research centered on the humanistic, ethical, and social ramifications of AI use and a rigorous risk assessment from a “public-value perspective” when implementing AI technologies. Additionally, efforts must be made to empower all European countries to adopt responsible and coherent AI policies and techniques.

Saldanha, Douglas Morgan Fullin, and Marcela Barbosa da Silva. “Transparency and Accountability of Government Algorithms: The Case of the Brazilian Electronic Voting System.” Cadernos EBAPE.BR 18 (2020): 697–712.

Saldanha and da Silva note that open data and open government revolutions have increased citizen demand for algorithmic transparency. Algorithms are increasingly used by governments to speed up processes and reduce costs, but their black-box systems and lack of explainability allow implicit and explicit bias and discrimination to enter their calculations. The authors conduct a qualitative study of the “practices and characteristics of the transparency and accountability” in the Brazilian e-voting system across seven dimensions: consciousness; access and reparations; accountability; explanation; data origin, privacy and justice; auditing; and validation, precision and tests. They find the Brazilian e-voting system fulfilled the need to inform citizens about the benefits and consequences of data collection and algorithm use but severely lacked in demonstrating accountability and opening algorithm processes for citizen oversight. They put forth policy recommendations to increase the e-voting system’s accountability to Brazilians and strengthen auditing and oversight processes to reduce the current distrust in the system.

Sharma, Gagan Deep, Anshita Yadav, and Ritika Chopra. “Artificial intelligence and effective governance: A review, critique and research agenda.” Sustainable Futures 2 (2020): 100004.

This paper conducts a systematic review of the literature on how AI is used across different branches of government, specifically the healthcare; information, communication, and technology; environment; transportation; policy-making; and economic sectors. Across the 74 papers surveyed, the authors find a gap in the research on selecting and implementing AI technologies, as well as on their monitoring and evaluation. They call on future research to assess the impact of AI pre- and post-adoption in governance, along with the risks and challenges associated with the technology.

Tallerås, Kim, Terje Colbjørnsen, Knut Oterholm, and Håkon Larsen. “Cultural Policies, Social Missions, Algorithms and Discretion: What Should Public Service Institutions Recommend?” Part of the Lecture Notes in Computer Science book series (2020).

Tallerås et al. examine how the use of algorithms by public services, such as public radio and libraries, influence broader society and culture. For instance, to modernize their offerings, Norway’s broadcasting corporation (NRK) has adopted online platforms similar to popular private streaming services. However, NRK’s filtering process has faced “exposure diversity” problems that narrow recommendations to already popular entertainment and move Norway’s cultural offerings towards a singularity. As a public institution, NRK is required to “fulfill […] some cultural policy goals,” raising the question of how public media services can remain relevant in the era of algorithms fed by “individualized digital culture.” Efforts are currently underway to employ recommendation systems that balance cultural diversity with personalized content relevance that engage individuals and uphold the socio-cultural mission of public media.

Vogl, Thomas, Cathrine Seidelin, Bharath Ganesh, and Jonathan Bright. “Smart Technology and the Emergence of Algorithmic Bureaucracy: Artificial Intelligence in UK Local Authorities.” Public Administration Review 80, no. 6 (2020): 946–961.

Local governments are using “smart technologies” to create more efficient and effective public service delivery. These tools are twofold: not only do they help the public interact with local authorities, they also streamline the tasks of government officials. To better understand the digitization of local government, the authors conducted surveys, desk research, and in-depth interviews with stakeholders from local British governments to understand reasoning, processes, and experiences within a changing government framework. Vogl et al. found an increase in “algorithmic bureaucracy” at the local level to reduce administrative tasks for government employees, generate feedback loops, and use data to enhance services. While the shift toward digital local government demonstrates initiatives to utilize emerging technology for public good, further research is required to determine which demographics are not involved in the design and implementation of smart technology services and how to identify and include these audiences.

Wirtz, Bernd W., Jan C. Weyerer, and Carolin Geyer. “Artificial intelligence and the public sector—Applications and challenges.” International Journal of Public Administration 42, no. 7 (2019): 596-615.

The authors provide an extensive review of the existing literature on AI uses and challenges in the public sector to identify the gaps in current applications. The developing nature of AI in public service has led to differing definitions of what constitutes AI and what risks and benefits it poses to the public. The authors also note the lack of focus on the downfalls of AI in governance, with studies tending to focus primarily on the positive aspects of the technology. From this qualitative analysis, the researchers highlight ten AI applications: knowledge management, process automation, virtual agents, predictive analytics and data visualization, identity analytics, autonomous systems, recommendation systems, digital assistants, speech analytics, and threat intelligence. They also note four challenge dimensions: technology implementation, laws and regulation, ethics, and society. From these applications and risks, Wirtz et al. provide a “checklist for public managers” to make informed decisions on how to integrate AI into their operations.

Wirtz, Bernd W., Jan C. Weyerer, and Benjamin J. Sturm. “The dark sides of artificial intelligence: An integrated AI governance framework for public administration.” International Journal of Public Administration 43, no. 9 (2020): 818-829.

As AI is increasingly popularized and picked up by governments, Wirtz et al. highlight the lack of research on the challenges and risks—specifically, privacy and security—associated with implementing AI systems in the public sector. After assessing existing literature and uncovering gaps in the main governance frameworks, the authors outline the three areas of challenges of public AI: law and regulations, society, and ethics. Last, they propose an “integrated AI governance framework” that takes into account the risks of AI for a more holistic “big picture” approach to AI in the public sector.

Zuiderwijk, Anneke, Yu-Che Chen, and Fadi Salem. “Implications of the use of artificial intelligence in public governance: A systematic literature review and a research agenda.” Government Information Quarterly (2021): 101577.

Following a literature review on the risks and possibilities of AI in the public sector, Zuiderwijk, Chen, and Salem design a research agenda centered around the “implications of the use of AI for public governance.” The authors provide eight process recommendations, including: avoiding superficial buzzwords in research; conducting domain- and locality-specific research on AI in governance; shifting from qualitative analysis to diverse research methods; applying private sector “practice-driven research” to public sector study; furthering quantitative research on AI use by governments; creating “explanatory research designs”; sharing data for broader study; and adopting multidisciplinary reference theories. Further, they note the need for scholarship to delve into best practices, risk management, stakeholder communication, multisector use, and impact assessments of AI in the public sector to help decision-makers make informed decisions on the introduction, implementation, and oversight of AI in the public sector.

Updated and Expanded Selected Readings on Indigenous Data Sovereignty


By Juliet McMurren, Uma Kalkar, Yuki Mitsuda, and Andrew Zahuranec

Updated on October 11, 2021

As part of an ongoing effort to build a knowledge base for the field of improving governance through data and technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, to recognize and honor Indigenous Peoples’ Day, we have updated our previous curation of literature on Indigenous data sovereignty (IDS)—the principle that Indigenous peoples should be able to control the data collected by and about them, to determine how and by whom it is accessed, stored, and used. These pieces discuss data practices and methodologies that reflect Indigenous peoples’ lived experiences, cultures, and worldviews. 

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Readings are listed in alphabetical order.

Selected Readings (in alphabetical order)

Principles

Kukutai, Tahu and John Taylor (eds) Indigenous Data Sovereignty: Towards an Agenda (2016)

  • The foundational work in the field, this edited volume brings together Māori, Australian Aboriginal, Native American, and First Nations academics, researchers and data practitioners to set out the case for Indigenous data sovereignty.
  • Organized in four parts, the book begins by providing a historical account of colonialist statistics and the origins of the concept of Indigenous data sovereignty. In the second part, the authors set out an Indigenous critique of official statistics as a colonialist practice primarily intended to serve settler governments through the control of Indigenous peoples. As a result, population statistics from these societies are imbued with colonialist norms that both ignore indicators significant to Indigenous peoples and reduce them to what contributor Maggie Walter calls 5D data: disparity, deprivation, disadvantage, dysfunction, and difference.
  • The authors outline how Indigenous data sovereignty would work, setting out an agenda in which Indigenous people would control who should be counted among them, and establish collection priorities reflective of their cultural norms, interests, values and priorities. This could include a move away from data about individuals as single indicators used to compare, rank and drive “improvement” to a more nuanced and complex view of data that focuses on social groupings beyond the household. They would also control who would have access to the data gathered, with culturally appropriate rules and protocols for consents to access and use data. These principles are encapsulated in the First Nations OCAP® data model, through which First Nations assert ownership of their data; control over its collection, use, and disclosure; access to it; and possession of it.
  • The third part of the book provides examples of Indigenous data sovereignty in practice, from the perspective of both data practitioners and data users. A case study of data sovereignty among the Yawuru of Western Australia outlines a methodology for developing data collection rooted in self-determination and community values of mabu buru (knowledge of the land) and mabu liyan (relational or community wellbeing). Another examines the work of a Māori primary health care organization, National Hauora Coalition, which conducted a rapid response campaign to reduce high rates of acute rheumatic fever among Māori children in Auckland. Stewardship and analysis of their own community data enabled targeted interventions that reduced positive Group A strep rates among children by 75 percent and rates of rheumatic fever by 33 percent.
  • The final section of the book outlines the emerging efforts of the New Zealand and Australian Government to engage with Indigenous peoples’ desire for data sovereignty through their statistical practices.

Lovett, Raymond et al Good Data Practices for Indigenous Data Sovereignty and Governance (2019)

  • This multi-authored chapter is the first in a volume the editors describe as born of frustration with dystopian “bad data” practices and devoted to the exploration of how data could be used “productively and justly to further social, economic, cultural and political goals.”
  • The chapter sets out the context for the emergence of IDS movements worldwide, and gives a survey of IDS networks and their foundational principles, such as OCAP® (above) and the Māori principles of rangatiratanga (right to own, access, control and possess), manaakitanga (ethical use to further wellbeing) and kaitiakitanga (sustainable stewardship) as they apply to data about or from themselves and their environs.
  • The article defines and differentiates IDS — the management of information in alignment with the laws, practices and customs of the nation-state in which it is located — and indigenous data governance (IDG), or power and authority over the design, ownership, access to and use of data. It situates IDS movements alongside broader movements for Indigenous sovereignty informed by the rights laid out in the UN Declaration on the Rights of Indigenous Peoples.

Rainie, Stephanie Carroll, Tahu Kukutai, Maggie Walter, Oscar Figueroa-Rodriguez, Jennifer Walker, and Per Axelsson (2019) Issues in Open Data — Indigenous Data Sovereignty. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The State of Open Data: Histories and Horizons.

  • A chapter in State of Open Data: Histories and Horizons, a book that seeks to take stock of progress made toward open data across sectors, introduces the concept of Indigenous Data Sovereignty and describes how open data is a source of tension for Indigenous peoples. Reviewing the history and current usage of Indigenous data around the world, the chapter notes how Indigenous Data Sovereignty raises “fundamental questions about assumptions of ownership, representation, and control in open data communities” and how it challenges the open data movement’s approach to data ownership, licensing, and data use. It also notes how Indigenous nations are political entities and the ways that multi-layered governance challenges the “open data binary with one government actor, the nation-state.”
  • The authors observe that there is a widespread lack of understanding about IDS within the open data movement, and that open data policy and discussions have been largely framed to address the needs and interests of nation-states, with minimal engagement with Indigenous peoples. The authors provide a critique of the ways in which the Open Data Charter overlooks the issue of IDS in its principles on open by default, citizen engagement, and inclusive development. They note that the Open Data Charter’s commitment to free use, reuse, and redistribution by anyone, at any time, and anywhere, for example, is in direct conflict with the rights of Indigenous Peoples to govern their own data and control how and by whom it is accessed.
  • Opening state data that is unreliable, inaccurate, and designed, collected and processed according to the norms of state agencies poses additional problems for Indigenous peoples. Statistics about Indigenous peoples based on colonialist norms frequently perpetuate a narrative of inequality. Data infrastructures may be distorted by cultural assumptions, such as those about naming conventions, that misrepresent Indigenous people. In addition, the concept of open data has led to instances of cooptation and theft of Indigenous knowledge, when researchers have collected Indigenous knowledge about the environment, digitized it and shared it without Indigenous consent or oversight.
  • Drawing on the experience of Indigenous data networks worldwide, the authors propose three steps forward for the open data community in its relationship to Indigenous peoples. First, it needs to engage with Indigenous peoples as partners and knowledge holders to inform stewardship of Indigenous data. Secondly, IDS networks, with contacts in Indigenous communities and the world of data, should act as intermediaries for this engagement. Finally, they call for a broader adoption of principles on the governance and stewardship of Indigenous data within research and administration.

Research Data Alliance CARE Principles for Indigenous Data Governance (2019)

  • The RDA’s CARE principles propose an additional set of criteria that should be applied to open data in order to ensure that it respects Indigenous rights to self-determination. It argues the existing FAIR principles — that open data should be findable, accessible, interoperable and reusable — focus on data characteristics that facilitate increased sharing while ignoring historical context and power differentials.
  • To supplement FAIR, they propose the addition of CARE: that open data should be for collective benefit, recognize Indigenous peoples’ authority to control their own data, carry a responsibility to demonstrate how they benefit self-determination, and have embedded ethics prioritizing the rights and wellbeing of Indigenous people.
  • The principle of collective benefit asserts that data ecosystems should be designed in ways that Indigenous peoples can derive benefit from them. This includes active government support for Indigenous data use and reuse, using data to reduce information asymmetries between government and Indigenous communities, and the use of any value created from Indigenous data to benefit Indigenous communities. Authority to control recognizes the rights and interests of Indigenous peoples in their knowledge and data, and to govern and control how it is collected, accessed, used and stored.
  • Those working with Indigenous data have a responsibility to demonstrate how they are using it to benefit Indigenous communities. This involves fostering relationships of partnership and trust, working to build capability and capacity within Indigenous communities, and grounding data in the experiences, languages and worldviews of those communities. Finally, the rights and wellbeing of Indigenous peoples must be the primary concern. This requires data design and collection practices that do not stigmatize Indigenous people and that align with Indigenous ethical practices, that address imbalances in power and resources, and that are mindful of the potential for future use and potential harms.

Walter, Maggie, Raymond Lovett, Bobby Maher, Bhiamie Williamson, Jacob Prehn, Gawaian Bodkin‐Andrews, and Vanessa Lee. “Indigenous data sovereignty in the era of big data and open data.” Australian Journal of Social Issues 56, no. 2 (2021): 143-156.

  • A new book edited by Maggie Walter, Tahu Kukutai, Stephanie Russo Carroll, and Desi Rodriguez-Lonebear “examines how Indigenous Peoples around the world are demanding greater data sovereignty, and challenging the ways in which governments have historically used Indigenous data to develop policies and programs.” Through 15 articles, the book explores challenges and opportunities facing Indigenous peoples in places such as Aotearoa New Zealand, the Basque Country, and North and South America. These pieces explore various policy issues and methodological approaches from the perspective of Indigenous peoples to support positive social change.

Applications and Case Studies

Carroll, Stephanie Russo, Desi Rodriguez-Lonebear and Andrew Martinez, Indigenous Data Governance: Strategies from United States Native Nations (2019)

  • This article reviews IDS strategies from Native nations in the United States, connecting IDS and IDG to the rebuilding of Native nations and providing case studies of IDG occurring within tribal and non-tribal entities.
  • The article leads with a definition of key terms, including data dependency, “a paradox of scarcity and abundance: extensive data are collected about Indigenous peoples and nations, but rarely by or for Indigenous peoples’ and nations’ purposes.” It proposes IDG as a method by which the aspiration of IDS can be achieved, through a self-reinforcing cycle: governance of data leads to data rebuilding, providing data for governance that in turn leads to nation rebuilding.
  • The article offers three tribal, two non-tribal, and three urban, inter- and supra-tribal case studies of IDG in practice. The National Congress of American Indians Tribal Data Capacity Project, for example, was a pilot project to build tribal data capacity with five US tribes. Its outputs included a successful census conducted by the Pueblo of Laguna and the University of New Mexico, on tribal terms with tribal money for tribal purposes, and resulting in the development of proprietary software that remains the property of the tribe and that can be reused for subsequent collections.
  • The article concludes with a set of recommendations for tribal rights holders and stakeholders. It recommends that tribal rights holders develop tribe-specific data governance principles, policies, and procedures, and generate resources for IDG. Stakeholders are called on to acknowledge, support and promote IDS and embed it in data collection practices by building frameworks specifying how IDS is to be enacted in data processes, investing in intertribal institutions and recruiting and training Indigenous data professionals, among other measures.

Chaney, Christopher Data Sovereignty and the Tribal Law and Order Act (2018)

  • This article surveys the relationship between data sovereignty and the provision of criminal justice services, a key aspect of tribal sovereignty. The Tribal Law and Order Act (TLOA) of 2010 addressed tribal data by mandating that federal justice and law enforcement agencies coordinate and consult with tribes over data collection, and by providing tribal criminal justice agencies that meet federal and state requirements with access to national crime databases to enter and retrieve data.
  • TLOA has resulted in broad and extensive opportunities for federally recognized tribes to submit and retrieve data. Subject to federal law, tribes have the right to determine what information they will submit and access, putting the tribe in control of its own data. It also greatly facilitates the administration of tribal law enforcement and justice by enabling access to federal databases on property, such as vehicles and firearms, and people, including those on fugitives, sex offenders, and missing persons. The author suggests that TLOA implementation could serve as a model for other federal agencies working towards tribal data sovereignty arrangements.

First Nations Information Governance Centre (FNIGC) First Nations’ Data Sovereignty in Canada (2019).

  • This paper provides an overview of First Nations experiences of Canadian efforts to identify First Nations individuals, communities, and Nations in official statistics and data, and of the development of First Nations Data Sovereignty efforts over the previous two decades.
  • The paper surveys the ways in which early legislation constructed “Indians” and Indian status within Canada counter to First Nations norms, harming traditional gender roles, leadership structures and governance, severing many First Nations women who “married out” from the culture and lands, and forcing First Nations people to choose between “enfranchisement” through education or employment and their Indian status and culture.
  • The paper then surveys the current First Nations statistical context, noting its numerous deficiencies. Sources of information and data, including the national census, were created with little or no Indigenous involvement or input, creating inconsistencies in the accuracy, reliability, usefulness, and comparability of the data. Even where the data is useful, it is not routinely used in planning and advocacy for the benefit of First Nations communities. First Nations are also required to meet onerous reporting requirements in order to access federal funding, but the resulting data — and other data from and about First Nations — are not effectively analyzed, used, or shared with First Nations.
  • The paper provides examples of effective instances of national and regional First Nations data sovereignty using OCAP® principles. These include the First Nations Information Governance Centre’s own survey work on health, childhood, education, labor and employment, but also similar provincial initiatives. The FNIGC is currently at work with regional partners to develop a National Data Governance Strategy to advance First Nations Data Sovereignty.

Garrison, Nanibaa’ et al Genomic Research Through An Indigenous Lens: Understanding the Expectations (2019)

  • This multi-authored study compares research guidelines for genomic research among Indigenous peoples in Canada, New Zealand, Australia, and the United States.
  • It notes that while there is a dearth of genomic research about Indigenous peoples, Indigenous communities have been the subject of western science in ways that have been intrusive, disrespectful and unethical, leading to community harms and mistrust. Lack of community engagement and informed consent for secondary use of data, and past experiences of harmful and negative representation in publications, have reduced the willingness of Indigenous peoples to engage with genetic research.
  • Canada, New Zealand, Australia, and the United States each have guidelines on scientific research among Indigenous peoples. The authors compare the provisions of these guidelines and the Indigenous Research Protection Act, a draft instrument developed by the Indigenous Peoples Council on Biocolonialism with the goal of protecting Indigenous peoples in research, across four principles: community engagement, rights and interests, institutional responsibilities, and ethical/legal oversight. They observe that while many of the policies provide for protection of Indigenous peoples relating to sample collection, secondary uses of data, benefits, and withdrawal from research, there is less consistency regarding cultural rights and interests, particularly in US instruments.
  • The authors examine ways Indigenous peoples have sought to “bridge the gap” between the benefits of genomic research and the protection of Indigenous peoples. Community protocol development, Indigenous-led genomics initiatives, and consent procedures that draw on UNDRIP have increased community engagement in some countries and fostered greater trust. Concrete progress has also been made in initiatives to preserve Indigenous rights and interests over biospecimens, including protocols that allow for the return of samples, biobanking, and Indigenous governance of resulting data.

Gifford, Heather, and Kirikowhai Mikaere. Te Kete Tū Ātea: Towards claiming Rangitīkei iwi data sovereignty (2019)

  • This article gives an outline of the Te Kete Tū Ātea research project, a four-year, two-phase participatory research initiative by the Rangitīkei Iwi Collective to establish iwi data sovereignty. The first phase resulted in the development of an iwi data needs analysis and comprehensive iwi information framework, which identified potential data sources, gaps in current information, and strategies to address those gaps. The second phase led to the prioritization of a key information gathering domain, economic data, and a statistical evaluation of current iwi data holdings. The project adopted a Kaupapa Māori approach: it was “Māori led, Māori controlled, privileged a Māori worldview, and was framed around questions identified by Māori as of relevance to Māori.”
  • The first phase of the study identified a five domain framework to guide iwi data gathering. Collectively, these five domains — cultural, social, peoples, environmental and economic — make up Te Kete Tū Ātea, informed by three goal dimensions: kaitiakitanga, strengthening identity and connection, and empowerment and enablement.
  • The study identified challenges in assessing the wellbeing of iwi, including statistical capacity within iwi and the availability of data, but the authors suggest that the approach itself could be borrowed and applied by other iwi nationwide.

Goodluck, Kalen. “Why the U.S. is terrible at collecting Indigenous data.” High Country News, December 14, 2021. 

  • In this article, Kalen Goodluck interviews Abigail Echo-Hawk, the Chief Research Officer for the Seattle Indian Health Board and Director of the Urban Indigenous Health Institute, about the lack of Indigenous representation in health data collection and analysis. Missing and nonstandardized racial and ethnic information has effectively “invisibilized” Indigenous people, creating systemic issues for Indigenous health and compounding the vulnerability of these groups during the COVID-19 pandemic. Moreover, despite Indigenous communities’ legal rights to access federal data about Indigenous health, communities and researchers face obstacles in accessing this information. Ultimately, Echo-Hawk calls for accountability mechanisms to leverage the new administration and open data for community-centered work.

Hasan, Najmul, Yukun Bao, and Shah J. Miah. “Exploring the impact of ICT usage among indigenous people and their quality of life: operationalizing Sen’s capability approach.” Information Technology for Development (2021): 1-21.

  • Najmul Hasan, Yukun Bao, and Shah J. Miah’s article in Information Technology for Development studies how the digital divide, the gap between those who benefit from access to digital technology and those who do not, affects Indigenous people in Bangladesh. Using a structured questionnaire administered to 250 individuals, the researchers try to determine how Indigenous peoples’ freedoms influence their use of information communication technologies (ICTs) and whether there are inter-relationships among these factors and ICT use. The researchers find ICTs have a significant impact on Indigenous peoples in Bangladesh. Further, they note, among other findings, that the results “suggest development paths for indigenous society, providing political freedom to individuals as a combination of specific factors considered in our study. These include creating awareness about public decision making, focusing empowerment, local and national voting systems.”

Johnson-Jennings, Michelle, Derek Jennings, and Meg Little. Indigenous data sovereignty in action: The Food Wisdom Repository (2019)

  • This article arose from the experience of the authors at the Research for Indigenous Community Health (RICH) Center. Observing that while Indigenous health and nutrition information is available, it is dispersed and difficult to access, they proposed the development of a Food Wisdom Repository to gather meaningful data and information on Indigenous health practices and efforts. The result, supported by the Shakopee Mdewakanton Sioux Community, is an online digital repository of wise food practices grounded in Indigenous knowledge and IDS.
  • The project draws on Indigenous worldviews, knowledge and ways of knowing, beliefs, and forms of power. In particular, it is framed around the idea of wise practices — pragmatic, flexible and sustainable practices rooted in a given local context and the wisdom of community members — rather than the objective, hierarchical, hegemonic and acontextual “best practices” of Western science.

Montenegro, Maria. Subverting the universality of metadata standards: The TK labels as a tool to promote Indigenous data sovereignty (2019)

  • This paper explores how metadata standards, and in particular the widely used Dublin Core, reinforce colonial legal property frameworks and disenfranchise Indigenous people, and how they could be used (or subverted) to exercise and promote IDS.
  • The author notes that the rights and creator fields of DC are in direct conflict with Indigenous epistemologies and protocols on the access, circulation, and use of traditional Indigenous knowledge (TK). The rights field is embedded in western legal practice designed to recognize and protect new creations or inventions, and requires a designated individual author and original work in order to offer any protection. This emphasis on originality and individuality is at odds with Indigenous knowledge that emphasizes collective and cumulative knowledge acquired over generations. Similarly, both western IP law and the creator field within DC recognize the individual who records the lifestyles, languages and cultural practices of Indigenous people in film, audio, or image as the legal author, rather than the communities from which the content arose. As subjects but not authors, Indigenous people have no control over these recordings of their cultural practices, or how they are stored, accessed or reused. Indeed, they are even legally required to seek permission from the author to reuse these materials that document their lives and culture.
  • Developed in collaboration with Indigenous peoples, the TK labels are a set of digital tags that can be included as associated metadata in various digital information contexts such as CMSs, online catalogs and databases, finding aids and online platforms. These tags are intended to increase awareness of culturally appropriate circulation, access and use of Indigenous cultural materials. Designed to be used where communities are unable to assert legal control over materials, they provide important information about culturally appropriate use and stewardship. A Seasonal tag developed by the Penobscot Nation, for example, proscribes access to some content outside a given time of year, while an Attribution label, the most widely used, allows Indigenous communities to assert that they are the TK holders of the content and should be acknowledged as such.
  • While the TK labels represent a welcome advance in capturing and asserting Indigenous metadata standards, they are voluntary, and therefore only function if non-tribal collecting institutions recognize the IDS of the tribes.
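To make the mechanism concrete, here is a minimal, hypothetical sketch of how a TK Attribution label might travel alongside Dublin Core fields in a catalog record. The field names and label structure are illustrative assumptions for this sketch, not the official TK Labels (Local Contexts) schema or a standard Dublin Core binding.

```python
# Illustrative sketch: a catalog record combining standard Dublin Core
# fields with a TK Attribution label carried as associated metadata.
# The "tk_labels" structure below is a hypothetical representation,
# not the official Local Contexts schema.

record = {
    # Dublin Core fields: "creator" and "rights" credit the recorder,
    # not the community the knowledge came from -- the conflict
    # Montenegro describes.
    "dc:title": "Recording of a traditional song",
    "dc:creator": "Visiting ethnographer",   # legal author under IP law
    "dc:rights": "Copyright held by recorder",
    # The TK label lets the community assert its own relationship to
    # the content and signal culturally appropriate use.
    "tk_labels": [
        {
            "label": "TK Attribution",
            "community": "Example Nation",   # hypothetical community
            "text": "This material carries community-held Traditional "
                    "Knowledge. Please acknowledge the community as "
                    "the TK holders when using it.",
        }
    ],
}

def tk_communities(rec):
    """Return the communities asserting TK labels on a record."""
    return [lbl["community"] for lbl in rec.get("tk_labels", [])]

print(tk_communities(record))  # ['Example Nation']
```

The point of the sketch is structural: the DC fields continue to name the recorder as legal author, while the associated TK label lets the community assert its relationship to the material even where it holds no enforceable legal rights.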

McMahon, Rob, Tim LaHache, and Tim Whiteduck. Digital Data Management as Indigenous Resurgence in Kahnawà:ke (2015)

  • This article documents IDG experiences within the Kahnawà:ke Mohawk (Quebec) community as it set up and used ICT systems to manage community data on research, education, finance, health, membership, housing, lands, and resources. Their research followed the implementation of a customized digital data management system, and sought to find out how employees of community service organizations, chiefly in education, conceived of and used data, and the role of data management as part of self-government and Indigenous resurgence.
  • The authors describe the initiative as an act of “everyday community resurgence,” but one that was accompanied by significant internal tensions and challenges. They note the need to avoid technological determinism in IDS, since the use of ICTs has the potential to exacerbate the effects of settler colonialism, concentrating and centralizing power.
  • The article describes the rollout, architecture, and governance of the Kahnawà:ke data management system. One of the challenges faced by the community was in data sharing, with a lack of trust between community organizations leading to data hugging and silos. This tension, which has been identified in other research cited by the authors, points to the need for trust-building in order to promote more holistic data sharing and optimal data use.

Oguamanam, Chidi. “Indigenous Peoples, Data Sovereignty, and Self-Determination: Current Realities and Imperatives.” The African Journal of Information and Communication (AJIC), 26, 1-20. https://doi.org/10.23962/10539/30360

  • This paper by University of Ottawa Professor and Centre for International Governance Innovation Senior Fellow Chidi Oguamanam describes the current state of the global Indigenous data sovereignty movement. Describing the conceptual and practical context, the response by the Government of Canada and most members of Canada’s First Nations, and a variety of other responses, Oguamanam describes how the movement relates to larger efforts to promote Indigenous self-determination, finding “fundamental tension between the objectives of Indigenous data sovereignty and those of the open data movement, which does not directly cater for Indigenous peoples’ full control over their data.”

Rainie, Stephanie Carroll et al. Data as a Strategic Resource: Self-determination, Governance, and the Data Challenge for Indigenous Nations in the United States (2017)

  • Despite the need of Indigenous nations for data to help identify problems and find solutions, US Indigenous nations encounter a data landscape characterized by “sparse, inconsistent, and irrelevant information complicated by limited access and utility” that does not serve to address tribally defined needs. Because much of this data is collected and controlled by others for their own purposes, mistrust in data collection is high.
  • This article documents two case studies in tribal data sovereignty and data governance, among the Ysleta del Sur Pueblo and Cheyenne River Sioux Tribe. It lays out the data priorities, agendas and challenges faced by each, and the resulting data initiatives, protocols and uses. The article also discusses how this data governance contributed to the tribes’ self-determination.
  • As part of a development strategy, in 2008 the Ysleta del Sur began to collect socioeconomic and demographic data annually from its citizens as part of its enrollment process. Implementing a census approach that incorporated cultural and local knowledge and western epistemologies, the project yielded data about population, poverty rates, household incomes, educational attainment, workforce and unemployment that was more complete than US census data. Strong community engagement yielded a 90 percent response rate, and the results inspired other data initiatives to support community strategic decision making. The socioeconomic data was also used to support successful applications for federal funding.
  • Identifying high levels of poverty and unemployment as a problem, the Cheyenne River Sioux Tribe sought a comprehensive plan to address these problems, for which they needed timelier, more granular, and more culturally and locally relevant data than that available through the federal government. With academic partners, it developed a survey and data collection process to collect baseline demographic and socioeconomic data from a sample of residents. The survey was able to quantify unemployment rates among people living on the reservation, but also captured employment categories missed by federal data collection, such as the arts microenterprise sector. The results were shared back to the community, and used to foster microenterprises and write grant applications.

Walter, Maggie, and Michelle Suina. Indigenous data, indigenous methodologies and indigenous data sovereignty (2019)

  • In this article, Walter and Suina propose that there is a dearth of Indigenous quantitative methodologies, driven by a longstanding mistrust of positivist research that positions Indigenous peoples within a deficit discourse. What the authors call “quantitative avoidance” leads to lived consequences for Indigenous peoples: since the statistics produced by quantitative methods form the primary evidence base for policy within the colonial societies, failing to engage with them removes Indigenous people from a critical part of the policy debate.
  • The authors make a case for developing Indigenous quantitative methodologies. They cite the example of the Albuquerque Area Southwest Tribal Epidemiology Center (AASTEC), whose mission is to collaborate with the 27 tribes of its health area to provide high-quality, culturally congruent epidemiology, capacity development, program evaluation, and health promotion. Committed to honoring tribal sovereignty, AASTEC works to build capacity that enables tribes to control data design, collection, and management at all stages of the process. This requires not merely adapting western survey instruments, but redesigning them to incorporate the values and definitions of health of the communities they serve.
  • The authors close with three recommendations for communities and stakeholders interested in building Indigenous quantitative methodologies. First, communities need to cultivate technical skills for survey development, data collection, analysis and reporting. Secondly, they need to build comfort with and understanding of research methods among tribal partners in order to undo decades of mistrust; the authors describe simulation exercises that help to demonstrate how worldviews shape expectations and perceptions around data that they have used successfully with Indigenous and non-Indigenous participants. Finally, they should pursue advocacy of IDS and an exchange of ideas that allows successful Indigenous research methodologies to be promulgated.

Selected Readings on Data Portability


By Juliet McMurren, Andrew Young, and Stefaan G. Verhulst

As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, we explore selected literature on data portability.

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Context

Data today exists largely in silos, generating problems and inefficiencies for the individual, business and society at large. These include:

  • difficulty moving data between competing service providers;
  • delays in sharing data for important societal research initiatives;
  • barriers for data innovators to reuse data that could generate insights to inform individuals’ decision making; and
  • inhibitions to scale data donation.

Data portability — the principle that individuals have a right to obtain, copy, and reuse their own personal data and to transfer it from one IT platform or service to another for their own purposes — is positioned as a solution to these problems. When fully implemented, it would make data liquid, giving individuals the ability to access their own data in a usable and transferable format, transfer it from one service provider to another, or donate data for research and enhanced data analysis by those working in the public interest.

Some companies, including Google, Apple, Twitter and Facebook, have sought to advance data portability through initiatives like the Data Transfer Project, an open source software project designed to facilitate data transfers. Newly enacted data protection legislation such as Europe’s General Data Protection Regulation (2018) and the California Consumer Privacy Act (2018) gives data subjects a right to data portability. However, despite these legal and technical advances, many questions remain about how to scale up data liquidity and portability responsibly and systematically. These new data rights have generated complex and as yet unanswered questions about the limits of data ownership; the implications for privacy, security and intellectual property rights; and the practicalities of how, when, and to whom data can be transferred.

In this edition of the GovLab’s Selected Readings series, we examine the emerging literature on data portability to provide a foundation for future work on the value proposition of data portability. Readings are listed in alphabetical order.

Selected readings

Cho, Daegon, Pedro Ferreira, and Rahul Telang, The Impact of Mobile Number Portability on Price and Consumer Welfare (2016)

  • In this paper, the authors analyze how Mobile Number Portability (MNP) — the ability for consumers to maintain their phone number when changing providers, thus reducing switching costs — affected the relationship between switching costs, market price and consumer surplus after it was introduced in most European countries in the early 2000s.
  • Theory holds that when switching costs are high, market leaders will enjoy a substantial advantage and are able to keep prices high. Policy makers will therefore attempt to decrease switching costs to intensify competition and reduce prices to consumers.
  • The study reviewed quarterly data from 47 wireless service providers in 15 EU countries between 1999 and 2006. The data showed that MNP simultaneously decreased market price by over four percent and increased consumer welfare by an average of at least €2.15 per person per quarter. This increase amounted to a total of €880 million per quarter across the 15 EU countries analyzed in this paper and accounted for 15 percent of the increase in consumer surplus observed over this time.
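The study’s headline figures can be sanity-checked with simple arithmetic. The implied covered population below is our inference from the reported numbers, not a figure stated in the paper, and the per-person value is a lower bound (“at least €2.15”).

```python
# Back-of-the-envelope check of the reported MNP welfare figures:
# at least EUR 2.15 per person per quarter, totalling about EUR 880
# million per quarter across the 15 EU countries studied.
per_person_per_quarter = 2.15   # EUR, lower bound reported by the study
total_per_quarter = 880e6       # EUR per quarter, aggregate figure

# Dividing the aggregate by the per-person lower bound gives an upper
# bound on the population the aggregate implies.
implied_population = total_per_quarter / per_person_per_quarter
print(f"Implied covered population: {implied_population / 1e6:.0f} million")
```

The result, roughly 409 million people, is of the same order as the combined population of the 15 countries studied, so the per-person and aggregate figures are mutually consistent.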

CtrlShift, Data Mobility: The data portability growth opportunity for the UK economy (2018)

  • Commissioned by the UK Department of Digital, Culture, Media and Sport (DCMS), this study was intended to identify the potential of personal data portability for the UK economy.
  • Its scope went beyond the legal right to data portability envisaged by the GDPR, to encompass the current state of personal data portability and mobility, requirements for safe and secure data sharing, and the potential economic benefits through stimulation of innovation, productivity and competition.
  • The report concludes that increased personal data mobility has the potential to be a vital stimulus for the development of the digital economy, driving growth by empowering individuals to make use of their own data and consent to others using it to create new data-driven services and technologies.
  • However, the report concludes that there are significant challenges to be overcome, and new risks to be addressed, before the value of personal data can be realized. Much personal data remains locked in organizational silos, and systemic issues related to data security and governance and the uneven sharing of benefits need to be resolved.

Data Guidance and Future of Privacy Forum, Comparing Privacy Laws: GDPR v. CCPA (2018)

  • This paper compares the provisions of the GDPR with those of the California Consumer Privacy Act (2018).
  • Both article 20 of the GDPR and section 1798 of the CCPA recognize a right to data portability. Both also confer on data subjects the right to receive data from controllers free of charge upon request, and oblige controllers to create mechanisms to provide subjects with their data in portable and reusable form so that it can be transmitted to third parties for reuse.
  • In the CCPA, the right to data portability is an extension of the right to access: it confers on data subjects only the right to request data collected within the past 12 months and have it delivered to them. The GDPR does not impose a time limit and allows data to be transmitted directly from one data controller to another, but limits the right to personal data provided by the data subject themselves and processed by automated means on the basis of consent or contract.

Data Transfer Project, Data Transfer Project Overview and Fundamentals (2018)

  • The paper presents an overview of the goals, principles, architecture, and system components of the Data Transfer Project. The intent of the DTP is to increase the number of services offering data portability and provide users with the ability to transfer data directly in and out of participating providers through systems that are easy and intuitive to use, private and secure, reciprocal between services, and focused on user data. The project, which is supported by Microsoft, Google, Twitter and Facebook, is an open-source initiative that encourages the participation of other providers to reduce the infrastructure burden on providers and users.
  • In addition to benefits to innovation, competition, and user choice, the authors point to benefits to security, through allowing users to backup, organize, or archive their data, recover from account hijacking, and retrieve their data from deprecated services.
  • The DTP’s remit was to test concepts and feasibility for the transfer of specific types of user data between online services using a system of adapters to transfer proprietary formats into canonical formats that can be used to transfer data while allowing providers to maintain control over the security of their service. While not resolving all formatting or support issues, this approach would allow substantial data portability and encourage ecosystem sustainability.
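The adapter-and-canonical-format approach the DTP tests can be sketched as follows. The real DTP is a Java codebase; the class and field names here are hypothetical illustrations of the pattern, not its actual API.

```python
# Minimal sketch of the Data Transfer Project's adapter idea:
# provider-specific adapters convert proprietary data to and from a
# shared canonical model, so any exporter can feed any importer.
# All names are illustrative, not the real DTP (Java) API.

from dataclasses import dataclass

@dataclass
class CanonicalPhoto:
    """Canonical data model shared by all adapters."""
    title: str
    url: str

class ServiceAExporter:
    def export(self):
        # Imagine this calls Service A's proprietary API.
        raw = [{"caption": "Sunset", "href": "https://a.example/1.jpg"}]
        # Adapter: proprietary format -> canonical model.
        return [CanonicalPhoto(title=p["caption"], url=p["href"]) for p in raw]

class ServiceBImporter:
    def __init__(self):
        self.library = []

    def import_photos(self, photos):
        # Adapter: canonical model -> Service B's own format.
        self.library.extend({"name": p.title, "source": p.url} for p in photos)

# A transfer is then just: export to canonical, import from canonical.
photos = ServiceAExporter().export()
b = ServiceBImporter()
b.import_photos(photos)
print(b.library)
```

The design choice the paper describes is visible here: each provider writes only two adapters (export and import) against the canonical model, rather than one pairwise converter per peer service, while keeping its internal format private.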

Deloitte, How to Flourish in an Uncertain Future: Open Banking (2017)

  • This report addresses the innovative and disruptive potential of open banking, in which data is shared between members of the banking ecosystem at the authorization of the customer, with the potential to increase competition and facilitate new products and services. In the resulting marketplace model, customers could use a single banking interface to access products from multiple players, from established banks to newcomers and fintechs.
  • The report’s authors identify significant threats to current banking models. Banks that failed to embrace open banking could be relegated to a secondary role as an infrastructure provider, while third parties — tech companies, fintech, and price comparison websites — take over the customer relationship.
  • The report identifies four overlapping operating models banks could adopt within an open banking model: full service providers, delivering proprietary products through their own interface with little or no third-party integration; utilities, which provide other players with infrastructure without customer-facing services; suppliers, which offer proprietary products through third-party interfaces; and interfaces, which provide distribution services through a marketplace interface. To retain market share, incumbents are likely to need to adopt a combination of these roles, offering their own products and services and those of third parties through their own and others’ interfaces.

Digital Competition Expert Panel, Unlocking Digital Competition (2019)

  • This report captures the findings of the UK Digital Competition Expert Panel, which was tasked in 2018 with considering the opportunities and challenges the digital economy might pose for competition and competition policy, and with recommending any necessary changes. The panel focused on the impact of big players within the sector, appropriate responses to mergers or anticompetitive practices, and the impact on consumers.
  • The panel found that the digital economy is creating many benefits, but that digital markets are subject to tipping, in which emerging winners can scoop much of the market. This concentration can give rise to substantial costs, especially to consumers, and cannot be solved by competition alone. However, government policy and regulatory solutions have limitations, including the slowness of policy change, uneven enforcement and profound informational asymmetries between companies and government.
  • The panel proposed the creation of a digital markets unit that would be tasked with developing a code of competitive conduct, enabling greater personal data mobility and systems designed with open standards, and advancing access to non-personal data to reduce barriers to market entry.
  • The panel’s model of data mobility goes beyond data portability, which involves consumers being able to request and transfer their own data from one provider to another. Instead, the panel recommended empowering consumers to instigate transfers of data between a business and a third party in order to access price information, compare goods and services, or access tailored advice and recommendations. They point to open banking as an example of how this could function in practice.
  • It also proposed updating merger policy to make it more forward-looking to better protect consumers and innovation and preserve the competitiveness of the market. It recommended the creation of antitrust policy that would enable the implementation of interim measures to limit damage to competition while antitrust cases are in process.

Egan, Erin, Charting a Way Forward: Data Portability and Privacy (2019)

  • This white paper by Facebook’s VP and Chief Privacy Officer, Policy, represents an attempt to advance the conversation about the relationship between data portability, privacy, and data protection. The author sets out five key questions about data portability: what is it, whose and what data should be portable, how privacy should be protected in the context of portability, and where responsibility for data misuse or improper protection should lie.
  • The paper finds that definitions of data portability still remain imprecise, particularly with regard to the distinction between data portability and data transfer. In the interest of feasibility and a reasonable operational burden on providers, it proposes time limits on providers’ obligations to make observed data portable.
  • The paper concludes that there are strong arguments both for and against allowing users to port their social graph — the map of connections between that user and other users of the service — but that the key determinant should be a capacity to ensure the privacy of all users involved. Best-practice data portability protocols that would resolve current differences of approach as to what, how and by whom information should be made available would help promote broader portability, as would resolution of liability for misuse or data exposure.

Engels, Barbara, Data portability among online platforms (2016)

  • The article examines the effects on competition and innovation of data portability among online platforms such as search engines, online marketplaces, and social media, and how relations between users, data, and platform services change in an environment of data portability.
  • The paper finds that the benefits to competition and innovation of portability are greatest in two kinds of environments: first, where platforms offer complementary products and can realize synergistic benefits by sharing data; and secondly, where platforms offer substitute or rival products but the risk of anti-competitive behavior is high, as for search engines.
  • It identifies privacy and security issues raised by data portability. Portability could, for example, allow an identity fraudster to misuse personal data across multiple platforms, compounding the harm they cause.
  • It also suggests that standards for data interoperability could act to reduce innovation in data technology, encouraging data controllers to continue to use outdated technologies in order to comply with inflexible, government-mandated standards.

Graef, Inge, Martin Husovec and Nadezhda Purtova, Data Portability and Data Control: Lessons for an Emerging Concept in EU Law (2018)

  • This paper situates the data portability right conferred by the GDPR within rights-based data protection law. The authors argue that the right to data portability should be seen as a new regulatory tool aimed at stimulating competition and innovation in data-driven markets.
  • The authors note the potential for conflicts between the right to data portability and the intellectual property rights of data holders, suggesting that the framers underestimated the potential impact of such conflicts on copyright, trade secrets and sui generis database law.
  • Given that the right to data portability is being replicated within consumer protection law and the regulation of non-personal data, the authors argue framers of these laws should consider the potential for conflict and the impact of such conflict on incentives to innovate.

Mohsen, Mona Omar, and Hassan A. Aziz. The Blue Button Project: Engaging Patients in Healthcare by a Click of a Button (2015)

  • This paper provides a literature review on the Blue Button initiative, an early data portability project which allows Americans to access, view or download their health records in a variety of formats.
  • Originally launched through the Department of Veterans’ Affairs in 2010, the Blue Button initiative had expanded to more than 500 organizations by 2014, when the Department of Health and Human Services launched the Blue Button Connector to facilitate both patient access and development of new tools.
  • The Blue Button has enabled the development of tools such as the Harvard-developed Growth-Tastic app, which allows parents to check their child’s growth by submitting their downloaded pediatric health data. Pharmacies across the US have also adopted the Blue Button to provide patients with access to their prescription history.

More than Data and Mission: Smart, Got Data? The Value of Energy Data to Customers (2016)

  • This report outlines the public value of the Green Button, a data protocol that provides customers with private and secure access to their energy use data collected by smart meters.
  • The authors outline how the use of the Green Button can help states meet their energy and climate goals by enabling them to structure renewables and other distributed energy resources (DER) such as energy efficiency, demand response, and solar photovoltaics. Access to granular, near real time data can encourage innovation among DER providers, facilitating the development of applications like “virtual energy audits” that identify efficiency opportunities, allowing customers to reduce costs through time-of-use pricing, and enabling the optimization of photovoltaic systems to meet peak demand.
  • Energy efficiency receives the greatest boost from initiatives like the Green Button, with studies showing energy savings of up to 18 percent when customers have access to their meter data. In addition to improving energy conservation, access to meter data could improve the efficiency of appliances by allowing devices to trigger sleep modes in response to data on usage or price. However, at the time of writing, problems with data portability and interoperability were preventing these benefits from being realized, at a cost of tens of millions of dollars.
  • The authors recommend that commissions require utilities to make usage data available to customers or authorized third parties in standardized formats as part of basic utility service, and tariff data to developers for use in smart appliances.
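Green Button data is published in an XML-based format derived from the NAESB ESPI standard, in which usage appears as interval readings. The sketch below is a minimal, illustrative parser; the XML fragment is invented and heavily simplified relative to real Green Button exports, though it uses the ESPI-style `IntervalBlock`/`IntervalReading` element names and namespace.

```python
import xml.etree.ElementTree as ET

# A simplified, illustrative fragment in the style of the NAESB ESPI
# (Green Button) schema: one IntervalBlock of hourly readings in watt-hours.
# Real exports wrap these blocks in an Atom feed with additional metadata.
SAMPLE = """<IntervalBlock xmlns="http://naesb.org/espi">
  <IntervalReading>
    <timePeriod><duration>3600</duration><start>1672531200</start></timePeriod>
    <value>450</value>
  </IntervalReading>
  <IntervalReading>
    <timePeriod><duration>3600</duration><start>1672534800</start></timePeriod>
    <value>380</value>
  </IntervalReading>
</IntervalBlock>"""

NS = {"espi": "http://naesb.org/espi"}

def total_usage_wh(xml_text: str) -> int:
    """Sum the <value> of every IntervalReading (watt-hours)."""
    root = ET.fromstring(xml_text)
    return sum(int(r.findtext("espi:value", namespaces=NS))
               for r in root.findall("espi:IntervalReading", NS))

print(total_usage_wh(SAMPLE))  # prints 830
```

A "virtual energy audit" of the kind the report describes would aggregate readings like these over weeks or months to spot efficiency opportunities.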

MyData, Understanding Data Operators (2020)

  • MyData is a global movement of data users, activists, and developers who share a common goal: to empower individuals with their personal data, enabling them and their communities to develop knowledge, make informed decisions, and interact more consciously and efficiently.
  • This introductory paper presents the state of knowledge about data operators, trusted data intermediaries that provide infrastructure for human-centric personal data management and governance, including data sharing and transfer. The operator model allows data controllers to outsource issues of legal compliance with data portability requirements, while offering individual users a transparent and intuitive way to manage the data transfer process.
  • The paper examines use cases from 48 “proto-operators” from 15 countries who fulfill some of the functions of an operator, albeit at an early level of maturity. The paper finds that operators offer management of identity authentication, data transaction permissions, connections between services, value exchange, data model management, personal data transfer and storage, governance support, and logging and accountability. At the heart of these functions is the need for minimum standards of data interoperability.
  • The paper reviews governance frameworks from the general (legislative) to the specific (operators), and explores proto-operator business models. In keeping with an emerging field, business models are currently unclear and potentially unsustainable; they are one of several areas, along with interoperability requirements and governance frameworks, that must still be developed.
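The "logging and accountability" function that the paper identifies can be sketched as an append-only record of permissioned transfers. This is a hypothetical illustration, not MyData's specification; all names (`TransferLog`, the subject and service identifiers) are invented.

```python
from dataclasses import dataclass, field
import time

@dataclass
class TransferLog:
    """Hypothetical operator-style audit log: every permissioned data
    transfer between services is appended, never edited or deleted."""
    entries: list = field(default_factory=list)

    def record(self, subject: str, source: str, destination: str, scope: str):
        self.entries.append({
            "subject": subject,          # the individual the data concerns
            "source": source,            # data holder the data leaves
            "destination": destination,  # service receiving the data
            "scope": scope,              # what was permissioned
            "timestamp": time.time(),
        })

log = TransferLog()
log.record("user-42", "bank-a", "budget-app", "transactions:read")
print(len(log.entries))  # prints 1
```

A log like this is what lets an individual (or a regulator) later reconstruct who received which data under which permission, which is the accountability half of the operator role.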

National Science and Technology Council, Smart Disclosure and Consumer Decision Making: Report of the Task Force on Smart Disclosure (2013)

  • This report summarizes the work and findings of the 2011–2013 Task Force on Smart Disclosure: Information and Efficiency, an interagency body tasked with advancing smart disclosure, through which data is made more available and accessible to both consumers and innovators.
  • The Task Force recognized the capacity of smart disclosure to inform consumer choices, empower them through access to useful personal data, enable the creation of new tools, products and services, and promote efficiency and growth. It reviewed federal efforts to promote smart disclosure within sectors and in data types that crosscut sectors, such as location data, consumer feedback, enforcement and compliance data and unique identifiers. It also surveyed specific public-private partnerships on access to data, such as the Blue and Green Button and MyData initiatives in health, energy and education respectively.
  • The Task Force reviewed steps taken by the Federal Government to implement smart disclosure, including adoption of machine readable formats and standards for metadata, use of APIs, and making data available in an unstructured format rather than not releasing it at all. It also reviewed “choice engines” making use of the data to provide services to consumers across a range of sectors.
  • The Task Force recommended that smart disclosure should be a core component of efforts to institutionalize and operationalize open data practices, with agencies proactively identifying, tagging, and planning the release of candidate data. It also recommended that this be supported by a government-wide community of practice.

Nicholas, Gabriel, Taking It With You: Platform Barriers to Entry and the Limits of Data Portability (2020)

  • This paper considers whether, as is often claimed, data portability offers a genuine solution to the lack of competition within the tech sector.
  • It concludes that current regulatory approaches to data portability, which focus on reducing switching costs through technical solutions such as one-off exports and API interoperability, are not sufficient to generate increased competition. This is because they fail to address other barriers to entry, including network effects, unique data access, and economies of scale.
  • The author proposes an alternative approach, which he terms collective portability, which would allow groups of users to coordinate the transfer of their data to a new platform. This model raises questions about how such collectives would make decisions regarding portability, but would enable new entrants to successfully target specific user groups and scale rapidly without having to reach users one by one.

OECD, Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-use across Societies (2019)

  • This background paper to a 2017 expert workshop on risks and benefits of data reuse considers data portability as one strategy within a data openness continuum that also includes open data, market-based B2B contractual agreements, and restricted data-sharing agreements within research and data for social good applications.
  • It considers four rationales offered for data portability: empowering individuals toward the “informational self-determination” to which the GDPR aspires; increasing competition within digital and other markets by reducing information asymmetries between individuals and providers; lowering switching costs and barriers to market entry; and facilitating increased data flows.
  • The report highlights the need for both syntactic and semantic interoperability standards to ensure data can be reused across systems, both of which may be fostered by increased rights to data portability. Data intermediaries have an important role to play in the development of these standards, through initiatives like the Data Transfer Project, a collaboration which brought together Facebook, Google, Microsoft, and Twitter to create an open-source data portability platform.

Personal Data Protection Commission Singapore, Response to Feedback on the Public Consultation on Proposed Data Portability and Data Innovation Provisions (2020)

  • The report summarizes the findings of the 2019 PDPC public consultation on proposals to introduce provisions on data portability and data innovation in Singapore’s Personal Data Protection Act.
  • The proposed provision would oblige organizations to transmit an individual’s data to another organization in a commonly used machine-readable format, upon the individual’s request. The obligation does not extend to data intermediaries or organizations that do not have a presence in Singapore, although data holders may choose to honor those requests.
  • The obligation would apply to electronic data that is either provided by the individual or generated by the individual’s activities in using the organization’s service or product, but not derived data created by the processing of other data by the data holder. Respondents were concerned that including derived data could harm organizations’ competitiveness.
  • Respondents were concerned about how to honor data portability requests where the data of third parties was involved, as in the case of a joint account holder, for example. The PDPC opted for a “balanced, reasonable, and pragmatic approach,” allowing data involving third parties to be ported where it was under the requesting individual’s control, was to be used for domestic and personal purposes, and related only to the organization’s product or service.

Quinn, Paul, Is the GDPR and Its Right to Data Portability a Major Enabler of Citizen Science? (2018)

  • This article explores the potential of data portability to advance citizen science by enabling participants to port their personal data from one research project to another. Citizen science — the collection and contribution of large amounts of data by private individuals for scientific research — has grown rapidly in response to the development of new digital means to capture, store, organize, analyze and share data.
  • The GDPR right to data portability aids citizen science by requiring transfer of data in machine-readable format and allowing data subjects to request its transfer to another data controller. This requirement of interoperability does not amount to compatibility, however, and data thus transferred would probably still require cleaning to be usable, acting as a disincentive to reuse.
  • The GDPR’s limitation of transferability to personal data provided by the data subject excludes some forms of data that might possess significant scientific potential, such as secondary personal data derived from further processing or analysis.
  • The GDPR right to data portability also potentially limits citizen science by restricting the grounds for processing data to which the right applies to data obtained through a subject’s express consent or through the performance of a contract. This limitation excludes other forms of data processing described in the GDPR, such as data processing for preventive or occupational medicine, scientific research, or archiving for reasons of public or scientific interest. It is also not clear whether the GDPR compels data controllers to transfer data outside the European Union.

Wong, Janis and Tristan Henderson, How Portable is Portable? Exercising the GDPR’s Right to Data Portability (2018)

  • This paper presents the results of 230 real-world requests for data portability in order to assess how — and how well — the GDPR right to data portability is being implemented. Over a three-month period beginning on the day the GDPR came into effect, the authors examined the file formats returned in response to requests and the practical difficulties encountered in making and interpreting them.
  • The findings revealed continuing problems around ensuring portability for both data controllers and data subjects. Of the 230 requests, only 163 were successfully completed.
  • Data controllers frequently had difficulty understanding the requirements of the GDPR, providing data in incomplete or inappropriate formats: only 40 percent of the files supplied were in a fully compliant format. Additionally, some data controllers confused the right to data portability with other rights conferred by the GDPR, such as the right to access or erasure.
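A format audit of the kind the study performed can be sketched as a simple classification of returned files. The category lists below are assumptions for demonstration, not the authors' actual coding scheme, and the file names are invented.

```python
# Illustrative audit of files returned by portability requests.
# Which extensions count as "machine-readable" is an assumption here,
# not the study's definition of GDPR compliance.
STRUCTURED = {".json", ".csv", ".xml"}   # machine-readable, reusable
SEMI = {".html", ".txt"}                 # readable but hard to reuse
# anything else (.pdf, .docx, images, ...) is treated as opaque

def classify(filename: str) -> str:
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in STRUCTURED:
        return "structured"
    if ext in SEMI:
        return "semi-structured"
    return "opaque"

returned = ["export.json", "history.csv", "profile.pdf", "data.html"]
summary = {}
for f in returned:
    summary[classify(f)] = summary.get(classify(f), 0) + 1
print(summary)  # prints {'structured': 2, 'opaque': 1, 'semi-structured': 1}
```

Running such an audit over hundreds of responses is what produces headline figures like the 40 percent compliance rate reported above.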

Selected Readings on AI for Development


By Dominik Baumann, Jeremy Pesner, Alexandra Shaw, Stefaan Verhulst, Michelle Winowatan, Andrew Young, Andrew J. Zahuranec

As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology. 

In this edition, we explore selected literature on AI and Development. This piece was developed in the context of The GovLab’s collaboration with Agence Française de Développement (AFD) on the use of emerging technology for development. To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Context: In recent years, public discourse on artificial intelligence (AI) has focused on its potential for improving the way businesses, governments, and societies make (automated) decisions. Simultaneously, several AI initiatives have raised concerns about human rights, including the possibility of discrimination and privacy breaches. Between these two opposing perspectives is a discussion on how stakeholders can maximize the benefits of AI for society while minimizing the risks that might arise from the use of this technology.

While the majority of AI initiatives today come from the private sector, international development actors increasingly experiment with AI-enabled programs. These initiatives focus on, for example, climate modelling, urban mobility, and disease transmission. These early efforts demonstrate the promise of AI for supporting more efficient, targeted, and impactful development efforts. Yet, the intersection of AI and development remains nascent, and questions remain regarding how this emerging technology can deliver on its promise while mitigating risks to intended beneficiaries.

Readings are listed in alphabetical order.

2030Vision. AI and the Sustainable Development Goals: the State of Play

  • In broad language, this document for 2030Vision assesses AI research and initiatives related to the Sustainable Development Goals (SDGs) to identify gaps and potential that can be further explored or scaled.
  • It specifically reviews the current applications of AI in two SDG sectors, food/agriculture and healthcare.
  • The paper recommends enhancing multi-sector collaboration among businesses, governments, civil society, academia and others to ensure technology can best address the world’s most pressing challenges.

Andersen, Lindsey. Artificial Intelligence in International Development: Avoiding Ethical Pitfalls. Journal of Public & International Affairs (2019). 

  • Investigating the ethical implications of AI in the international development sector, the author argues that the involvement of many different stakeholders and AI-technology providers results in ethical issues concerning fairness and inclusion, transparency, explainability and accountability, data limitations, and privacy and security.
  • The author recommends the information communication technology for development (ICT4D) community adopt the Principles for Digital Development to ensure the ethical implementation of AI in international development projects.
  • The Principles for Digital Development include: 1) design with the user; 2) understand the ecosystem; 3) design for scale; 4) build for sustainability; 5) be data driven; 6) use open standards, open data, open source, and open innovation; and 7) reuse and improve.

Arun, Chinmayi. AI and the Global South: Designing for Other Worlds in Markus D. Dubber, Frank Pasquale, and Sunit Das (eds.), The Oxford Handbook of Ethics of AI, Oxford University Press, Forthcoming (2019).

  • This chapter interrogates the impact of AI’s application in the Global South and raises concerns about such initiatives.
  • Arun argues AI’s deployment in the Global South may result in discrimination, bias, oppression, exclusion, and bad design. She further argues it can be especially harmful to vulnerable communities in places that do not have strong respect for human rights.
  • The paper concludes by outlining the international human rights laws that can mitigate these risks. It stresses the importance of a human rights-centric, inclusive, empowering context-driven approach in the use of AI in the Global South.

Best, Michael. Artificial Intelligence (AI) for Development Series: Module on AI, Ethics and Society. International Telecommunication Union (2018). 

  • This working paper is intended to help ICT policymakers or regulators consider the ethical challenges that emerge within AI applications.
  • The author identifies a four-pronged framework of analysis (risks, rewards, connections, and key questions to consider) that can guide policymaking in the fields of: 1) livelihood and work; 2) diversity, non-discrimination and freedoms from bias; 3) data privacy and minimization; and 4) peace and security.
  • The paper also includes a table of policies and initiatives undertaken by national governments and tech companies around AI, along with the set of values (mentioned above) explicitly considered.

International Development Innovation Alliance (2019). Artificial Intelligence and International Development: An Introduction

  • Results for Development, a nonprofit organization working in the international development sector, developed a report in collaboration with the AI and Development Working Group within the International Development Innovation Alliance (IDIA). The report provides a brief overview of AI and how this technology may impact the international development sector.
  • The report provides examples of AI-powered applications and initiatives that support the SDGs, including eradicating hunger, promoting gender equality, and encouraging climate action.
  • It also provides a collection of supporting resources and case studies for development practitioners interested in using AI.

Paul, Amy, Craig Jolley, and Aubra Anthony. Reflecting the Past, Shaping the Future: Making AI Work for International Development. United States Agency for International Development (2018). 

  • This report outlines the potential of machine learning (ML) and artificial intelligence in supporting development strategy. It also details some of the common risks that can arise from the use of these technologies.
  • The document contains examples of ML and AI applications to support the development sector and recommends good practices in handling such technologies. 
  • It concludes by recommending broad, shared governance; the use of fair and balanced data; and continued involvement of local populations and development practitioners.

Pincet, Arnaud, Shu Okabe, and Martin Pawelczyk. Linking Aid to the Sustainable Development Goals – a machine learning approach. OECD Development Co-operation Working Papers (2019). 

  • The authors apply ML and semantic analysis to data sourced from the OECD’s Creditor Reporting System to map aid funding to particular SDGs.
  • The researchers find “Good Health and Well-Being” to be the most targeted SDG, which they call the “SDG darling.”
  • The authors find that mapping relationships between the Creditor Reporting System and the SDGs can help to ensure equitable funding across different goals.
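The mapping idea can be illustrated with a toy classifier that scores project descriptions against per-SDG vocabularies. The paper's actual pipeline uses machine learning and semantic analysis over the Creditor Reporting System; the keyword lists and example description below are invented for illustration only.

```python
# Toy stand-in for the paper's ML approach: score an aid-project
# description against hand-picked SDG keyword sets and pick the best match.
SDG_KEYWORDS = {
    "SDG 2 (Zero Hunger)": {"food", "agriculture", "nutrition"},
    "SDG 3 (Good Health and Well-Being)": {"health", "disease", "vaccine"},
    "SDG 4 (Quality Education)": {"school", "education", "teacher"},
}

def map_to_sdg(description: str) -> str:
    """Return the SDG whose keyword set overlaps most with the text."""
    words = set(description.lower().split())
    scores = {sdg: len(words & kws) for sdg, kws in SDG_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(map_to_sdg("vaccine delivery and disease surveillance programme"))
# prints SDG 3 (Good Health and Well-Being)
```

Aggregating such labels over an entire aid database is what lets the authors count how much funding flows to each goal and identify the "SDG darling."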

Quinn, John, Vanessa Frias-Martinez, and Lakshminarayanan Subramanian. Computational Sustainability and Artificial Intelligence in the Developing World. Association for the Advancement of Artificial Intelligence (2014). 

  • These researchers suggest three different areas—health, food security, and transportation—in which AI applications can uniquely benefit the developing world. They argue that the lack of technological infrastructure in these regions makes AI especially useful and valuable, as it can efficiently analyze data and provide solutions.
  • The paper provides examples of applications within the three themes, including disease surveillance, identification of drought and agricultural trends, modeling of commuting patterns, and traffic congestion monitoring.

Smith, Matthew and Sujaya Neupane. Artificial intelligence and human development: toward a research agenda (2018).

  • The authors highlight potential beneficial applications for AI in a development context, including healthcare, agriculture, governance, education, and economic productivity.
  • They also discuss the risks and downsides of AI, which include the “black boxing” of algorithms, bias in decision making, potential for extreme surveillance, undermining democracy, potential for job and tax revenue loss, vulnerability to cybercrime, and unequal wealth gains towards the already-rich.
  • They recommend further research projects on these topics that are interdisciplinary, locally conducted, and designed to support practice and policy.

Tomašev, Nenad, et al. AI for social good: unlocking the opportunity for positive impact. Nature Communications (2020).

  • This paper takes stock of what the authors term the AI for Social Good movement (AI4SG), which “aims to establish interdisciplinary partnerships centred around AI applications towards SDGs.”  
  • Developed at a multidisciplinary expert seminar on the topic, the authors present 10 recommendations for creating successful AI4SG collaborations: “1) Expectations of what is possible with AI need to be well grounded. 2) There is value in simple solutions. 3) Applications of AI need to be inclusive and accessible, and reviewed at every stage for ethics and human rights compliance. 4) Goals and use cases should be clear and well-defined. 5) Deep, long-term partnerships are required to solve large problems successfully. 6) Planning needs to align incentives, and factor in the limitations of both communities. 7) Establishing and maintaining trust is key to overcoming organisational barriers. 8) Options for reducing the development cost of AI solutions should be explored. 9) Improving data readiness is key. 10) Data must be processed securely, with utmost respect for human rights and privacy.”

Vinuesa, Ricardo, et al. The role of artificial intelligence in achieving the Sustainable Development Goals. Nature Communications (2020).

  • This report analyzes how AI can advance some SDGs while inhibiting progress toward others. It highlights a critical research gap concerning the extent to which AI affects sustainable development over the medium and long term. 
  • Through their analysis, the authors find that AI has the potential to positively impact the environment, society, and the economy, but that it can also hinder progress in each of these areas.
  • The authors recognize that although AI enables efficiency and productivity, it can also increase inequality and hinder achievement of the 2030 Agenda. They suggest that adequate policy formation and regulation are needed to ensure fast and equitable development of AI technologies that can address the SDGs. 

United Nations Educational, Scientific and Cultural Organization (UNESCO) (2019). Artificial intelligence for Sustainable Development: Synthesis Report, Mobile Learning Week 2019

  • In this report, UNESCO assesses the findings from Mobile Learning Week (MLW) 2019. The three main conclusions were: 1) the world is facing a learning crisis; 2) education drives sustainable development; and 3) sustainable development can only be achieved if we harness the potential of AI. 
  • Questions around four major themes dominated the MLW 2019 sessions: 1) how to guarantee inclusive and equitable use of AI in education; 2) how to harness AI to improve learning; 3) how to increase skills development; and 4) how to ensure transparent and auditable use of education data. 
  • To move forward, UNESCO advocates for more international cooperation and stakeholder involvement, creation of education and AI standards, and development of national policies to address educational gaps and risks. 

The Data Storytelling Workbook


Book by Anna Feigenbaum and Aria Alamalhodaei: “From tracking down information to symbolising human experiences, this book is your guide to telling more effective, empathetic and evidence-based data stories.

Drawing on cross-disciplinary research and first-hand accounts of projects ranging from public health to housing justice, The Data Storytelling Workbook introduces key concepts, challenges and problem-solving strategies in the emerging field of data storytelling. Filled with practical exercises and activities, the workbook offers interactive training materials that can be used for teaching and professional development. By approaching both ‘data’ and ‘storytelling’ in a broad sense, the book combines theory and practice around real-world data storytelling scenarios, offering critical reflection alongside practical and creative solutions to challenges in the data storytelling process, from tracking down hard-to-find information, to the ethics of visualising difficult subjects like death and human rights….(More)”.

Big data in official statistics


Paper by Barteld Braaksma and Kees Zeelenberg: “In this paper, we describe and discuss opportunities for big data in official statistics. Big data come in high volume, high velocity and high variety. Their high volume may lead to better accuracy and more details, their high velocity may lead to more frequent and more timely statistical estimates, and their high variety may give opportunities for statistics in new areas. But there are also many challenges: there are uncontrolled changes in sources that threaten continuity and comparability, and data that refer only indirectly to phenomena of statistical interest.

Furthermore, big data may be highly volatile and selective: the coverage of the population to which they refer may change from day to day, leading to inexplicable jumps in time-series. And very often, the individual observations in these big data sets lack variables that allow them to be linked to other datasets or population frames. This severely limits the possibilities for correction of selectivity and volatility. Also, with the advance of big data and open data, there is much more scope for disclosure of individual data, and this poses new problems for statistical institutes. So, big data may be regarded as so-called nonprobability samples. The use of such sources in official statistics requires other approaches than the traditional one based on surveys and censuses.

A first approach is to accept the big data just for what they are: an imperfect, yet very timely, indicator of developments in society. In a sense, this is what national statistical institutes (NSIs) often do: we collect data that have been assembled by the respondents, and the reason why they have been assembled (and even just the fact that they have been assembled) is very much the same reason why they are interesting for society and thus for an NSI to collect. In short, we might argue: these data exist and that’s why they are interesting.

A second approach is to use formal models and extract information from these data. In recent years, many new methods for dealing with big data have been developed by mathematical and applied statisticians. New methods like machine-learning techniques can be considered alongside more traditional methods like Bayesian techniques. National statistical institutes have always been reluctant to use models, apart from specific cases like small-area estimates. Based on experience at Statistics Netherlands, we argue that NSIs should not be afraid to use models, provided that their use is documented and made transparent to users. On the other hand, in official statistics, models should not be used for all kinds of purposes….(More)”.
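The "formal models" approach can be as simple as smoothing a volatile daily big-data indicator before publishing it, so that day-to-day jumps in coverage do not show up as inexplicable jumps in the time series. The centered moving average below is a minimal, illustrative stand-in for the model-based methods the authors discuss; the series itself is invented.

```python
def moving_average(series, window=3):
    """Centered moving average; edge points keep their original values.
    A crude smoother for a volatile daily indicator."""
    half = window // 2
    out = list(series)
    for i in range(half, len(series) - half):
        out[i] = sum(series[i - half:i + half + 1]) / window
    return out

# Invented daily index from a volatile big-data source (e.g. web scraping
# whose population coverage shifts from day to day).
daily_index = [100, 140, 95, 160, 90, 150, 105]
print(moving_average(daily_index))
```

In practice an NSI would prefer an explicit time-series model (e.g. a state-space model) whose assumptions can be documented and made transparent to users, as the authors recommend; the point here is only that some modeling step stands between the raw big data and a publishable statistic.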

Index: Secondary Uses of Personal Data


By Alexandra Shaw, Andrew Zahuranec, Andrew Young, Stefaan Verhulst

The Living Library Index–inspired by the Harper’s Index–provides important statistics and highlights global trends in governance innovation. This installment focuses on public perceptions regarding secondary uses of personal data (or the re-use of data initially collected for a different purpose). It provides a summary of societal perspectives toward personal data usage, sharing, and control. It is not meant to be comprehensive–rather, it intends to illustrate conflicting, and often confusing, attitudes toward the re-use of personal data. 

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at info@thelivinglib.org

Data ownership and control 

  • Percentage of Americans who say it is “very important” they control information collected about them: 74% – 2016
  • Americans who think that today’s privacy laws are not good enough at protecting people’s privacy online: 68% – 2016
  • Americans who say they have “a lot” of control over how companies collect and use their information: 9% – 2015
  • In a survey of 507 online shoppers, the number of respondents who indicated they don’t want brands tracking their location: 62% – 2015
  • In a survey of 507 online shoppers, the amount who “prefer offers that are targeted to where they are and what they are doing:” 60% – 2015 
  • Number of surveyed American consumers willing to provide data to corporations under the following conditions: 
    • “Data about my social concerns to better connect me with non-profit organizations that advance those causes:” 19% – 2018
    • “Data about my DNA to help me uncover any hereditary illnesses:” 21% – 2018
    • “Data about my interests and hobbies to receive relevant information and offers from online sellers:” 32% – 2018
    • “Data about my location to help me find the fastest route to my destination:” 40% – 2018
    • “My email address to receive exclusive offers from my favorite brands:”  56% – 2018  

Consumer Attitudes 

  • Academic study participants willing to donate personal data to research if it could lead to public good: 60% – 2014
  • Academic study participants willing to share personal data for research purposes in the interest of public good: 25% – 2014
  • Percentage who expect companies to “treat [them] like an individual, not as a member of some segment like ‘millennials’ or ‘suburban mothers:’” 74% – 2018 
    • Percentage who believe that brands should understand a “consumer’s individual situation (e.g. marital status, age, location, etc.)” when they’re being marketed to: 70% – 2018
    • Number who are “more annoyed” by companies now compared to 5 years ago: 40% – 2018
    • Percentage worried their data is shared across companies without their permission: 88% – 2018
    • Percentage worried about a brand’s ability to track their behavior while on the brand’s website or app: 75% – 2018
  • Consumers globally who expect brands to anticipate needs before they arise: 33%  – 2018 
  • Surveyed residents of the United Kingdom who identify as:
    • “Data pragmatists” willing to share personal data “under the right circumstances:” 58% – 2017
    • “Fundamentalists,” who would not share personal data for better services: 24% – 2017
  • Respondents who think data sharing is part of participating in the modern economy: 62% – 2018
  • Respondents who believe that data sharing benefits enterprises more than consumers: 75% – 2018
  • People who want more control over the data that enterprises collect about them: 84% – 2018
  • Percentage “unconcerned” about personal data protection: 18% – 2018
  • Percentage of Americans who think that government should do more to regulate large technology companies: 55% – 2018
  • Registered American voters who trust broadband companies with personal data “a great deal” or “a fair amount”: 43% – 2017
  • Americans who report experiencing a major data breach: 64% – 2017
  • Number of Americans who believe that their personal data is less secure than it was 5 years ago: 49% – 2019
  • Percentage of surveyed American citizens who consider trust in a company an important factor for sharing data: 54% – 2018

Convenience

Microsoft’s 2015 Consumer Data Value Exchange Report attempts to understand consumer attitudes on the exchange of personal data across the global markets of Australia, Brazil, Canada, Colombia, Egypt, Germany, Kenya, Mexico, Nigeria, Spain, South Africa, United Kingdom and the United States. From their survey of 16,500 users, they find:

  • The most popular incentives for sharing data are: 
    • Cash rewards: 64% – 2015
    • Significant discounts: 49% – 2015
    • Streamlined processes: 29% – 2015
    • New ideas: 28% – 2015
  • Respondents who would prefer to see more ads to get new services: 34% – 2015
  • Respondents willing to share search terms for a service that enabled fewer steps to get things done: 70% – 2015 
  • Respondents willing to share activity data for such an improvement: 82% – 2015
  • Respondents willing to share their gender for “a service that inspires something new based on others like them:” 79% – 2015

A 2015 Pew Research Center survey presented Americans with several data-sharing scenarios related to convenience. Participants could respond: “acceptable,” “it depends,” or “not acceptable” to the following scenarios: 

  • Share health information to get access to personal health records and arrange appointments more easily:
    • Acceptable: 52% – 2015
    • It depends: 20% – 2015
    • Not acceptable: 26% – 2015
  • Share data for discounted auto insurance rates: 
    • Acceptable: 37% – 2015
    • It depends: 16% – 2015
    • Not acceptable: 45% – 2015
  • Share data for free social media services: 
    • Acceptable: 33% – 2015
    • It depends: 15% – 2015
    • Not acceptable: 51% – 2015
  • Share data on smart thermostats for cheaper energy bills: 
    • Acceptable: 33% – 2015
    • It depends: 15% – 2015
    • Not acceptable: 51% – 2015

Other Studies

  • Surveyed banking and insurance customers who would exchange personal data for:
    • Targeted auto insurance premiums: 64% – 2019
    • Better life insurance premiums for healthy lifestyle choices: 52% – 2019 
  • Surveyed banking and insurance customers willing to share data specifically related to income, location and lifestyle habits to: 
    • Secure faster loan approvals: 81.3% – 2019
    • Lower the chances of injury or loss: 79.7% – 2019 
    • Receive discounts on non-insurance products or services: 74.6% – 2019
    • Receive text alerts related to banking account activity: 59.8% – 2019 
    • Get saving advice based on spending patterns: 56.6% – 2019
  • In a survey of over 7,000 members of the public around the globe, respondents indicated:
    • They thought “smartphone and tablet apps used for navigation, chat, and news that can access your contacts, photos, and browsing history” are “creepy”: 16% – 2016
    • Emailing a friend about a trip to Paris and receiving advertisements for hotels, restaurants, and excursions in Paris is “creepy”: 32% – 2016
    • A free fitness-tracking device that monitors your well-being and sends a monthly report to you and your employer is “creepy”: 45% – 2016
    • A telematics device that allows emergency services to track your vehicle is “creepy”: 78% – 2016
  • Percentage of British residents who do not want to work with virtual agents of any kind: 48% – 2017
  • Americans who disagree that “if companies give me a discount, it is a fair exchange for them to collect information about me without my knowing”: 91% – 2015

Data Brokers, Intermediaries, and Third Parties 

  • Americans who consider it acceptable for a grocery store to offer a free loyalty card in exchange for selling their shopping data to third parties: 47% – 2016
  • Percentage of people who know that “searches, site visits and purchases” are reviewed without consent: 55% – 2015
  • Percentage of people who wanted companies to ask for permission before collecting their personal information and selling it to intermediaries: 93% – 1991
  • Percentage of Americans who “would be very concerned if the company at which their data were stored sold it to another party”: 90% – 2008
  • Percentage of Americans who think it’s unacceptable for their grocery store to share their shopping data with third parties in exchange for a free loyalty card: 32% – 2016
  • Percentage of Americans who think that government needs to do more to regulate advertisers: 64% – 2016
  • Percentage of Americans who “want to have control over what marketers can learn about” them online: 84% – 2015
  • Percentage of Americans who feel they have no power to find out what marketers are learning about them: 58% – 2015
  • Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites using personal data to recommend stories, articles, or videos:  56% – 2017
  • Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites selling their personal information to third parties for advertising purposes: 64% – 2017

Personal Health Data

The Robert Wood Johnson Foundation’s 2014 Health Data Exploration Project Report analyzes attitudes about personal health data (PHD), self-tracking health data captured through wearable devices and sensors. The three major stakeholder groups involved in using PHD for the public good are users, the companies that track users’ data, and researchers.

  • Overall Respondents:
    • Percentage who believe anonymity is “very” or “extremely” important: 67% – 2014
    • Percentage who “probably would” or “definitely would” share their personal data with researchers: 78% – 2014
    • Percentage who believe that they own—or should own—all the data about them, even when it is indirectly collected: 54% – 2014
    • Percentage who think they share or ought to share ownership with the company: 30% – 2014
    • Percentage who think companies alone own or should own all the data about them: 4% – 2014
    • Percentage for whom data ownership “is not something I care about”: 13% – 2014
    • Percentage who indicated they wanted to own their data: 75% – 2014 
    • Percentage who would share data only if “privacy were assured”: 68% – 2014
    • Percentage who would supply data regardless of privacy or compensation: 27% – 2014
    • Percentage who mentioned privacy, anonymity, or confidentiality when asked under what conditions they would share their data: 63% – 2014
    • Percentage who would be “more” or “much more” likely to share data for compensation: 56% – 2014
    • Percentage who indicated compensation would make no difference: 38% – 2014
    • Percentage opposed to commercial or profit-making use of their data: 13% – 2014
    • Percentage who would only share personal health data with a guarantee of:
      • Privacy: 57% – 2014
      • Anonymization: 90% – 2014
  • Surveyed Researchers: 
    • Percentage who agree or strongly agree that self-tracking data would help provide more insights in their research: 89% – 2014
    • Percentage who say PHD could answer questions that other data sources could not: 95% – 2014
    • Percentage who have used public datasets: 57% – 2014
    • Percentage who have paid for data for research: 19% – 2014
    • Percentage who have used self-tracking data before for research purposes: 46% – 2014
    • Percentage who have worked with application, device, or social media companies: 23% – 2014
    • Percentage who “somewhat disagree” or “strongly disagree” that there are insurmountable barriers to using self-tracking data in their research: 82% – 2014

SOURCES: 

“2019 Accenture Global Financial Services Consumer Study: Discover the Patterns in Personality”, Accenture, 2019. 

“Americans’ Views About Data Collection and Security”, Pew Research Center, 2015. 

“Data Donation: Sharing Personal Data for Public Good?”, ResearchGate, 2014.

“Data privacy: What the consumer really thinks”, Acxiom, 2018.

“Exclusive: Public wants Big Tech regulated”, Axios, 2018.

“Consumer data value exchange”, Microsoft, 2015.

“Crossing the Line: Staying on the right side of consumer privacy”, KPMG International Cooperative, 2016.

“How do you feel about the government sharing our personal data? – livechat”, The Guardian, 2017. 

“Personal data for public good: using health information in medical research”, The Academy of Medical Sciences, 2006. 

“Personal Data for the Public Good: New Opportunities to Enrich Understanding of Individual and Population Health”, Robert Wood Johnson Foundation, Health Data Exploration Project, Calit2, UC Irvine and UC San Diego, 2014. 

“Pew Internet and American Life Project: Cloud Computing Raises Privacy Concerns”, Pew Research Center, 2008. 

“Poll: Little Trust That Tech Giants Will Keep Personal Data Private”, Morning Consult & Politico, 2017. 

“Privacy and Information Sharing”, Pew Research Center, 2016. 

“Privacy, Data and the Consumer: What US Thinks About Sharing Data”, MarTech Advisor, 2018. 

“Public Opinion on Privacy”, Electronic Privacy Information Center, 2019. 

“Selligent Marketing Cloud Study Finds Consumer Expectations and Marketer Challenges are Rising in Tandem”, Selligent Marketing Cloud, 2018. 

“The Data-Sharing Disconnect: The Impact of Context, Consumer Trust, and Relevance in Retail Marketing”, Boxever, 2015.

“Microsoft Research reveals understanding gap in the brand-consumer data exchange”, Microsoft Research, 2015.

“Survey: 58% will share personal data under the right circumstances”, Marketing Land: Third Door Media, 2019. 

“The state of privacy in post-Snowden America”, Pew Research Center, 2016. 

“The Tradeoff Fallacy: How Marketers Are Misrepresenting American Consumers And Opening Them Up to Exploitation”, University of Pennsylvania, 2015.

Index: The Data Universe 2019


By Michelle Winowatan, Andrew J. Zahuranec, Andrew Young, Stefaan Verhulst, Max Jun Kim

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe.

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at info@thelivinglib.org.

Internet Traffic:

  • Percentage of the world’s population that uses the internet: 51.2% (3.9 billion people) – 2018
  • Number of searches processed worldwide by Google every year: at least 2 trillion – 2016
  • Website traffic worldwide generated through mobile phones: 52.2% – 2018
  • Total number of mobile subscriptions in the first quarter of 2019: 7.9 billion (44 million added during the quarter) – 2019
  • Amount of mobile data traffic worldwide: nearly 30 billion GB – 2018
  • Data category with highest traffic worldwide: video (60%) – 2018
  • Global average of data traffic per smartphone per month: 5.6 GB – 2018
    • North America: 7 GB – 2018
    • Latin America: 3.1 GB – 2018
    • Western Europe: 6.7 GB – 2018
    • Central and Eastern Europe: 4.5 GB – 2018
    • North East Asia: 7.1 GB – 2018
    • Southeast Asia and Oceania: 3.6 GB – 2018
    • India, Nepal, and Bhutan: 9.8 GB – 2018
    • Middle East and Africa: 3.0 GB – 2018
  • Average time between the creation of new bitcoin blocks: 9.27 minutes – 2019

Streaming Services:

  • Hours of video streamed by Netflix users every minute: 97,222 – 2017
  • Hours of YouTube watched per day: over 1 billion – 2018
  • Number of tracks uploaded to Spotify every day: Over 20,000 – 2019
  • Number of Spotify’s monthly active users: 232 million – 2019
  • Spotify’s total subscribers: 108 million – 2019
  • Hours of content listened to on Spotify: 17 billion – 2019
  • Total number of songs in Spotify’s catalog: over 30 million – 2019
  • Apple Music’s total subscribers: 60 million – 2019
  • Total number of songs in Apple Music’s catalog: 45 million – 2019

Social Media:

Calls and Messaging:

Retail/Financial Transaction:

  • Number of packages shipped by Amazon in a year: 5 billion – 2017
  • Total value of payments processed by Venmo in a year: USD 62 billion – 2019
  • Based on a non-representative survey of 2,436 US consumers between the ages of 21 and 72 on P2P platforms:
    • The average volume of transactions handled by Venmo: USD 64.2 billion – 2019
    • The average volume of transactions handled by Zelle: USD 122.0 billion – 2019
    • The average volume of transactions handled by PayPal: USD 141.8 billion – 2019 
    • Platform with the highest percent adoption among all consumers: PayPal (48%) – 2019 

Internet of Things:

Sources: