Selected Readings on Open Data and Generative AI


By: María Esther Cervantes, Hannah Chafetz, Sampriti Saxena, & Stefaan G. Verhulst

Generative AI tools are increasingly used across sectors, including in governments. However, there is limited research on how these generative AI tools could impact open data policies and programs. What are the opportunities for generative AI and open data? What are the risks? Could generative AI transform the role of statistical agencies? Is there a need for a global charter to govern generative AI? 

Towards this end, in May 2023, The GovLab’s Open Data Policy Lab (a collaboration between The GovLab and Microsoft) hosted a panel discussion on the intersections of generative AI and open data and the ways in which generative AI could alter our existing conception of a third wave of open data. Building on the takeaways from this discussion, below we provide a curated list of annotated readings (listed alphabetically) on these topics. 

These selected readings focus on three main areas:  (1) the opportunities and risks of applying generative AI for open data, (2) generative AI governance models and discussion, and (3) the new role of national statistical agencies in the advent of these technologies. Given the speed at which these technologies are changing, we incorporate a wide variety of sources such as journal articles, reports from international organizations and think tanks, and blog posts. 

We found several common themes across these readings. First, there is general consensus that generative AI tools can provide value for open data and National Statistical Offices, whether it be for increasing data discovery, accessibility, or stakeholder collaboration. However, privacy, security, and safety risks remain prevalent and must be balanced against these benefits. Second, there is a lack of common standards or policies for generative AI specifically. There are concerns that without a common language or standardization, algorithms may be misconstrued across borders. Third, governments are recommending synthetic data as a way to minimize privacy concerns with open data. If done responsibly, generative AI could help produce synthetic data at a larger scale. Lastly, governments around the world do not all have the same capabilities and resources for applying generative AI in their work. The countries that lag behind on these capabilities may face more challenges and risks when trying to incorporate generative AI into their public services.

*****

Alam, Zaidul. “Harnessing the Power of Generative AI in a World of Open Government Data.” LinkedIn Blog, June 15, 2023.

  • In this LinkedIn article, the author discusses the opportunities to leverage Open Government Data (specifically, census data) for generative AI.
  • The author explains that Open Data and generative AI could be merged in several ways, including helping increase interactions between citizens and governments, developing tools to engage with public institutions, and answering search queries about domain-specific data (e.g., health data).
  • The author provides an example of how census data and AI applications could be merged: “By leveraging data APIs from the ABS and other similar institutions globally, Census Chat GPT could generate real-time, data-driven insights about demographic trends, socio-economic disparities, housing statistics, and more.”
  • There are many possible intersections between generative AI and Open Government Data: “In the future, we could see more sophisticated applications of generative AI to government open data. For example, AI could be used to generate comprehensive city planning scenarios based on urban development data, or to create personalized learning plans for students based on education data. Governments could also develop AI ‘public assistants’ that can explain complex legislation, provide real-time updates on policy changes, or guide citizens through bureaucratic procedures. Such AI assistants could democratize access to public information, reduce administrative burdens, and enhance civic engagement.”

Boom, Cedric, and Michael Reusens. Changing Data Sources in the Age of Machine Learning for Official Statistics, 2023. https://doi.org/10.48550/arXiv.2306.04338

  • This paper gives an overview of the main risks, liabilities and uncertainties associated with changing data sources in the context of machine learning for official statistics. 
  • The use of machine learning for official statistics has the potential to provide more timely, accurate, and comprehensive insight into a wide range of topics. By leveraging the vast amounts of data that individuals and entities generate daily, statistical agencies can gain a more nuanced understanding of trends and patterns. However, there are associated risks, mainly concerns about data quality, privacy, and security, along with the need for technical skills and infrastructure in government.
  • Machine learning can be used to complement or even replace official statistics, and its ability to nowcast and forecast is an extremely valuable addition. By incorporating machine learning into official statistical production, one can benefit from the strengths of both approaches and make more informed decisions based on the most current and accurate data.
  • National statistics agencies are used to having their data completely under their control, so using external data sources to power innovative statistics can become problematic. Establishing proper protocols and procedures for external data management is therefore necessary.

Goasduff, Laurence. “Is Synthetic Data the Future of AI? Q&A with Alexander Linden.” Gartner Interview, November 20, 2022.

  • In this interview, Alexander Linden, a VP Analyst at Gartner, talks about the potential of synthetic data as a complement to open data to drive the development of more accurate AI models.
  • He says, “Synthetic data can increase the accuracy of machine learning models. Real-world data is happenstance and does not contain all permutations of conditions or events possible in the real world. Synthetic data can counter this by generating data at the edges, or for conditions not yet seen.”
  • While synthetic data may offer a way to address biases and issues of quality in open data, Linden emphasizes the importance of transparency and explainability when it comes to the models creating and using synthetic data. 

Loukis, Euripidis, Stuti Saxena, Nina Rizun, Maria Ioanna Maratsi, Mohsan Ali, and Charalampos Alexopoulos. “ChatGPT Application Vis-a-Vis Open Government Data (OGD): Capabilities, Public Values, Issues and a Research Agenda.” In Electronic Government, edited by Ida Lindgren, Csaba Csáki, Evangelos Kalampokis, Marijn Janssen, Gabriela Viale Pereira, Shefali Virkar, Efthimios Tambouris, and Anneke Zuiderwijk, 95–110. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland, 2023. https://doi.org/10.1007/978-3-031-41138-0_7.

  • In this paper, the authors analyze the opportunities and risks of using ChatGPT for Open Government Data from an Affordances Theory perspective. Through 12 expert interviews, the authors develop a series of research agendas to accelerate the understanding of how ChatGPT could impact Open Government Data. 
  • ChatGPT could have a positive impact on Open Government Data in several ways. These include increasing user engagement, awareness, and accessibility; helping develop new Open Government strategies; offering new ways for data discovery through government chatbots; and balancing the supply and demand of Open Government Data. Additionally, from a public values perspective, ChatGPT could provide service-related and professionalism-related values for Open Government Data. It could help design user-driven Open Government Data initiatives and lower barriers to accessing Open Government Data amongst different stakeholders (e.g., citizens), increasing transparency around government initiatives.
  • The authors point to several issues that ChatGPT could pose for Open Government Data such as unknowingly collecting personal information from registered users and inaccurate summaries of Open Government Data from ChatGPT. Also, the lack of governance frameworks could lead to larger problems such as inadequate results, cybersecurity issues, and algorithmic biases caused by language differences across countries. 
  • In order to harness the value of ChatGPT for Open Government Data, additional research is needed on how ChatGPT could be used to increase use and value generation from Open Government Data, how ChatGPT could benefit the publishing of Open Government Data, and the potential issues of ChatGPT for Open Government Data. 

Sallier, Kenza, and Kate Burnett-Isaacs. “Unlocking the Power of Data Synthesis with the Starter Guide on Synthetic Data for Official Statistics.” Statistics Canada, March 10, 2023.

  • In this piece, Statistics Canada provides a set of guidelines for National Statistics Offices to use when leveraging synthetic data. 
  • Using UNECE’s report as the guide, the piece explains that using synthetic data can help increase access to statistical data in a privacy-compliant manner. It can help with publishing data, testing analysis, education, and testing software. Additionally, it explains the three main ways in which synthetic data can be generated: sequential modeling, simulated data, and deep learning methods.
  • The article provides an overview of the pros and cons of using Generative Adversarial Networks to create synthetic data for National Statistics Offices.
    • Pros: “GANs have been used in NSOs to generate continuous, discrete and textual datasets, while ensuring that the underlying distribution and patterns of the original data are preserved. Furthermore, recent research has been focused on the generation of free-text data which can be convenient in situations where models need to be developed to classify text data.”
    • Cons: “GANs can be seen as too complex to understand, explain or implement where there is only a minimal knowledge of neural networks. There is often a criticism associated with neural networks as lacking in transparency. The method is time consuming and has a high demand for computational resources. GANs may suffer from mode collapse, and lack of diversity, although newer variations of the algorithm seem to remedy these issues. Modelling discrete data can be difficult for GAN models.”
  • In sum, the article explains that synthetic data can provide benefits for National Statistics Offices and Generative Adversarial Networks can help produce the synthetic data. However, those undertaking the initiative need to balance the many associated risks. 

Ziesche, Soenke. “Open Data for AI: What Now?” UNESCO Digital Library, 2023. 

  • This report summarizes UNESCO’s guidelines for Member States in opening up data for AI systems. 
  • The report explains that there is an enormous amount of data already being collected through automated systems (building off of the COVID-19 pandemic). This data is often too large to be manually processed. AI and data science methods have the capacity to discover new information from these large data sources. 
  • The report is divided into 3 phases: the preparation phase, the opening data phase, and follow-up phase for data re-use: “The preparation phase guides Member States in preparing for opening their data, and includes the following suggested steps: drafting an open data policy, gathering and collecting high quality data, developing open data capacities and making the data AI-ready. The opening of the data phase consists of the following steps: selecting datasets to be opened, opening the datasets legally, opening the datasets technically, and creating an open-data-driven culture. The follow-up for reuse and sustainability phase consists of the following steps: supporting citizen engagement, supporting international engagement, supporting beneficial AI engagement, and maintaining high quality data.”

*****

We plan to explore these topics further over the coming months. Professionals interested in collaborating with The GovLab on these topics can contact Stefaan Verhulst, Co-Founder & Chief Research and Development Officer at sverhulst@thegovlab.org.

Stay up-to-date on the latest developments of this work by signing up for the Data Stewards Network Newsletter.

Learn more about the Open Data Policy Lab by visiting our website: https://opendatapolicylab.org/.

Selected Readings on the LGBTQ+ Community and Data


By Uma Kalkar, Salwa Mansuri, Marine Ragnet and Andrew J. Zahuranec

As part of an ongoing effort to contribute to current topics in data, technology, and governance, The GovLab’s Selected Readings series provides an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

Around the world, LGBTQ+ people face exclusion and discrimination that undermines their capacity to live their lives and succeed. Together with allies, many LGBTQ+ people are fighting to exercise their rights and achieve full equality. However, this struggle has been undermined by a lack of specific, quantifiable information on the challenges they face.

When collected and managed responsibly, data about sexual and gender minorities can be used to protect and empower LGBTQ+ people through informed policy and advocacy work. To this end, this Selected Reading investigates what data is (and is not) collected about LGBTQ+ individuals in the areas of healthcare, education, economics, and public policy, and the ramifications of these outcomes. It offers a perspective on some of the existing gaps regarding LGBTQ+ data collection. It also examines the various challenges that LGBTQ+ groups have had to overcome through a data lens. While activism and advocacy have increased the visibility and acceptance of sexual and gender minorities and allowed them to better exercise their rights in society, significant inequities remain. Our literature review puts forward some of these recent efforts.

Most of the papers included in this review, however, conclude with similar findings: data about LGBTQ+ communities is still lacking and, as a result, research on the topic often lags behind. This is particularly problematic, as detailed in some of our readings, because LGBTQ+ populations are often at the center of discrimination and still face disparate health vulnerabilities. The LGBTQI+ Data Inclusion Act, which recently passed the US House of Representatives and would require over 100 federal agencies to improve data collection and surveying of LGBTQ communities, seeks to address this gap.

We hope this selection of readings can provide some clarity on current data-driven research for and about LGBTQ+ individuals. The readings are presented in alphabetical order.

***


Annotated Selected Reading List (in alphabetical order):

D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. MIT Press, 2020. https://mitpress.mit.edu/books/data-feminism.

  • D’Ignazio and Klein investigate how data has been historically used to maintain specific social status quos. To overcome this challenge, they approach data collection and uses through an intersectional, feminist lens that identifies issues in current data handling systems and looks toward solutions for more inclusive data applications.
  • The authors describe data feminism as being about “power, about who has it and who doesn’t, and about how those differentials of power can be challenged and changed using data.” The book centers around seven principles that identify and challenge existing power structures around data and seek pluralist, context-based data processes that illuminate hidden and missed data.

Giblon, Rachel, and Greta R. Bauer. “Health care availability, quality, and unmet need: a comparison of transgender and cisgender residents of Ontario, Canada.” BMC Health Services Research 17, no. 1 (2017): 1–10. https://bmchealthservres.biomedcentral.com/articles/10.1186/s12913-017-2226-z.

  • Canada boasts a universal healthcare and insurance system, yet disparities exist in treatment quality, services, and knowledge regarding transgender patients.
  • Data collection on transgender, non-binary, and intersex individuals is not conducted in Canadian health surveys, making it difficult to compare and contrast the healthcare provided to transgender people with that provided to cisgender people. Moreover, a lack of physician knowledge about trans needs and/or refusal to provide hormone therapy or gender-affirming procedures results in trans individuals explicitly avoiding medical services. The lack of services, comfort, and data about transgender people in Canada demonstrates their severely “unmet health care need.”
  • Using data about Ontario residents from the Canadian Community Health Survey and the Trans PULSE survey, the researchers find that 33% of transgender Ontarians had an unmet health need that would not be unmet if they were cisgender. Transgender men and women also rated the quality of healthcare in their community as poorer than cisgender individuals did. Twenty-one percent of transgender people avoided going to emergency rooms because of their gender identity.

Bowleg, Lisa, and Stewart Landers. “The need for COVID-19 LGBTQ-specific data.” American Journal of Public Health 111, no. 9 (2021): 1604–1605. https://pubmed.ncbi.nlm.nih.gov/34436923/.

  • The adage “no data, no problem” has been magnified during the pandemic, highlighting gaps around data collection for LGBTQ communities, which often intersect with other communities who are disproportionately at risk for COVID-19, such as minority populations in the service industry and those who smoke.
  • Despite concerns about the stigma facing LGBTQ communities, data collection from these demographics has been relatively feasible, with federal governments drastically increasing their data collection from LGBTQ communities.
  • However, the lack of direction and guidance at a federal level to collect sexual and gender minority data has stunted information about how this demographic has experienced COVID-19 when compared to cisgender, heterosexual groups. The authors stress the need for data collection from LGBTQ communities and advocacy to encourage these practices to help address the pandemic.

Marshall, Zack, Vivian Welch, Alexa Minichiello, Michelle Swab, Fern Brunger, and Chris Kaposy. “Documenting research with transgender, nonbinary, and other gender diverse (trans) individuals and communities: introducing the global trans research evidence map.” Transgender Health 4, no. 1 (2019): 68–80. https://www.liebertpub.com/doi/10.1089/trgh.2018.0020.

  • Marshall and colleagues study a series of 15 academic databases to assemble a dataset describing 690 trans-focused articles. They then map where and how transgender people “have been studied and represented within and across multiple fields of research” to understand the landscape of existing research on transgender people. They find that research around the trans community focused on physical and mental healthcare services and marginalization and was primarily observational.
  • The authors found that social determinants of health for transgender people were the least studied, along with ethnicity, culture, and race; violence; early life experiences; activism; and education.
  • With this evidence map, researchers have a strong starting point to further explore issues through an LGBTQ lens and better engage with trans people and perspectives when looking at social problems.

Medina, Caroline and Lindsay Mahowald. “Collecting Data about LGBTQI+ and Other Sexual and Gender-Diverse Communities.” Center for American Progress, May 26, 2022. https://www.americanprogress.org/article/collecting-data-about-lgbtqi-and-other-sexual-and-gender-diverse-communities.

  • The paper argues that, despite advances, “a persistent lack of routine data collection on sexual orientation, gender identity, and variations in sex characteristics (SOGISC) is still a substantial roadblock for policymakers, researchers, service providers, and advocates seeking to improve the health and well-being of LGBTQI+ people.”
  • Even though various types of data are integral to the experiences of LGBTQI+ people, the report narrows its focus to data collection in two settings: general population surveys and surveys specifically about LGBTQI+ people. Population-specific surveys such as the latter offer a significant advantage in capturing specific and sensitive data.
  • It argues that a range of precautions can be adopted from a research design perspective to ensure that personal data and information are handled with care and meet the ethical standards outlined in the Data Ethics Framework of the Federal Data Strategy, ranging from privacy and confidentiality to honesty and transparency.

Miner, Michael H., Walter O. Bockting, Rebecca Swinburne Romine, and Sivakumaran Raman. “Conducting internet research with the transgender population: Reaching broad samples and collecting valid data.” Social Science Computer Review 30, no. 2 (2012): 202–211. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3769415/.

  • The internet has the potential to collect information from transgender people, who are “a hard-to-reach, relatively small, and geographically dispersed population” in a diverse and representative manner.
  • To study HIV risk behaviors of transgender individuals in the U.S., Miner et al. developed an online tool that recruited individuals who frequent websites that are important for the transgender community and used quantitative and qualitative methods to learn more about these individuals. They conclude that while ensuring internal validity in online data collection can be difficult, careful testing and methods can overcome these issues to improve data quality on transgender people.

Pega, Frank, Sari L. Reisner, Randall L. Sell, and Jaimie F. Veale. “Transgender health: New Zealand’s innovative statistical standard for gender identity.” American Journal of Public Health 107, no. 2 (2017): 217–221. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5227923/.

  • Pega et al. discuss New Zealand’s national statistical standard for gender identity data collection, the first of its kind. More governments in Australia and the United States are now following suit to address the health access and information disparity that transgender people face.
  • Data about transgender people has advanced progressive policy action in New Zealand, and the authors celebrate this statistical standard as a way to collect high quality data for data-driven policies to support these groups.
  • While this move will help uncover LGBTQ individuals currently hidden in data, the authors critique the standard because it does not “promote the two-question method, risking misclassification and undercounts; does promote the use of the ambiguous response category ‘gender diverse’ in standard questions; and is not intersex inclusive.”

Ruberg, Bonnie, and Spencer Ruelos. “Data for Queer Lives: How LGBTQ Gender and Sexuality Identities Challenge Norms of Demographics.” Big Data & Society 7, no. 1 (June 18, 2020): 205395172093328. https://journals.sagepub.com/doi/full/10.1177/2053951720933286.

  • Drawing from the responses of 178 people who identified as non-heterosexual or non-cisgender in a survey, this paper argues that “dominant notions of demographic data, […] that seeks to accurately categorize and ‘capture’ identity do not sufficiently account for the complexities of LGBTQ lives.”
  • Demographic data commonly imagines identity as fixed, singular, and discrete. However, the researchers’ findings suggest that, for LGBTQ people, gender and sexual identities are often multiple and in flux. Most respondents reported their understanding of their identity shifting over time. For many, “gender identity was made up of overlapping factors, including the relationship between gender and transgender identities. These findings challenge researchers to reconsider how identity is understood as and through data.” They argue that considering identities as fixed and discrete is not only exclusionary but also fails to represent the dynamic and fluid nature of gender identities.
  • The piece offers several recommendations to address this challenge. First, the researchers recommend removing data discreteness, which would enable users to select multiple identities rather than choose one from a drop-down list. Second, they recommend creating communication and feedback channels for LGBTQ+ people to express whether surveys and other data collection methods are sufficiently inclusive and gender-sensitive.

Sell, Randall L. “LGBTQ health surveillance: data = power.” American Journal of Public Health 107, no. 6 (2017): 843–844. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5425894/.

  • Sell recounts his motto: ‘data = power;’ ‘silence = death’ and how LGBTQ people have been victims of this situation. He argues that health research and surveillance have systematically ignored sexual and gender minorities, leading to gaps in administrative understanding and policies for the LGBTQ population.
  • He laments that very few surveys on American health collect sexual orientation and gender identity data, and the lack of standardization around this data collection muddies researchers’ ability to collate and utilize the information meaningfully.
  • He calls for legislation that requires the National Institutes of Health to include sexual and gender minorities in all publicly funded research, similar to the specific inclusion requirements for women and racial and ethnic minorities in studies. Despite concerns about surveillance and targeting of LGBTQ minorities, Sell argues that data collection is imperative now for a long-term understanding of the needs of the community, transcending political terms.

Snapp, Shannon D., Stephen T. Russell, Mariella Arredondo, and Russell Skiba. “A right to disclose: LGBTQ youth representation in data, science, and policy.” Advances in Child Development and Behavior 50 (2016): 135–159. https://pubmed.ncbi.nlm.nih.gov/26956072/.

  • Despite significant and positive reforms such as the legalization of same-sex marriages and protection from intersectional sexual harassment (Webb, 2011) in the United States, there is a striking gap in literature on evidence-based practices that support LGBTQ+ Youth (Kosciw & Pizmony-Levy, 2013; also see Mustanski, 2011). The lack of data-driven solutions stifles the creation of inclusive environments where members of the LGBTQI+ community feel heard and seen.
  • At present, federal and state data sets do not include SOGI (Sexual Orientation & Gender Identity) in demographic questions. Data sets that do have spaces to disclose SOGI are largely in health-related settings, such as the Centers for Disease Control or the Youth Risk Behavior Survey. As such, learning and education disparities and outcomes are not accurately measured.
  • Missing systematic SOGI data renders members of the LGBTQ+ community invisible and sidelined. Several members of civil society have therefore called for SOGI data to be gathered by the Departments of Health, Education, and Justice. Such data is central to holistically capturing the discriminatory experiences LGBTQ+ Youth face in education settings, which are integral to well-being and development. Scholars and research teams have thus far overcome the barriers of data reliability and validity (see Ridolfo, Miller, & Maitland, 2012) by collating the most effective methods for data collection (Sexual Minority Assessment Research Team, 2009).

Wimberly, George L. “Chapter 10: Use of large-scale data sets and LGBTQ education.” LGBTQ Issues in Education: Advancing a Research Agenda (2015): 175–218. https://ebooks.aera.net/LGBTQCH10.

  • This book chapter highlights the importance of large-scale data sets to gain understanding about LGBTQ students, school experiences, and academic achievement.
  • Young people who identify as LGBTQ tend to be generalized, and the ways that surveys ask LGBTQ identification questions change across years, making it important to disaggregate large-scale data for more granular knowledge about LGBTQ people in education.
  • Wimberly provides information about multiple datasets that collect this information, how they ask questions on LGBTQ identity, and ways in which the datasets have been used or have the potential to be leveraged for a more comprehensive understanding of students. He also points out the limitations of existing data sets, namely that they tend to be retrospective of the LGBTQ adolescent experience and collected from convenience samples, such as college students. This limitation also impacts the external validity of the data, especially with regard to rural, racialized, and lower-income LGBTQ students.

Selected Readings on the Intersection of Data, Abortion Care, and Women’s Health


By: Uma Kalkar, Salwa Mansuri, Andrew J. Zahuranec

As part of an ongoing effort to contribute to current topics in data, technology, and governance, The GovLab’s Selected Readings series provides an annotated and curated collection of recommended readings on themes such as open data, data collaboration, and civic technology.

In this edition, we reflect on the intersection between data, abortion, and women’s health following the United States Supreme Court ruling in Dobbs v. Jackson Women’s Health Organization, which held that there was no constitutional right to abortion and decided that individual states have the authority to regulate access to abortion services. In the days before and since the decision, a large amount of literature has been produced both on the implications of this ruling for individuals’ data privacy and the effects on women’s social and economic lives. It is clear that, while opinions on access to abortion services are often influenced by deeply held attitudes about women’s bodily autonomy and when life begins, data has critical importance both as a potential source of risk and as a tool to understand the decision’s impact.

Below we curate some stories from news sources and academic papers on the role of data in abortion services as well as data-driven research by institutions into the effects of abortion. We hope this selection of readings provides a broader perspective on how data and women’s rights and health intersect.

We also urge anyone seeking further information about abortion access to visit www.ineedana.com using a secure connection, preferably via a VPN. For those looking for menstrual apps, Spot On by the Planned Parenthood Federation of America saves data locally on phones, does not provide information to third parties, and allows for anonymous accounts.

The readings are presented in alphabetical order.

***

Data & Privacy Concerns

Conti-Cook, Cynthia. “Surveilling the Digital Abortion Diary: A Preview of How Anti-Abortion Prosecutors Will Weaponize Commonly-Used Digital Devices As Criminal Evidence Against Pregnant People and Abortion Providers in a Post-Roe America.” University of Baltimore Law Review, forthcoming. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3666305

  • In this four-part article, Conti-Cook discusses the history of health data rights and the long-standing ways in which digital evidence produced by pregnant people has been used to prosecute their actions. She discusses how digital technologies help prosecutors lay charges against those seeking abortions and how they help “the state see[k] control over [them] by virtue of their pregnancy status” by digitally surveilling them.
  • The author examines how “digital, biometric, and genetic surveillance” serves as a vehicle to “microtarget” historically oppressed communities under a patriarchal and racist social structure.
  • She also discusses how online searches relating to pregnancy termination and abortion, location and tracking data, site history, wearable devices, and app data can be factored into risk assessment tools to assess social service outcomes and federal prosecutions.
  • Conti-Cook ends by reviewing digital hygiene strategies to stop the use of personal data against oneself and foster a more critical use of digital tools for reproductive and pregnancy-related health needs.

Diamant, Jeff, and Besheer Mohamed. “What the Data Says about Abortion in the U.S.” Pew Research Center, June 24, 2022. https://www.pewresearch.org/fact-tank/2022/06/24/what-the-data-says-about-abortion-in-the-u-s-2

  • In the aftermath of the overturn of Roe v. Wade (1973), the Pew Research Center published a compilation of facts and statistics about abortion care in the United States obtained through the Centers for Disease Control and Prevention and Guttmacher Institute.
  • The piece describes shifting trends pertaining to the number of legal abortions conducted each year in the United States since the 1970s, the abortion rate among women, the most common types of abortions, and the number of abortion providers over time. It describes, for example, how the procedure has generally declined at “a slow yet steady pace” since the early 1990s. It also notes that the number of providers has declined over time.

Paul, Kari. “Tech Firms under Pressure to Safeguard User Data as Abortion Prosecutions Loom.” The Guardian, June 25, 2022, sec. US news. https://www.theguardian.com/us-news/2022/jun/25/tech-companies-health-data-security-abortion-prosecution

  • Paul writes about the concerns of abortion and civil rights activists on how data collected about individuals through apps and online searches might incriminate those seeking or providing abortion services. She notes how geo-location data used by tech companies can make “it easy for law enforcement officials to access incriminating data on location, internet searches, and communication history.”
  • While period tracking apps have received significant attention, the article notes that companies such as Meta, Uber, Lyft, Google, and Apple have yet to publicly announce how they would respond to law enforcement requests on abortion evidence.
  • The piece finally includes a recommendation from the digital rights advocacy group Electronic Frontier Foundation that companies preemptively prepare “for a future in which they are served with subpoenas and warrants seeking user data to prosecute abortion seekers and providers.” It suggests end-to-end encryption as a default, refraining from collecting location information, and allowing anonymous or pseudonymous access to apps.

Nguyen, Nicole, and Cordilia James. “How Period-Tracker Apps Treat Your Data, and What That Means If Roe v. Wade Is Overturned.” Wall Street Journal, June 21, 2022. https://www.wsj.com/articles/how-period-tracker-apps-treat-your-data-and-what-that-means-if-roe-v-wade-is-overturned-11655561595

  • Nguyen and James provide an extensive analysis of the ways that period tracking apps track, collect, store, and share data about women’s fertility and menstrual cycle. Following Dobbs v. Jackson Women’s Health Organization (2022), which overturned Roe v. Wade (1973), there has been significant public concern about the (re)use of the data these apps collect.
  • They detail different kinds of data that could be subpoenaed from period trackers and the terminology that users can search for in an app’s privacy policy to understand how their data will be used. They describe, for example, what it means when Terms & Conditions outline how a company will “encrypt” data (that is, scramble it into an incoherent string of code), “share” or “sell” it (data can be given to third parties such as advertisers), and respond to “requests” (companies may notify the user when a court or government agency asks for data).
  • The article closes with an overview of the most-downloaded fertility apps — including Flo, Apple Health, Clue, FitBit, Glow, and Natural Cycles — and where they stand on data privacy.

Sherman, Jenna. “How Abortion Misinformation and Disinformation Spread Online.” Scientific American, June 24, 2022. https://www.scientificamerican.com/article/how-abortion-misinformation-and-disinformation-spread-online/

  • In Scientific American, Sherman writes an opinion piece on the growth of online dis- and misinformation in the aftermath of Dobbs. She summarizes how, according to current data-driven research, much of the information people find online about abortion is not reliable and that the highest volume of online searches about abortion tends to be in those states with the most restricted access.
  • Despite much research on abortion, Sherman notes “a lack of access to quality information or care” online, especially for marginalized communities. She also summarizes the results of studies on social media and search engines. In one 2021 study, searches for “abortion pill” tended not to yield scientifically accurate and moderately accessible information.
  • Another study cited in the article found that half of the web pages surfaced by Google on abortion contained misinformation. This appears to be by design — with false information about “abortion pill reversal” and abortion practices generating large revenues for platforms like Facebook.

Data on the Impact of Abortion Access

Amador, Diego. “The Consequences of Abortion and Contraception Policies on Young Women’s Reproductive Choices, Schooling and Labor Supply.” Documento CEDE №2017–43 (2017). https://ssrn.com/abstract=2987367

  • Amador analyzes aggregate provider data from the Guttmacher Institute to assess the relationship between contraceptive use, abortion, schooling, and labor decisions of US women. The dataset follows a sample of women born between 1980 and 1984, with data from interviews starting in 1997 and ending in 2011.
  • A counterfactual model based on the data suggests that a perfectly enforced ban on abortions would raise the rate of standard contraceptive use for women by 9.1%. The fraction of children born to single mothers would increase from 30% to 34% while the average amount of schooling after high school would decrease by 3.1%. The share of women with college degrees would drop by 1.8 percentage points. The average loss in lifetime earnings for women who would have had at least one abortion was estimated at USD 39,172.
  • The author also assesses the impact that free contraception would have, suggesting a 15.7 decrease in pregnancies per 1000 women and an 11.6 reduction in abortions per 1000 women. Accumulated schooling after high school increased by an estimated 3%. An assessment of mandatory counseling laws found that the long-run effect of these laws on women ages 18 to 30 was a 10% decrease in abortion rates.
  • The author concludes that policies such as an abortion ban and free contraception have important effects on schooling and lifetime earnings but only a moderate impact on labor supply.

ANSIRH. “Introduction to the Turnaway Study.” ANSIRH, March 2020. https://www.ansirh.org/sites/default/files/publications/files/turnawaystudyannotatedbibliography.pdf

  • This fact sheet summarizes various analyses stemming from the Turnaway Study, the first study to rigorously examine the effects of receiving abortion services versus being denied access to them. The study is an initiative by Advancing New Standards in Reproductive Health (ANSIRH), a program within the UCSF Bixby Center for Global Reproductive Health. It examines 1,000 women seeking abortion from 30 facilities around the country, with interviews conducted over five years.
  • Studies conducted with the dataset find that the most common reason for women to seek an abortion was not being able to afford a child and/or not having a suitable partner/parent involved to assist with childrearing. Most women don’t feel pressured by counseling that occurs in clinics but find it less helpful when it is state-mandated. Half of all women report seeing anti-abortion protestors at clinics and greater contact with them tends to be more upsetting.
  • Studies also suggest no evidence that abortion causes negative mental health outcomes, although being denied an abortion is associated with elevated anxiety and stress and lower self-esteem. Those who receive an abortion experience “a mix of positive and negative emotions in the days after […] with relief predominating.” The intensity of the emotion diminishes over time but over 95% of women report “abortion was the right decision for them at all times over five years after.”
  • Carrying an unwanted pregnancy tended to be associated with worse outcomes for women’s physical health and socioeconomic status. Women denied abortion who later gave birth reported more chronic pain and rated their overall health as worse. Economic insecurity for women and their families increased almost four-fold. In terms of education, women who received abortions tended to have higher odds of having positive one-year plans while women denied abortions were no more or less likely to drop out of school.

Donohue, John J., and Steven D. Levitt. “The Impact of Legalized Abortion on Crime Over the Last Two Decades.” The University of Chicago, Becker Friedman Institute for Economics Working Paper №2019–75 (May 2017). https://ssrn.com/abstract=3391510

  • This paper primarily argues that legalizing abortion in the 1970s contributed to a significant reduction in crime two decades later, in the 1990s. In particular, the paper suggests an approximate 20% decrease in crime rates between 1997 and 2014. The authors argue that abortion legalization is not merely one factor in the decline in crime rates but perhaps one of the most important (see Donohue and Levitt, 2001).
  • A particularly important aspect of the data was that it took close to a decade for the “number of abortions performed to reach a steady-state,” a delay attributed to the heterogeneity of state-level data as abortion legislation and reform evolved.
  • Moreover, the effect of abortion on crime rates was only incrementally visible as “crime-aged cohorts” were gradually exposed to legalized abortion. Donohue and Levitt’s work supports the abortion-crime hypothesis — that increased access to abortion would decrease crime.

Frost, Jennifer J., Jennifer Mueller, and Zoe H. Pleasure. “Trends and Differentials in Receipt of Sexual and Reproductive Health Services in the United States: Services Received and Sources of Care, 2006–2019.” The Guttmacher Institute, June 24, 2021. https://doi.org/10.1363/2021.33017

  • This report describes trends in reproductive and sexual health care across the United States over a 13-year period as told by the National Survey of Family Growth, the only national data source that contains detailed information on sexual and reproductive health. It finds that some 7 in 10 women of reproductive age (44 million people) make at least one medical visit for sexual and reproductive health care each year. However, disparities exist — Hispanic women are less likely to receive care than White women, and the uninsured are substantially less likely to receive care than privately insured women.
  • It further finds that publicly funded clinics were a critical source of care for young women, lower-income women, women of color, foreign-born women, women on Medicaid, and women without insurance.
  • The report also finds that the Affordable Care Act increased the number of women receiving contraceptive services by 8% among women with private providers. There was a complementary drop among women receiving contraceptive care from publicly funded clinics.

Hill, J. Jackson IV. “The Need for a National Abortion Reporting Requirement: Why Both Sides Should Be in Support of Better Data.” Available at SSRN (May 2, 2014). https://ssrn.com/abstract=2306667.

  • Hill writes a paper urging organizations to improve the status of abortion reporting in the United States. Examining statistics collected by the Centers for Disease Control and the Guttmacher Institute, the author finds serious deficiencies, including a lack of voluntary reporting from states, conflicting requirements (or unenforced requirements) about what data is collected, and an absence of timely data.
  • After the passage of Roe, state legislatures attempted to mandate abortion reporting and monitoring; however, concerns over the safety of women’s choice, undue administrative hurdles, and issues over pervasive data collection made it difficult to impose a standardized, non-intrusive, and anonymized data collection practice.
  • Hill argues that these data gaps and paternalistic methods of collecting data have had consequences on the ability of policymakers to make decisions around abortion policy and undermine the public’s knowledge on the issue. He assesses the feasibility of federally regulated abortion data and potential other strategies for achieving reliable, uniform data. He proposes two avenues for a “comprehensive, uniform abortion data” set: a ‘command’ option that requires states to provide and collect abortion information for a federal database or a ‘bribe’ option that monetarily incentivizes states to provide this information.

Knowles Myers, Caitlin, and Morgan Welch. “What Can Economic Research Tell Us about the Effect of Abortion Access on Women’s Lives?” Brookings, November 30, 2021. https://www.brookings.edu/research/what-can-economic-research-tell-us-about-the-effect-of-abortion-access-on-womens-lives/

  • Knowles Myers and Welch write on what current economic research suggests about abortion access on women’s reproductive, social, and economic outcomes.
  • Comparing Alaska, California, Hawaii, New York, Washington, and the District of Columbia (jurisdictions that repealed abortion bans prior to Roe) to other states, research suggests that those that repealed abortion bans had a 4–11% decline in births relative to the rest of the country, with effects particularly large for teens and women of color. Studies also suggest that abortion legalization reduced the number of teen mothers by 34% and reduced maternal mortality by 30–40%, with little impact on white women.
  • Additional studies indicate that abortion access has a large impact on the circumstances under which children are born. Various studies find that abortion legalization reduced the number of unwanted children, cases of neglect and abuse, and the number of children living in poverty. It also improved long-term outcomes by increasing the likelihood of child attendance in college.
  • Other studies find that abortion and pregnancy have substantial impact on women’s economic and social lives, with pregnancy frequently lowering women’s wages. This fact has substantial implications for “low-income mothers experiencing disruptive life events.” Based on various studies, the authors argue that “access to abortion could be pivotal to these women’s financial lives.”
  • While opinions on abortion are driven by views on women’s bodily autonomy and when life begins, the authors find a clear causal link between access to abortion and “whether, when, and under what circumstances women become mothers.” All studies suggest that access to abortion can have substantial implications on education, earnings, careers, and life outcomes. Restricting or eliminating access would diminish women’s personal and economic lives along with those of their families.

Maxmen, Amy. “Why Hundreds of Scientists Are Weighing in on a High-Stakes US Abortion Case.” Nature 599, no. 7884 (October 26, 2021): 187–89. https://doi.org/10.1038/d41586-021-02834-7

  • A piece by Amy Maxmen for Nature summarizes a recent amicus brief filed by more than 800 scientists and several scientific organizations providing data-driven research into how abortion access is an important aspect of reproductive health.
  • It notes, for example, more than 40 studies suggesting that receiving an abortion does not harm a woman’s mental or physical health but that being denied an abortion can result in negative financial and health outcomes. It also cites a 2019 study of nearly 900 women, in which those “who sought but were unable to get abortions reported higher rates of chronic headaches and joint pain five years later, compared with those who got an abortion,” while a similar 2017 study finds no similar physical or psychological effects.
  • A separate amicus brief submitted to the Court by about 550 public health and reproductive health researchers described how unwanted pregnancies can result in worse health outcomes. They can also disproportionately harm the physical, mental, and economic well-being of Black people, according to a separate study.
  • An additional amicus brief filed by economists notes several studies that found that “abortion legalization in the 1970s helped to increase women’s educational attainment, participation in the labor force and earnings — especially for single Black women.”

Myers, Caitlin, and Ladd, Daniel. “Did parental involvement laws grow teeth? The effects of state restrictions on minors’ access to abortion.” Journal of Health Economics, 71, (2020): p.102302. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3029823

  • A paper by Caitlin Knowles Myers of Germany’s IZA Institute of Labor Economics and Daniel Ladd of the University of California, Irvine compiles data on the location of abortion providers and enforcement of parental involvement laws. The researchers seek to assess the impact that laws requiring parental approval for an abortion have on minors seeking abortions.
  • The paper concludes that parental involvement laws may have contributed to a modest decline in teen births (a 1.4% reduction) during the 1980s and 1990s but a 2.8% increase from 1993 to 2014 in women aged 15 to 18.
  • It further finds that laws with an avoidance distance (the distance minors have to travel to avoid parental involvement and seek an abortion confidentially) have significant effects. In the 1980s, a parental involvement law with an avoidance distance of 100 miles decreased teen births by 1.48%. A parental involvement law with a 400-mile avoidance distance, about a day’s drive, increased the teen birth rate by 4.3%.

Popinchalk, Anna, Cynthia Beavin, and Jonathan Bearak. “The State of Global Abortion Data: An Overview and Call to Action.” BMJ Sexual & Reproductive Health 48, no. 1 (January 1, 2022): 3–6. https://doi.org/10.1136/bmjsrh-2021-201109.

  • Popinchalk and colleagues at the Guttmacher Institute write in the journal BMJ Sexual & Reproductive Health on the urgent need for data on abortion incidence and access to examine disparities in people’s ability to safely terminate a pregnancy.
  • The authors note that the three sources of data on abortion are official statistics, surveys of women, and scientific studies. However, stigmatization and varying legal access undermine the quality of this data and can lead to substantial under-reporting. Even in high-income countries, there can be significant variation in the frequency with which data is published. This variation in quality and availability exacerbates inequities by limiting the number of experiences that can be studied.
  • The authors argue that the availability and quality of data on abortion care can be improved by investing in country-level surveys and scientific studies. They also argue for reducing stigma through community and provider messaging, as stigma can hinder the accuracy and completeness of datasets.

Tierney, Katherine I. “Abortion Underreporting in Add Health: Findings and Implications.” Population Research and Policy Review 38, no. 3 (June 1, 2019): 417–28. https://doi.org/10.1007/s11113-019-09511-8

  • Tierney notes that there is substantial evidence that abortion is significantly underreported in the United States, especially among Black women and those in lower socioeconomic classes.
  • She supplements this review with her own evaluation of the abortion data in the National Longitudinal Study of Adolescent to Adult Health (Add Health), finding that the dataset captures only 35% of expected abortions. Examining data from 1994–1995, 1996, 2001–2002, and 2008–2009, she found severe abortion underreporting; however, underreporting did not differ significantly by race/ethnicity, age, or time of abortion.
  • Tierney argues that this fact means that Add Health is no better than other surveys in collecting abortion data. She also argues that this underreporting, likely caused by stigma, has substantial implications for research and that researchers should be cautious with self-reports of abortion. Figures need to be evaluated, contextualized, and used with caution.

Selected Readings on Digital Self-Determination for Migrants


By Uma Kalkar, Marine Ragnet, and Stefaan Verhulst

Digital self-determination (DSD) is a multidisciplinary concept that extends self-determination to the digital sphere. Self-determination places humans (and their ability to make ‘moral’ decisions) at the center of decision-making actions. While self-determination is considered a jus cogens rule (i.e. a global norm), the concept of digital self-determination only came to light in the early 2010s as a result of the increasing digitization of most aspects of society.

While digitalization has opened up new opportunities for self-expression and communication for individuals across the globe, its reach and benefits have not been evenly distributed. For instance, migrants and refugees are particularly vulnerable to the deepening inequalities and power structures brought on by increased digitization and the subsequent datafication. Further, non-traditional data, such as social media and telecom data, have brought great potential to improve our understanding of the migration experience and patterns of mobility, which can inform more targeted migration policies and services. Yet they have also brought new concerns related to the lack of agency to determine how the data is being used and who determines the migration narrative.

These selected readings look at DSD in light of the growing ubiquity of technology applications and specifically focus on their impacts on migrants. They were produced to inform the first studio on DSD and migration co-hosted by the Big Data for Migration Alliance and the International Digital Self Determination Network. The readings are listed in alphabetical order.

These readings serve as a primer to offer base perspectives on DSD and its manifestations, as well as provide a better understanding of how migration data is managed today to advance or hinder life for those on the move. Please alert us to any other publication we should include moving forward.

Berens, Jos, Nathaniel Raymond, Gideon Shimshon, Stefaan Verhulst, and Lucy Bernholz. “The Humanitarian Data Ecosystem: the Case for Collective Responsibility.” Stanford Center for Philanthropy and Civil Society, 2017.

  • The authors explore the challenges to, and potential solutions for, the responsible use of digital data in the context of international humanitarian action. Data governance is related to DSD because it oversees how the information extracted from an individual—understood by DSD as an extension of oneself in the digital sphere—is handled.
  • They argue that in the digital age, the basic service provision activities of NGOs and aid organizations have become data collection processes. However, the ecosystem of actors is “uncoordinated,” creating inefficiencies and vulnerabilities in the humanitarian space.
  • The paper presents a new framework for responsible data use in the humanitarian domain. The authors advocate for data users to follow three steps: 
  1. “[L]ook beyond the role they take up in the ‘data-lifecycle’ and consider previous and following steps and roles;
  2. Develop sound data responsibility strategies not only to prevent harm to their own operations but also to other organizations in the ‘data-lifecycle;’ and, 
  3. Collaborate with and learn from other organizations, both in the humanitarian field and beyond, to establish broadly supported guidelines and standards for humanitarian data use.”

Currion, Paul. “The Refugee Identity.” Caribou Digital (via Medium), March 13, 2018.

  • Developed as part of a DFID-funded initiative, this essay outlines the Data Requirements for Service Delivery within Refugee Camps project that investigated current data standards and design of refugee identity systems.
  • Currion finds that since “the digitisation of aid has already begun…aid agencies must therefore pay more attention to the way in which identity systems affect the lives and livelihoods of the forcibly displaced, both positively and negatively.” He argues that an interoperable digital identity for refugees is essential to access financial, social, and material resources while on the move but also to tap into IoT services.
  • However, many refugees are wary of digital tracking and data collection services that could further marginalize them as they search for safety. At present, there are no sector-level data standards around refugee identity data collection, combination, and centralization. How can regulators balance data protection with government and NGO requirements to serve refugees in the ways they want to uphold their DSD?
  • Currion argues that a Responsible Data approach, as opposed to a process defined by a Data Minimization principle, provides “useful guidelines” but notes that data responsibility “still needs to be translated into organizational policy, then into institutional processes, and finally into operational practice.” He further adds that “the digitization of aid, if approached from a position that empowers the individual as much as the institution, offers a chance to give refugees back their voices.”

Decker, Rianne, Paul Koot, S. Ilker Birbil, and Mark van Embden Andres. “Co-designing algorithms for governance: Ensuring responsible and accountable algorithmic management of refugee camp supplies.” Big Data and Society, April 2022.

  • While recent literature has looked at the negative impacts of big data and algorithms in public governance, claiming they may reinforce existing biases and defy scrutiny by public officials, this paper argues that designing algorithms with relevant government and society stakeholders might be a way to make them more accountable and transparent. 
  • It presents a case study of the development of an algorithmic tool to estimate the populations of refugee camps to manage the delivery of emergency supplies. The algorithms included in this tool were co-designed with relevant stakeholders. 
  • This may provide a way to uphold DSD by contributing to the “accountability of the algorithm by making the estimations transparent and explicable to its users.”
  • The authors found that the co-design process enabled better accuracy and responsibility and fostered collaboration between partners, creating a suitable purpose for the tool and making the algorithm understandable to its users. This enabled algorithmic accountability. 
  • The authors note, however, that the beneficiaries of the tools were not included in the design process, limiting the legitimacy of the initiative. 

European Migration Network. “The Use of Digitalisation and Artificial Intelligence in Migration Management.” EMN-OECD Inform Series, February 2022.

  • This paper explores the role of new digital technologies in the management of migration and asylum, focusing specifically on where digital technologies, such as online portals, blockchain, and AI-powered speech and facial recognition systems, are being used across Europe to navigate the processes of obtaining visas, claiming asylum, gaining citizenship, and deploying border control management.
  • Further, it points to friction between GDPR and new technologies like blockchain (which by design does not allow for the right to be forgotten) and potential workarounds, such as two-step pseudonymisation.
  • As well, it highlights steps taken to oversee and open up data protection processes for immigration. Austria, Belgium, and France have begun to conduct Data Protection Impact Assessments; France has a portal that allows one to request the right to be forgotten; Ireland informs online service users on how data can be shared or used with third-party agencies; and Spain outlines which personal data are used in immigration as per the Registry Public Treatment Activities.
  • Lastly, the paper points out next steps for policy development that upholds DSD, including universal access and digital literacy, trust in digital systems, willingness for government digital transformations, and bias and risk reduction.

Martin, Aaron, Gargi Sharma, Siddharth Peter de Souza, Linnet Taylor, Boudewijn van Eerd, Sean Martin McDonald, Massimo Marelli, Margie Cheesman, Stephan Scheel, and Huub Dijstelbloem. “Digitisation and Sovereignty in Humanitarian Space: Technologies, Territories and Tensions.” Geopolitics (2022): 1-36.

  • This paper explores how digitisation and datafication are reshaping sovereign authority, power, and control in humanitarian spaces.
  • Building on the notion that technology is political, Martin et al. discuss three cases where digital tools powered by partnerships between international organizations, NGOs, and private firms such as Palantir and Facebook have raised concerns that data could be “repurposed” to undermine national sovereignty and distort humanitarian aims with for-profit motivations.
  • The authors draw attention to how cyber dependencies threaten international humanitarian organizations’ purported digital sovereignty. They touch on the tensions between national and digital sovereignty and self-governance.
  • The paper further argues that the rise of digital technologies in the governance of international mobility and migration policies “has all kinds of humanitarian and security consequences,” including (but not limited to) surveillance, privacy infringement, profiling, selection, inclusion/exclusion, and access barriers. Specifically, Scheel introduces the notion of function creep—the use of digital data beyond initially defined purposes—and emphasizes its common use in the context of migration as part “of the modus operandi of sovereign power.”

McAuliffe, Marie, Jenna Blower, and Ana Beduschi. “Digitalization and Artificial Intelligence in Migration and Mobility: Transnational Implications of the COVID-19 Pandemic.” Societies 11, no. 135 (2021): 1-13.

  • This paper critically examines the implications of intensifying digitalization and AI for migration and mobility systems in a post-COVID transnational context.
  • The authors first situate digitalization and AI in migration by analyzing their uptake throughout the Migration Cycle, i.e. to verify identities and visas, “enable ‘smart’ border processing,” and understand travelers’ adherence to legal frameworks. They then evaluate the current challenges and opportunities to migrants and migration systems brought about by deepening digitalization due to COVID-19. For example, contact tracing, infection screening, and quarantining procedures generate increased data about an individual and are meant, by design, to track and trace people, which raises concerns about migrants’ safety, privacy, and autonomy.
  • This essay argues that recent changes show the need for further computational advances that incorporate human rights throughout the design and development stages, “to mitigate potential risks to migrants’ human rights.” AI is severely flawed when it comes to decision-making around minority groups because of biased training data and could further marginalize vulnerable populations. Intrusive data collection for public health, meanwhile, could erode the power of one’s universal right to privacy. Leaving migrants at the mercy of black-box AI systems fails to uphold their right to DSD because it forces them to relinquish their agency and power to an opaque system.

Ponzanesi, Sandra. “Migration and Mobility in a Digital Age: (Re)Mapping Connectivity and Belonging.” Television & New Media 20, no. 6 (2019): 547-557.

  • This article explores the role of new media technologies in rethinking the dynamics of migration and globalization by focusing on the role of migrant users as “connected” and active participants, as well as “screened” and subject to biometric datafication, visualization, and surveillance.
  • Elaborating on concepts such as “migration” and “mobility,” the article analyzes the paradoxes of intermittent connectivity and troubled belonging, which are seen as relational definitions that are always fluid, negotiable, and porous.
  • It states that a city’s digital infrastructures are “complex sociotechnical systems” that have a functional side related to access and connectivity and a performative side where people engage with technology. Digital access and action represent areas of individual and collective manifestations of DSD. For migrants, gaining digital access and skills and “enacting citizenship” are important for resettlement. Ponzanesi advocates for further research conducted both from the bottom-up that leans on migrant experiences with technology to resettle and remain in contact with their homeland and a top-down approach that looks at datafication, surveillance, digital/e-governance as a part of the larger technology application ecosystem to understand contemporary processes and problems of migration.

Remolina, Nydia, and Mark James Findlay. “The Paths to Digital Self-Determination — A Foundational Theoretical Framework.” SMU Centre for AI & Data Governance Research Paper No. 03 (2021): 1-34.

  • Remolina and Findlay stress that self-determination is the vehicle by which people “decide their own destiny in the international order.” Decision-making ability powers humans to be in control of their own lives and excited to pursue a set of actions. Collective action, or the ability to make decisions as a part of a group—be it based on ethnicity, nationality, shared viewpoints, etc.—further motivates oneself.
  • The authors discuss how the European Union and European Court of Human Rights’ “principle of subsidiarity” aligns with self-determination because it advocates for power to be placed at the lowest level possible to preserve bottom-up agency with a “reasonable level of efficiency.” In practice, the results of subsidiarity have been disappointing.
  • The paper provides examples of indigenous populations’ fight for self-determination, offline and online. Here, digital self-determination refers to the challenges indigenous peoples face in accessing growing government uses of technology for unlocking innovative solutions because of a lack of physical infrastructure due to structural and social inequities between settler and indigenous communities.
  • Understanding self-determination, and by extension digital self-determination, as a human right, the report investigates autonomy, sovereignty, the legal definition of a ‘right,’ inclusion, agency, data governance, data ownership, data control, and data quality.
  • Lastly, the paper presents a foundational theoretical framework that goes beyond just protecting personal data and privacy. Understanding that DSD “cannot be detached from duties for responsible data use,” the authors present a collective and individual dimension to DSD. They extend the individual dimension of DSD to include both my data and data about me that can be used to influence a person’s actions through micro-targeting and nudge techniques. They update the collective dimension of DSD to include the views and influences of organizations, businesses, and communities online and call for a better way of visualizing the ‘social self’ and its control over data.

Ziebart, Astrid, and Jessica Bither. “AI, Digital Identities, Biometrics, Blockchain: A Primer on the Use of Technology in Migration Management.” Migration Strategy Group on International Cooperation and Development, June 2020.

  • Ziebart and Bither note the implications of increasingly sophisticated use of technology and data collection by governments with respect to their citizens. They note that migrants and refugees “often are exposed to particular vulnerabilities” during these processes and underscore the need to bring migrants into data gathering and use policy conversations.  
  • The authors discuss the promise of technology—i.e., to predict migration through AI-powered analyses, employ technologies to reduce friction in the asylum-seeking processes, and the power of digital identities for those on the move. However, they stress the need to combine these tools with informational self-determination that allows migrants to own and control what data they share and how and where the data are used.
  • The migration and refugee policy space faces issues of “tech evangelism,” where technologies are being employed just because they exist, rather than because they serve an actual policy need or provide an answer to a particular policy question. This supply-driven policy implementation signals the need for more migrant voices to inform policymakers on what tools are actually useful for the migratory experience. In order to advance the digital agency of migrants, the paper offers recommendations for some of the ethical challenges these technologies might pose and ultimately advocates for greater participation of migrants and refugees in devising technology-driven policy instruments for migration issues.

On-the-go interesting resources 

  • Empowering Digital Self-Determination, mediaX at Stanford University: This short video presents definitions of DSD, and digital personhood, identity, and privacy and an overview of their applications across ethics, law, and the private sector.
  • Digital Self-Determination — A Living Syllabus: This syllabus and assorted materials have been created and curated from the 2021 Research Sprint run by the Digital Asia Hub and Berkman Klein Center for Internet Society at Harvard University. It introduces learners to the fundamentals of DSD across a variety of industries to enrich understanding of its existing and potential applications.
  • Digital Self-Determination Wikipedia Page: This Wikipedia page was developed by the students who took part in the Berkman Klein Center research sprint on digital self-determination. It provides a comprehensive overview of DSD definitions and its key elements, which include human-centered design, robust privacy mandates and data governance, and control over data use to give data subjects the ability to choose how algorithms manipulate their data for autonomous decision-making.
  • Roger Dubach on Digital Self-Determination: This short video presents DSD in the public sector and the dangers of creating a ‘data-protected’ world, focusing instead on understanding how governments can efficiently use data and protect privacy. Note: this video is part of the Living Syllabus course materials (Digital Self-Determination/Module 1: Beginning Inquiries).

Updated Selected Readings on Inaccurate Data, Half-Truths, Disinformation, and Mob Violence


By Fiona Cece, Uma Kalkar, Stefaan Verhulst, and Andrew J. Zahuranec

As part of an ongoing effort to contribute to current topics in data, technology, and governance, The GovLab’s Selected Readings series provides an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, we reflect on the one-year anniversary of the January 6, 2021 Capitol Hill Insurrection and the implications of disinformation and data misuse in support of malicious objectives. This selected reading builds on the previous edition, published last year, on misinformation’s effect on violence and riots. Readings are listed in alphabetical order. New additions are highlighted in green.

The mob attack on the US Congress was alarming and the result of various efforts to undermine the trust in and legitimacy of longstanding democratic processes and institutions. The use of inaccurate data, half-truths, and disinformation to spread hate and division is considered a key driver behind last year’s attack. Altering data to support conspiracy theories or challenging and undermining the credibility of trusted data sources to allow for alternative narratives to flourish, if left unchallenged, has consequences — including the increased acceptance and use of violence both offline and online.

The January 6th insurrection was unfortunately not a unique event, nor was it contained to the United States. While efforts to bring perpetrators of the attack to justice have been fruitful, much work remains to be done to address the willful dissemination of disinformation online. Below, we provide a curation of findings and readings that illustrate the global danger of inaccurate data, half-truths, and disinformation. In addition, The GovLab, in partnership with the OECD, has explored data-actionable questions around how disinformation can spread across and affect society, and ways to mitigate it. Learn more at disinformation.the100questions.org.

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Readings and Annotations

Al-Zaman, Md. Sayeed. “Digital Disinformation and Communalism in Bangladesh.” China Media Research 15, no. 2 (2019): 68–76.

  • Md. Sayeed Al-Zaman, Lecturer at Jahangirnagar University in Bangladesh, discusses how the country’s increasing number of “netizens” are being manipulated by online disinformation and incited to violence along religious lines. Social media helps quickly spread anti-Hindu and anti-Buddhist rhetoric, inflaming religious divisions between these groups and Bangladesh’s Muslim majority, impeding possibilities for “peaceful coexistence.”
  • Swaths of online information make it difficult to fact-check, and alluring stories that feed on people’s fear and anxieties are highly likely to be disseminated, leading to a spread of rumors across Bangladesh. Moreover, disruptors and politicians wield religion to target citizens’ emotionality and create violence.
  • Al-Zaman recounts two instances of digital disinformation and communalism. First, in 2016, following a Facebook post supposedly criticizing Islam, riots destroyed 17 temples and 100 houses in Nasirnagar and led to protests in neighboring villages. While the exact source of the disinformation post was never confirmed, a man was beaten and jailed for it despite a lack of robust evidence of his wrongdoing. Second, in 2012, after a Facebook post circulating an image of someone desecrating the Quran tagged a Buddhist youth in the picture, 12 Buddhist monasteries and 100 houses in Ramu were destroyed. Mobilized through social media, a mob of over 6,000 people, including local Muslim community leaders, attacked the town of Ramu. Later investigation found that the image had been doctored and spread by an Islamic extremist group member in a coordinated attack, manipulating Islamic religious sentiment via fake news to target Buddhist minorities.

Banaji, Shakuntala, and Ram Bhat. “WhatsApp Vigilantes: An exploration of citizen reception and circulation of WhatsApp misinformation linked to mob violence in India.” London School of Economics and Political Science, 2019.

  • London School of Economics and Political Science Associate Professor Shakuntala Banaji and Researcher Ram Bhat articulate how discriminated groups (Dalits, Muslims, Christians, and Adivasis) have been targeted by peer-to-peer communications spreading allegations of bovine-related issues, child-snatching, and organ harvesting, culminating in violence against these groups with fatal consequences.
  • WhatsApp messages work in tandem with ideas, tropes, messages, and stereotypes already in the public domain, providing “verification” of fake news.
  • WhatsApp use is gendered, and users are predisposed to believe and spread misinformation, particularly if it targets a discriminated group that they already have negative and discriminatory feelings towards.
  • Among most WhatsApp users, civic trust is based on ideological, family, and community ties.
  • Restricting sharing, tracking, and reporting of misinformation using “beacon” features and imposing penalties on groups can serve to mitigate the harmful effects of fake news.

Funke, Daniel, and Susan Benkelman. “Misinformation is inciting violence around the world. And tech platforms don’t seem to have a plan to stop it.” Poynter, April 4, 2019.

  • Misinformation leading to violence has been on the rise worldwide. PolitiFact writer Daniel Funke and Susan Benkelman, former Director of Accountability Journalism at the American Press Institute, point to mob violence against Roma in France after rumors of kidnapping attempts circulated on Facebook and Snapchat; the immolation of two men in Puebla, Mexico following fake news spread on WhatsApp of a gang of organ harvesters on the prowl; and false kidnapping claims sent through WhatsApp fueling lynch mobs in India.
  • Slow (re)action to fake news allows mis/disinformation to prey on vulnerable people and infiltrate society. Examples covered in the article discuss how fake news preys on older Americans who lack strong digital literacy. Virulent online rumors have made it difficult for citizens to separate fact from fiction during the Indian general election. Foreign adversaries like Russia are bribing Facebook users for their accounts in order to spread false political news in Ukraine.
  • The article notes that increases in violence caused by disinformation are doubly enabled by “a lack of proper law enforcement” and inaction by technology companies. Facebook, YouTube, and WhatsApp have no coordinated, comprehensive plans to fight fake news and attempt to shift responsibility to “fact-checking partners.” Troublingly, it appears that some platforms deliberately delay the removal of mis/disinformation to attract more engagement. These companies seem to remove misleading information only when facing intense pressure from policymakers.

Kyaw, Nyi Nyi. “Facebooking in Myanmar: From Hate Speech to Fake News to Partisan Political Communication.” ISEAS — Yusof Ishak Institute, no. 36 (2019): 1–10.

  • In the past decade, the number of plugged-in Myanmar citizens has skyrocketed to 39% of the population. All of these 21 million internet users are active on Facebook, where much political rhetoric occurs. Widespread fake news disseminated through Facebook has led to an increase in anti-Muslim sentiment and the spread of misleading, inflammatory headlines.
  • Attempts to curtail fake news on Facebook are difficult. In Myanmar, a developing country where “the rule of law is weak,” monitoring and regulation on social media is not easily enforceable. Criticism from Myanmar and international governments and civil society organizations resulted in Facebook banning and suspending fake news accounts and pages and employing stricter, more invasive monitoring of citizen Facebook use — usually without their knowledge. However, despite Facebook’s key role in agitating and spreading fake news, no political or oversight bodies have “explicitly held the company accountable.”
  • Nyi Nyi Kyaw, Visiting Fellow at the Yusof Ishak Institute in Singapore, notes a cyber law initiative set in motion by the Myanmar government to strengthen social media monitoring methods but is wary of Myanmar’s “human and technological capacity” to enforce these regulations.

Lewandowsky, Stephan, and Sander van der Linden. “Countering Misinformation and Fake News Through Inoculation and Prebunking.” European Review of Social Psychology 32, no. 2 (2020): 348-384.

  • Researchers Stephan Lewandowsky and Sander van der Linden present a scan of conventional instances and tools to combat misinformation. They note the staying power and spread of sensational sound bites, especially in the political arena, and their real-life consequences on problems such as anti-vaccination campaigns, ethnically-charged violence in Myanmar, and mob lynchings in India spurred by WhatsApp rumors.
  • To proactively stop misinformation, the authors introduce the psychological theory of “inoculation,” which forewarns people that they may be exposed to misinformation and alerts them to the ways by which they could be misled, making them more resilient to false information. The paper highlights numerous successes of inoculation in combating misinformation and presents it as a strategy to prevent disinformation-fueled violence.
  • The authors then discuss best strategies to deploy fake news inoculation and generate “herd” cognitive immunity in the face of microtargeting and filter bubbles online.

Osmundsen, Mathias, Alexander Bor, Peter Bjerregaard Vahlstrup, Anja Bechmann, and Michael Bang Petersen. “Partisan polarization is the primary psychological motivation behind ‘fake news’ sharing on Twitter.” American Political Science Review 115, no. 3 (2020): 999-1015.

  • Mathias Osmundsen and colleagues explore the proliferation of fake news on digital platforms. Are those who share fake news “ignorant and lazy,” malicious actors, or playing political games online? Through a psychological mapping of over 2,000 Twitter users across 500,000 stories, the authors find that disruption and polarization fuel fake news dissemination more so than ignorance.
  • Given the increasingly polarized American landscape, spreading fake news can help spread “partisan feelings,” increase interparty social and political cohesion, and call supporters to incendiary and violent action. Thus, misinformation prioritizes usefulness to reach end goals over accuracy and veracity of information.
  • Overall, the authors find that those with low political awareness and media literacy are the least likely to share fake news. While older individuals were more likely to share fake news, the inability to identify real versus fake information was not a major motivator of the spread of misinformation.
  • For the most part, those who share fake news are knowledgeable about the political sphere and online spaces. They are primarily motivated to ‘troll’ or create online disruption, or to further their partisan stance. In the United States, right-leaning individuals are more likely to follow fake news because they “must turn to more extreme news sources” to find information aligned with their politics, while left-leaning people can find more credible sources from liberal and centrist outlets.

Piazza, James A. “Fake news: the effects of social media disinformation on domestic terrorism.” Dynamics of Asymmetric Conflict (2021): 1-23.

  • James A. Piazza of Pennsylvania State University examines the role of online misinformation in driving distrust, political extremism, and political violence. He reviews some of the ongoing literature on online misinformation and disinformation in driving these and other adverse outcomes.
  • Using data on incidents of terrorism from the Global Terrorism Database and three independent measures of disinformation derived from the Digital Society Project, Piazza finds “disinformation propagated through online social media outlets is statistically associated with increases in domestic terrorism in affected countries. The impact of disinformation on terrorism is mediated, significantly and substantially, through increased political polarization.”
  • Piazza notes that his results support other literature that shows the real-world effects of online disinformation. He emphasizes the need for further research and investigation to better understand the issue.

Posetti, Julie, Nermine Aboulez, Kalina Bontcheva, Jackie Harrison, and Silvio Waisbord. “Online Violence Against Women Journalists: A Global Snapshot of Incidence and Impacts.” United Nations Educational, Scientific and Cultural Organization, 2020.

  • The survey focuses on incidence, impacts, and responses to online violence against women journalists that are a result of “coordinated disinformation campaigns leveraging misogyny and other forms of hate speech.” There were 901 respondents, hailing from 125 countries, and covering various ethnicities.
  • 73% of women journalists reported facing online violence and harassment in the course of their work, suggesting escalating gendered violence against women in online media.
  • The impact of COVID-19 and populist politics is evident in the gender-based harassment and disinformation campaigns, the source of which is traced to political actors (37%) or anonymous/troll accounts (57%).
  • Investigative reporting on gender issues, politics and elections, immigration and human rights abuses, or fake news itself seems to attract online retaliation and targeted disinformation campaigns against the reporters.

Rajeshwari, Rema. “Mob Lynching and Social Media.” Yale Journal of International Affairs, June 1, 2019.

  • District Police Chief of Jogulamba Gadwal, India, and Yale World Fellow (’17) Rema Rajeshwari writes about how misinformation and disinformation are becoming a growing problem and security threat in India. The fake news phenomenon has spread hatred, fueled sectarian tensions, and continues to diminish social trust in society.
  • One example of this can be found in Jogulamba Gadwal, where videos and rumors were spread throughout social media about how the Parthis, a stigmatized tribal group, were committing acts of violence in the village. This led to a series of mob attacks and killings — “thirty-three people were killed in sixty-nine mob attacks since January 2018 due to rumors” — that could be traced to rumors spread on social media.
  • More importantly, however, Rajeshwari elaborates on how self-regulation and local campaigns can be used as an effective intervention for mis/dis-information. As a police officer, Rajeshwari fought a battle that was both online and on the ground, including the formation of a group of “tech-savvy” cops who could monitor local social media content and flag inaccurate and/or malicious posts, and mobilizing local WhatsApp groups alongside village headmen who could encourage community members to not forward fake messages. These interventions effectively combined local traditions and technology to achieve an “early warning-focused deterrence.”

Taylor, Luke. “Covid-19 Misinformation Sparks Threats and Violence against Doctors in Latin America.” BMJ (2020): m3088.

  • Journalist Luke Taylor details the many incidents of how disinformation campaigns across Latin America have resulted in the mistreatment of health care workers during the Coronavirus pandemic. Examining case studies from Mexico and Colombia, Taylor finds that these mis/disinformation campaigns have resulted in health workers receiving death threats and being subject to acts of aggression.
  • One instance of this link between disinformation and acts of aggression is the 47 reported cases of aggression towards health workers in Mexico, along with 265 reported complaints against health workers. The National Council to Prevent Discrimination noted these acts were the result of a loss of trust in government and government institutions, which was further exacerbated by conspiracy theories that circulated on WhatsApp and other social media channels.
  • Another example of false narratives can be seen in Colombia, where a politician theorized that a “covid cartel” of doctors were admitting COVID-19 patients to ICUs in order to receive payments (e.g., a cash payment of ~17,000 Colombian pesos for every dead patient with a covid-19 diagnosis). This false narrative of doctors being incentivized to increase beds for COVID-19 patients quickly spread across social media platforms, leading many of those who were ill to avoid seeking care. This rumor also led to doctors in Colombia receiving death threats and acts of intimidation.

“The Danger of Fake News in Inflaming or Suppressing Social Conflict.” Center for Information Technology and Society — University of California Santa Barbara, n.d.

  • The article provides case studies of how fake news can be used to intensify social conflict for political gains (e.g., by distracting citizens from having a conversation about critical issues and undermining the democratic process).
  • The cases elaborated upon are 1) Pizzagate: a fake news story that linked human trafficking to a presidential candidate and a political party, and ultimately led to a shooting; 2) Russia’s Internet Research Agency: Russian agents created social media accounts to spread fake news that favored Donald Trump during the 2016 election, and even instigated online protests about social issues (e.g., a BLM protest); and 3) Cambridge Analytica: a British company that used unauthorized social media data for sensationalistic and inflammatory targeted US political advertisements.
  • Notably, it points out that fake news undermines a citizen’s ability to participate in the democratic process and make accurate decisions in important elections.

Tworek, Heidi. “Disinformation: It’s History.” Center for International Governance Innovation, July 14, 2021.

  • While some public narratives frame online disinformation and its influence on real-world violence as “unprecedented and unparalleled” to occurrences in the past, Professor Heidi Tworek of the University of British Columbia points out that “assumptions about the history of disinformation” have influenced (and continue to influence) policymaking to combat fake news. She argues that today’s unprecedented events are rooted in tactics similar to those of the past, such as how Finnish policymakers invested in national communications strategy to fight foreign disinformation coming from Russia and the Soviet Union.
  • She emphasizes the power of learning from historical events to guide modern methods of fighting political misinformation. Connecting today’s concerns about election fraud, foreign interference, and conspiracy theories to those of the past, such as Soviet and American practices of “funding magazines [and] spreading rumors” during the Cold War to further anti-opposition sentiment and hatred, reinforces that disinformation is a long-standing problem.

Ward, Megan, and Jessica Beyer. “Vulnerable Landscapes: Case Studies of Violence and Disinformation.” Wilson Center, August 2019.

  • This article discusses instances where disinformation inflamed already existing social, political, and ideological cleavages, and ultimately caused violence. Specifically, it elaborates on instances from the US-Mexico border, India, Sri Lanka, and during the course of three Latin American elections.
  • Though the cases are meant to be illustrative and highlight the spread of disinformation globally, the violence in these cases was shown to be affected by the distinct social fabric of each place. Their findings lend credence to the idea that disinformation helped spark violence in places that were already vulnerable and tense.
  • Indeed, now that disinformation can be so quickly distributed using social media, coupled with declining trust in public institutions, low levels of media literacy, meager actions taken by social media companies, and government actors who exploit disinformation for political gain, there has been a rise of these cases globally. It is an interaction of factors such as distrust in traditional media and public institutions, lack of content moderation on social media, and ethnic divides that render societies vulnerable and susceptible to violence.
  • One example of this is at the US/Mexico border, where disinformation campaigns have built on pre-existing xenophobia and have led to instances of mob violence and mass shootings. Inflamed by disinformation campaigns that migrant caravans contain criminals (e.g., invasion narratives often used to describe migrant caravans), the armed group United Constitutional Patriots (UCP) impersonated law enforcement and detained migrants at the US border, often turning them over to border officials. UCP has since been arrested by the FBI for impersonating law enforcement.

We welcome other sources we may have missed — please share any suggested additions with us at datastewards [at] thegovlab.org or The GovLab on Twitter.

Selected Readings on the Use of Artificial Intelligence in the Public Sector


By Kateryna Gazaryan and Uma Kalkar

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works focuses on algorithms and artificial intelligence in the public sector.

As Artificial Intelligence becomes more developed, governments have turned to it to improve the speed and quality of public sector service delivery, among other objectives. Below, we provide a selection of recent literature that examines how the public sector has adopted AI to serve constituents and solve public problems. While the use of AI in governments can cut down costs and administrative work, these technologies are often early in development and difficult for organizations to understand and control, with potentially harmful effects as a result. As such, this selected reading explores not only the use of artificial intelligence in governance but also its benefits and consequences.

Readings are listed in alphabetical order.

Berryhill, Jamie, Kévin Kok Heang, Rob Clogher, and Keegan McBride. “Hello, World: Artificial intelligence and its use in the public sector.” OECD Working Papers on Public Governance no. 36 (2019): https://doi.org/10.1787/726fd39d-en.

This working paper emphasizes the importance of defining AI for the public sector and outlining use cases of AI within governments. It provides a map of 50 countries that have implemented or set in motion the development of AI strategies and highlights where and how these initiatives are cross-cutting, innovative, and dynamic. Additionally, the piece provides policy recommendations governments should consider when exploring public AI strategies to adopt holistic and humanistic approaches.

Kuziemski, Maciej, and Gianluca Misuraca. “AI Governance in the Public Sector: Three Tales from the Frontiers of Automated Decision-Making in Democratic Settings.” Telecommunications Policy 44, no. 6 (2020): 101976. 

Kuziemski and Misuraca explore how the use of artificial intelligence in the public sector can exacerbate existing power imbalances between the public and the government. They consider the European Union’s artificial intelligence “governance and regulatory frameworks” and compare these policies with those of Canada, Finland, and Poland. Drawing on previous scholarship, the authors outline the goals, drivers, barriers, and risks of incorporating artificial intelligence into public services and assess existing regulations against these factors. Ultimately, they find that the “current AI policy debate is heavily skewed towards voluntary standards and self-governance” while minimizing the influence of power dynamics between governments and constituents. 

Misuraca, Gianluca, and Colin van Noordt. “AI Watch, Artificial Intelligence in Public Services: Overview of the Use and Impact of AI in Public Services in the EU.” 30255 (2020).

This study provides “evidence-based scientific support” for the European Commission as it navigates AI regulation via an overview of ways in which European Union member-states use AI to enhance their public sector operations. While AI has the potential to positively disrupt existing policies and functionalities, this report finds gaps in how AI gets applied by governments. It suggests the need for further research centered on the humanistic, ethical, and social ramifications of AI use and a rigorous risk assessment from a “public-value perspective” when implementing AI technologies. Additionally, efforts must be made to empower all European countries to adopt responsible and coherent AI policies and techniques.

Saldanha, Douglas Morgan Fullin, and Marcela Barbosa da Silva. “Transparency and Accountability of Government Algorithms: The Case of the Brazilian Electronic Voting System.” Cadernos EBAPE.BR 18 (2020): 697–712.

Saldanha and da Silva note that open data and open government revolutions have increased citizen demand for algorithmic transparency. Algorithms are increasingly used by governments to speed up processes and reduce costs, but their black-box design and lack of explainability allow implicit and explicit bias and discrimination to enter their calculations. The authors conduct a qualitative study of the “practices and characteristics of the transparency and accountability” in the Brazilian e-voting system across seven dimensions: consciousness; access and reparations; accountability; explanation; data origin, privacy and justice; auditing; and validation, precision and tests. They find that the Brazilian e-voting system fulfilled the need to inform citizens about the benefits and consequences of data collection and algorithm use but fell severely short in demonstrating accountability and opening algorithmic processes to citizen oversight. They put forth policy recommendations to increase the e-voting system’s accountability to Brazilians and strengthen auditing and oversight processes to reduce the current distrust in the system.

Sharma, Gagan Deep, Anshita Yadav, and Ritika Chopra. “Artificial intelligence and effective governance: A review, critique and research agenda.” Sustainable Futures 2 (2020): 100004.

This paper conducts a systematic review of the literature on how AI is used across different branches of government, specifically healthcare; information, communication, and technology; environment; transportation; policy making; and economic sectors. Across the 74 papers surveyed, the authors find a gap in the research on selecting and implementing AI technologies, as well as their monitoring and evaluation. They call on future research to assess the impact of AI pre- and post-adoption in governance, along with the risks and challenges associated with the technology.

Tallerås, Kim, Terje Colbjørnsen, Knut Oterholm, and Håkon Larsen. “Cultural Policies, Social Missions, Algorithms and Discretion: What Should Public Service Institutions Recommend?” Part of the Lecture Notes in Computer Science book series (2020).

Tallerås et al. examine how the use of algorithms by public services, such as public radio and libraries, influences broader society and culture. For instance, to modernize their offerings, Norway’s broadcasting corporation (NRK) has adopted online platforms similar to popular private streaming services. However, NRK’s filtering process has faced “exposure diversity” problems that narrow recommendations to already popular entertainment and move Norway’s cultural offerings towards a singularity. As a public institution, NRK is required to “fulfill […] some cultural policy goals,” raising the question of how public media services can remain relevant in the era of algorithms fed by “individualized digital culture.” Efforts are currently underway to employ recommendation systems that balance cultural diversity with personalized content relevance, so that they both engage individuals and uphold the socio-cultural mission of public media.

Vogl, Thomas, Cathrine Seidelin, Bharath Ganesh, and Jonathan Bright. “Smart Technology and the Emergence of Algorithmic Bureaucracy: Artificial Intelligence in UK Local Authorities.” Public Administration Review 80, no. 6 (2020): 946–961.

Local governments are using “smart technologies” to create more efficient and effective public service delivery. These tools are twofold: not only do they help the public interact with local authorities, they also streamline the tasks of government officials. To better understand the digitization of local government, the authors conducted surveys, desk research, and in-depth interviews with stakeholders from local British governments to understand reasoning, processes, and experiences within a changing government framework. Vogl et al. found an increase in “algorithmic bureaucracy” at the local level to reduce administrative tasks for government employees, generate feedback loops, and use data to enhance services. While the shift toward digital local government demonstrates initiatives to utilize emerging technology for public good, further research is required to determine which demographics are not involved in the design and implementation of smart technology services and how to identify and include these audiences.

Wirtz, Bernd W., Jan C. Weyerer, and Carolin Geyer. “Artificial intelligence and the public sector—Applications and challenges.” International Journal of Public Administration 42, no. 7 (2019): 596-615.

The authors provide an extensive review of the existing literature on AI uses and challenges in the public sector to identify the gaps in current applications. The developing nature of AI in public service has led to differing definitions of what constitutes AI and what risks and benefits it poses to the public. The authors also note the lack of focus on the downsides of AI in governance, with studies tending to focus primarily on the positive aspects of the technology. From this qualitative analysis, the researchers highlight ten AI applications: knowledge management, process automation, virtual agents, predictive analytics and data visualization, identity analytics, autonomous systems, recommendation systems, digital assistants, speech analytics, and threat intelligence. They also note four challenge dimensions: technology implementation, laws and regulation, ethics, and society. From these applications and risks, Wirtz et al. provide a “checklist for public managers” to make informed decisions on how to integrate AI into their operations.

Wirtz, Bernd W., Jan C. Weyerer, and Benjamin J. Sturm. “The dark sides of artificial intelligence: An integrated AI governance framework for public administration.” International Journal of Public Administration 43, no. 9 (2020): 818-829.

As AI is increasingly popularized and picked up by governments, Wirtz et al. highlight the lack of research on the challenges and risks—specifically, privacy and security—associated with implementing AI systems in the public sector. After assessing existing literature and uncovering gaps in the main governance frameworks, the authors outline the three areas of challenges of public AI: law and regulations, society, and ethics. Last, they propose an “integrated AI governance framework” that takes into account the risks of AI for a more holistic “big picture” approach to AI in the public sector.

Zuiderwijk, Anneke, Yu-Che Chen, and Fadi Salem. “Implications of the use of artificial intelligence in public governance: A systematic literature review and a research agenda.” Government Information Quarterly (2021): 101577.

Following a literature review on the risks and possibilities of AI in the public sector, Zuiderwijk, Chen, and Salem design a research agenda centered around the “implications of the use of AI for public governance.” The authors provide eight process recommendations, including: avoiding superficial buzzwords in research; conducting domain- and locality-specific research on AI in governance; shifting from qualitative analysis to diverse research methods; applying private sector “practice-driven research” to public sector study; furthering quantitative research on AI use by governments; creating “explanatory research designs”; sharing data for broader study; and adopting multidisciplinary reference theories. Further, they note the need for scholarship to delve into best practices, risk management, stakeholder communication, multisector use, and impact assessments of AI in the public sector to help decision-makers make informed decisions on the introduction, implementation, and oversight of AI in the public sector.

Selected Readings on Data, Gender, and Mobility


By Michelle Winowatan, Uma Kalkar, Andrew Young, and Stefaan Verhulst

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data, gender, and mobility was originally published in 2017, and updated in 2021.

This edition of the Selected Readings was developed as part of an ongoing project at the GovLab, supported by Data2X, in collaboration with UNICEF, DigitalGlobe, IDS (UDD/Telefonica R&D), and the ISI Foundation, to establish a data collaborative to analyze unequal access to urban transportation for women and girls in Chile. We thank all our partners for their suggestions to the below curation – in particular Leo Ferres at IDS who got us started with this collection; Ciro Cattuto and Michele Tizzoni from the ISI Foundation; and Bapu Vaitla at Data2X for their pointers to the growing data and mobility literature.

Introduction

Daily mobility is key for gender equity. Access to transportation contributes to women’s agency and independence. The ability to move from place to place safely and efficiently can allow women to access education, work, and the public domain more generally. Yet, mobility is not just a means to access various opportunities. It is also a means to enter the public domain.

Women’s mobility is a multi-layered challenge

Women’s daily mobility, however, is often hampered by social, cultural, infrastructural, and technical barriers. Cultural bias, for instance, can confine women to areas close to their homes because of a societal double standard that expects women to be homemakers. From an infrastructural perspective, public transportation mostly accommodates home-to-work trips, when in reality women often make more complex trips with multiple stops (for example, at the market, school, or healthcare provider), sometimes called “trip chaining.” From a safety perspective, women tend to avoid making trips in certain areas and/or at certain times due to a constant risk of being sexually harassed in public places. Women are also pushed toward more expensive transportation – such as taking a cab instead of a bus or train – based on safety concerns.

The growing importance of (new sources of) data

Researchers are increasingly experimenting with ways to address these interdependent problems through the analysis of diverse datasets, often collected by private sector businesses and other non-governmental entities. Gender-disaggregated mobile phone records, geospatial data, satellite imagery, and social media data, to name a few, are providing evidence-based insight into gender and mobility concerns. Such data collaboratives – the exchange of data across sectors to create public value – can help governments, international organizations, and other public sector entities in the move toward more inclusive urban and transportation planning, and the promotion of gender equity.

The curated set of readings below focuses on the following areas:

  1. Insights on how data can inform gender empowerment initiatives,
  2. Emergent research into the capacity of new data sources – like call detail records (CDRs) and satellite imagery – to increase our understanding of human mobility patterns, and,
  3. Publications exploring data-driven policy for gender equity in mobility.

Readings are listed in alphabetical order.

We selected the readings based upon their focus (gender and/or mobility related); scope and representativeness (going beyond one project or context); type of data used (such as CDRs and satellite imagery); and date of publication.

Annotated Reading List

Data and Gender

Blumenstock, Joshua, and Nathan Eagle. Mobile Divides: Gender, Socioeconomic Status, and Mobile Phone Use in Rwanda. ACM Press, 2010.

  • Using traditional survey and mobile phone operator data, this study analyzes gender and socioeconomic divides in mobile phone use in Rwanda, finding that mobile phone use is significantly more prevalent among men and people of higher socioeconomic status.
  • The study also shows the differences in the way men and women use phones, for example: women are more likely to use a shared phone than men.
  • The authors frame their findings around gender and economic inequality in the country to the end of providing pointers for government action.

Bosco, Claudio, et al. Mapping Indicators of Female Welfare at High Spatial Resolution. WorldPop and Flowminder, 2015.

  • This report focuses on early adolescence in girls, which often comes with higher risk of violence, fewer economic opportunities, and restrictions on mobility. Significant data gaps and methodological and ethical issues surrounding data collection for girls also create barriers for policymakers seeking to create evidence-based policy to address those issues.
  • The authors analyze geolocated household survey data, using statistical models and validation techniques, and create high-resolution maps of various sex-disaggregated indicators, such as nutrition level, access to contraception, and literacy, to better inform local policy making processes.
  • Further, the report identifies the gender data gap and issues surrounding gender data collection, and provides arguments for why having comprehensive data can help create better policy and contribute to the achievement of the Sustainable Development Goals (SDGs).

Buvinic, Mayra, Rebecca Furst-Nichols, and Gayatri Koolwal. Mapping Gender Data Gaps. Data2X, 2014.

  • This study identifies gaps in gender data in developing countries on health, education, economic opportunities, political participation, and human security issues.
  • It recommends ways to close the gender data gap through censuses and micro-level surveys and through service and administrative records, and emphasizes how “big data” in particular can supply the missing data needed to measure progress in women and girls’ well-being. The authors argue that identifying these gaps is key to achieving SDG 5: advancing gender equality and women’s empowerment.

Catalyzing Inclusive Financial Systems: Chile’s Commitment to Women’s Data. Data2X, 2014.

  • This article analyzes global and national data in the banking sector to fill the gap of sex-disaggregated data in Chile. The purpose of the study is to describe the difference in spending behavior and priorities between women and men, identify the challenges for women in accessing financial services, and create policies that promote women’s inclusion in Chile.

Ready to Measure: Twenty Indicators for Monitoring SDG Gender Targets. Open Data Watch and Data2X, 2016.

  • Using readily available data, this study identifies 20 SDG indicators related to gender issues that can serve as a baseline measurement for advancing gender equality, such as percentage of women aged 20-24 who were married or in a union before age 18 (child marriage), proportion of seats held by women in national parliament, and share of women among mobile telephone owners, among others.

Ready to Measure Phase II: Indicators Available to Monitor SDG Gender Targets. Open Data Watch and Data2X, 2017.

  • The Phase II paper is an extension of the Ready to Measure Phase I above. Where Phase I identifies the readily available data to measure women and girls’ well-being, Phase II provides information on how to access this data and summarizes insights extracted from it.
  • Phase II elaborates the insights about data gathered from ready to measure indicators and finds that although underlying data to measure indicators of women and girls’ wellbeing is readily available in most cases, it is typically not sex-disaggregated.
  • Over one in five – 53 out of 232 – SDG indicators specifically refer to women and girls. However, further analysis from this study reveals that at least 34 more indicators should be disaggregated by sex. For instance, there should be 15 more sex-disaggregated indicators for SDG number 3: “Ensure healthy lives and promote well-being for all at all ages.”
  • The report recommends that national statistical agencies take the lead and exert additional effort to fill the current gender data gap for each of the SDGs, using tools such as statistical modeling.

Reed, Philip J., Muhammad Raza Khan, and Joshua Blumenstock. Observing gender dynamics and disparities with mobile phone metadata. International Conference on Information and Communication Technologies and Development (ICTD), 2016.

  • The study analyzes mobile phone logs of millions of Pakistani residents to explore whether mobile phone usage behavior differs between men and women and to determine the extent to which gender inequality is reflected in mobile phone usage.
  • It utilizes mobile phone data to analyze the pattern of usage behavior between genders, and socioeconomic and demographic data obtained from census and advocacy groups to assess the state of gender equality in each region in Pakistan.
  • One of its findings is a strong positive correlation between the proportion of female mobile phone users and education score.

Stehlé, Juliette, et al. Gender homophily from spatial behavior in a primary school: A sociometric study. 2013.

  • This paper seeks to understand homophily, a human behavior that characterizes interactions with peers who have similarities in “physical attributes to tastes or political opinions”. Further, it seeks to identify the magnitude of influence, a type of homophily applied to social structures.
  • Focusing on gender interaction among primary school aged children in France, this paper collects data from wearable devices worn by 200 children over a period of 2 days and measures the physical proximity and duration of the interactions among those children in the playground.
  • It finds that interaction patterns are significantly determined by grade and class structure of the school. This means that children belonging to the same class have the most interactions, and that lower grades usually do not interact with higher grades.
  • From a gender lens, this study finds that mixed-gender interactions are shorter than same-gender interactions. In addition, interactions among girls last longer than interactions among boys. These findings indicate that the children in this school tend to have stronger relationships within their own gender, or what the study calls gender homophily. It further finds that gender homophily is apparent in all classes.

Strengthening Gender Measures and Data in the COVID-19 Era: An Urgent Need for Change. Paris 21, 2021.

  • COVID-19 has exacerbated gender disparities, especially with regard to women’s livelihoods, unpaid labor, mental health, and risk of gender-based violence. Gaps in gender data impede robust, data-driven, and effective policies to quantify, analyse, and respond to these issues.
  • Without this information, the full effects of the COVID-19 pandemic cannot be understood. This report calls on National Statistical Systems, survey managers, funders, multilateral agencies, researchers, and policymakers to collect gender-intentional and disaggregated data that is standardized and comparable to address key areas of concern for women and girls. Additionally, it seeks to link non-traditional data sources, such as social media and news media, with existing frameworks to fill in knowledge gaps. Moreover, this information must be rendered accessible for all stakeholders to maximize the potential of the information. Post-pandemic, conscious collection and collation of gendered data is vital to preempt policy problems.

The Sex, Gender and COVID-19 Project: The COVID-19 Sex-Disaggregated Data Tracker. 2021.

  • This data tracker, produced by Global Health 50/50, the African Population and Health Research Center, and the International Center for Research on Women, tracks which countries and datasets have reported sex-disaggregated data on COVID-19 testing, confirmed cases, hospitalizations, and deaths.

Data and Mobility

Bengtsson, Linus, et al. Using Mobile Phone Data to Predict the Spatial Spread of Cholera. Flowminder, 2015.

  • This study seeks to predict the spread of the 2010 cholera epidemic in Haiti using data from 2.9 million anonymous mobile phone SIM cards and reported cases of cholera from the Haitian Directorate of Health, analyzing 78 study areas over the period of October 16 – December 16, 2010.
  • From this dataset, the study creates a mobility matrix that indicates mobile phone movement from one study area to another and combines that with the number of reported cases of cholera in the study areas to calculate the infectious pressure level of those areas.
  • The main finding of its analysis shows that the outbreak risk of a study area correlates positively with the infectious pressure level, where an infectious pressure of over 22 results in an outbreak within 7 days. Further, it finds that the infectious pressure level can inform the sensitivity and specificity of the outbreak prediction.
  • It hopes to improve infectious disease containment by identifying areas with highest risks of outbreaks.

Calabrese, Francesco, et al. Understanding Individual Mobility Patterns from Urban Sensing Data: A Mobile Phone Trace Example. SENSEable City Lab, MIT, 2012.

  • This study compares mobile phone data and odometer readings from annual safety inspections to characterize individual mobility and vehicular mobility in the Boston Metropolitan Area, measured by the average daily total trip length of mobile phone users and average daily Vehicular Kilometers Traveled (VKT).
  • The study found that, “accessibility to work and non-work destinations are the two most important factors in explaining the regional variations in individual and vehicular mobility, while the impacts of populations density and land use mix on both mobility measures are insignificant.” Further, “a well-connected street network is negatively associated with daily vehicular total trip length.”
  • This study demonstrates the potential for mobile phone data to provide useful and updatable information on individual mobility patterns to inform transportation and mobility research.

Campos-Cordobés, Sergio, et al. “Chapter 5 – Big Data in Road Transport and Mobility Research.” Intelligent Vehicles. Edited by Felipe Jiménez. Butterworth-Heinemann, 2018.

  • This study outlines a number of techniques and data sources – such as geolocation information, mobile phone data, and social network observation – that could be leveraged to predict human mobility.
  • The authors also provide a number of examples of real-world applications of big data to address transportation and mobility problems, such as transport demand modeling, short-term traffic prediction, and route planning.

Gauvin, Laetitia et al. Gender gaps in urban mobility. Humanities and Information Science. Humanities & Social Sciences Communications vol. 7, issue 11, 2020.

  • This article discusses how urbanization affects mobility of women in realizing their rights. It points out the historic lack of gender disaggregated data for urban planning, leading to transportation designs that do not best accommodate the needs of women.
  • Taking the large and growing metropolitan area of Santiago, Chile, as a case study of urban mobility through a gendered lens, the article examines the mobility traces from Call Detail Records (CDRs) of an anonymized cohort of mobile phone users, sorted by gender, over 3 months. It then maps differences between men and women with regard to socio-demographic indicators and mobility across the city and through the Santiago transportation network structure, and identifies points of interest frequented by either sex to inform gendered mobility needs in urban areas.

Lin, Miao, and Wen-Jing Hsu. Mining GPS Data for Mobility Patterns: A Survey. Pervasive and Mobile Computing vol. 12, 2014.

  • This study surveys the current field of research using high resolution positioning data (GPS) to capture mobility patterns.
  • The survey focuses on analyses related to frequently visited locations, modes of transportation, trajectory patterns, and place-based activities. The authors find “high regularity” in human mobility patterns despite high levels of variation among the mobility areas covered by individuals.

Phithakkitnukoon, Santi, Zbigniew Smoreda, and Patrick Olivier. Socio-Geography of Human Mobility: A Study Using Longitudinal Mobile Phone Data. PLoS ONE, 2012.

  • This study used a year’s call logs and location data of approximately one million mobile phone users in Portugal to analyze the association between individuals’ mobility and their social networks.
  • It measures and analyzes travel scope (locations visited) and geo-social radius (distance from friends, family, and acquaintances) to determine the association.
  • It finds that 80% of places visited are within 20 km of an individual’s nearest social ties’ location, rising to 90% at a 45 km radius. Further, as population density increases, distance between individuals and their social networks decreases.
  • The findings in this study demonstrate how mobile phone data can provide insights into “the socio-geography of human mobility”.

Semanjski, Ivana, and Sidharta Gautama. Crowdsourcing Mobility Insights – Reflection of Attitude Based Segments on High Resolution Mobility Behaviour Data. vol. 71, Transportation Research, 2016.

  • Using cellphone data, this study maps attitudinal segments that explain how age, gender, occupation, household size, income, and car ownership influence an individual’s mobility patterns. This type of segment analysis is seen as particularly useful for targeted messaging.
  • The authors argue that these time- and space-specific insights could also provide value for government officials and policymakers, by, for example, allowing for evidence-based transportation pricing options and public sector advertising campaign placement.

Silveira, Lucas M., et al. MobHet: Predicting Human Mobility using Heterogeneous Data Sources. vol. 95, Computer Communications, 2016.

  • This study explores the potential of using data from multiple sources (e.g., Twitter and Foursquare), in addition to GPS data, to provide a more accurate prediction of human mobility. This heterogeneous data captures the popularity of different locations, the frequency of visits to those locations, and the relationships among people who are moving around the target area. The authors’ initial experimentation finds that combining these data sources identifies human mobility patterns more accurately.

Wilson, Robin, et al. Rapid and Near Real-Time Assessments of Population Displacement Using Mobile Phone Data Following Disasters: The 2015 Nepal Earthquake. PLOS Current Disasters, 2016.

  • Utilizing call detail records of 12 million mobile phone users in Nepal, this study seeks spatio-temporal details of the population after the earthquake on April 25, 2015.
  • It seeks to answer the problem of slow and ineffective disaster response, by capturing near real-time displacement patterns provided by mobile phone call detail records, in order to inform humanitarian agencies on where to distribute their assistance. The preliminary results of this study were available nine days after the earthquake.
  • This project relies on the foundational cooperation with mobile phone operators, who supplied the de-identified data from 12 million users before the earthquake.
  • The study finds that shortly after the earthquake there was an anomalous population movement out of the Kathmandu Valley, the most impacted area, to surrounding areas. The study estimates 390,000 more people than normal had left the valley.

Data, Gender and Mobility

Althoff, Tim, et al. Large-Scale Physical Activity Data Reveal Worldwide Activity Inequality. Nature, 2017.

  • This study’s analysis of worldwide physical activity is built on a dataset containing 68 million days of physical activity of 717,527 people collected through their smartphone accelerometers.
  • The authors find a significant reduction in female activity levels in cities with high activity inequality, where high activity inequality is associated with low city walkability – walkability indicators include pedestrian facilities (city block length, intersection density, etc.) and amenities (shops, parks, etc.).
  • Further, they find that high activity inequality is associated with high levels of inactivity-related health problems, like obesity.

Borker, Girija. Safety First: Street Harassment and Women’s Educational Choices in India. Stop Street Harassment, 2017.

  • Using data collected from SafetiPin, an application that allows users to mark an area on a map as safe or not, and Safecity, another application that lets users share their experience of harassment in public places, Borker analyzes the safety of travel routes surrounding different colleges in India and their effect on women’s college choices.
  • The study finds that women are willing to go to a lower ranked college in order to avoid higher risk of street harassment. Women who choose the best college from their set of options spend an average of $250 more each year to access safer modes of transportation.

Frias-Martinez, Vanessa, Enrique Frias-Martinez, and Nuria Oliver. A Gender-Centric Analysis of Calling Behavior in a Developing Economy Using Call Detail Records. Association for the Advancement of Artificial Intelligence, 2010.

  • Using encrypted Call Detail Records (CDRs) of 10,000 participants in a developing economy, this study analyzes behavioral, social, and mobility variables to determine the gender of a mobile phone user, and finds that behavioral and social variables in mobile phone use differ between women and men.
  • It finds that women use their phones more than men in terms of number of calls made, call duration, and call expenses. Women also have larger social networks, meaning that they contact, or are contacted by, a larger number of unique phone numbers. It finds no statistically significant difference between men and women in the distance traveled between calls.
  • Frias-Martinez et al. recommend taking these findings into consideration when designing cellphone-based services.

Psylla, Ioanna, Piotr Sapiezynski, Enys Mones, and Sune Lehmann. The role of gender in social network organization. PLoS ONE 12, December 20, 2017.

  • Using a large dataset of high resolution data collected through mobile phones, as well as detailed questionnaires, this report studies gender differences in a large cohort. The researchers consider mobility behavior and individual personality traits among a group of more than 800 university students.
  • Analyzing mobility data, they find that women visit more unique locations over time than men and distribute their time more evenly across the locations they visit, indicating that women’s time commitments are spread more widely across places.

The Landscape of Big Data and Gender. Data2X, February, 2021.

  • Against the backdrop of COVID-19, this report reaffirms that big data initiatives to study mobility, health, and social norms through gendered lenses have greatly progressed. More private companies and think tanks have launched data collection and sharing efforts to spur innovative projects to address COVID-19 complications.
  • However, research on economic opportunity, security, and civic action has been lagging behind. Big data collection on these topics is complicated by the lack of sex-disaggregated datasets, gender disparities in technology access, and the lack of gender tags in big data.
  • Large technology firms, especially social networks like Facebook, LinkedIn, Uber, and more, create a large amount of gender-organized data. The report found that users and data-holding companies are willing to share this information for public policy reasons so long as it provides value and is protected. To this end, Data2X, alongside its partners, champions the use of data collaboratives to put gender-sorted information to work for social good.

Vaitla, Bapu. Big Data and the Well Being of Women and Girls: Applications on the Social Scientific Frontier. Data2X, Apr. 2017.

  • In this study, the researchers use geospatial data, credit card and cell phone information, and social media posts to identify problems–such as malnutrition, education, access to healthcare, mental health–facing women and girls in developing countries.
  • From the credit card and cell phone data in particular, the report finds that analyzing patterns of women’s spending and mobility can provide useful insight into Latin American women’s “economic lifestyles.”
  • Based on this analysis, Vaitla recommends that nontraditional big data sources be used to fill gaps in conventional data sources and address the common invisibility of women’s and girls’ data in institutional databases.

Updated and Expanded Selected Readings on Indigenous Data Sovereignty


By Juliet McMurren, Uma Kalkar, Yuki Mitsuda, and Andrew Zahuranec

Updated on October 11, 2021

As part of an ongoing effort to build a knowledge base for the field of improving governance through data and technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, to recognize and honor Indigenous Peoples’ Day, we have updated our previous curation of literature on Indigenous data sovereignty (IDS)—the principle that Indigenous peoples should be able to control the data collected by and about them, to determine how and by whom it is accessed, stored, and used. These pieces discuss data practices and methodologies that reflect Indigenous peoples’ lived experiences, cultures, and worldviews. 

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Readings are listed in alphabetical order. New additions are highlighted in green.

Selected Readings (in alphabetical order)

Principles

Kukutai, Tahu and John Taylor (eds) Indigenous Data Sovereignty: Towards an Agenda (2016)

  • The foundational work in the field, this edited volume brings together Māori, Australian Aboriginal, Native American, and First Nations academics, researchers and data practitioners to set out the case for Indigenous data sovereignty.
  • Organized in four parts, the book begins by providing a historical account of colonialist statistics and the origins of the concept of Indigenous data sovereignty. In the second part, the authors set out an Indigenous critique of official statistics as a colonialist practice primarily intended to serve settler governments through the control of Indigenous peoples. As a result, population statistics from these societies are imbued with colonialist norms that both ignore indicators significant to Indigenous peoples and reduce them to what contributor Maggie Walter calls 5D data: disparity, deprivation, disadvantage, dysfunction, and difference.
  • The authors outline how Indigenous data sovereignty would work, setting out an agenda in which Indigenous people would control who should be counted among them, and establish collection priorities reflective of their cultural norms, interests, values and priorities. This could include a move away from data about individuals as single indicators used to compare, rank and drive “improvement” to a more nuanced and complex view of data that focuses on social groupings beyond the household. They would also control who would have access to the data gathered, with culturally appropriate rules and protocols for consents to access and use data. These principles are encapsulated in the First Nations OCAP® data model, through which First Nations assert ownership of; control over the collection, use and disclosure of; access to; and possession of all First Nations’ data.
  • The third part of the book provides examples of Indigenous data sovereignty in practice, from the perspective of both data practitioners and data users. A case study of data sovereignty among the Yawuru of Western Australia outlines a methodology for developing data collection rooted in self-determination and community values of mabu buru (knowledge of the land) and mabu liyan (relational or community wellbeing). Another examines the work of a Māori primary health care organization, National Hauora Coalition, which conducted a rapid response campaign to reduce high rates of acute rheumatic fever among Māori children in Auckland. Stewardship and analysis of their own community data enabled targeted interventions that reduced positive Group A strep rates among children by 75 percent and rates of rheumatic fever by 33 percent.
  • The final section of the book outlines the emerging efforts of the New Zealand and Australian governments to engage with Indigenous peoples’ desire for data sovereignty through their statistical practices.

Lovett, Raymond et al Good Data Practices for Indigenous Data Sovereignty and Governance (2019)

  • This multi-authored chapter is the first in a volume the editors describe as born of frustration with dystopian “bad data” practices and devoted to the exploration of how data could be used “productively and justly to further social, economic, cultural and political goals.”
  • The chapter sets out the context for the emergence of IDS movements worldwide, and gives a survey of IDS networks and their foundational principles, such as OCAP® (above) and the Māori principles of rangatiratanga (right to own, access, control and possess), manaakitanga (ethical use to further wellbeing) and kaitiakitanga (sustainable stewardship) as they apply to data about or from themselves and their environs.
  • The chapter defines and differentiates IDS (the management of information in alignment with the laws, practices and customs of the nation-state in which it is located) and Indigenous data governance (IDG), or power and authority over the design, ownership, access to and use of data. It situates IDS movements alongside broader movements for Indigenous sovereignty informed by the rights laid out in the UN Declaration on the Rights of Indigenous Peoples.

Rainie, Stephanie Carroll, Tahu Kukutai, Maggie Walter, Oscar Figueroa-Rodriguez, Jennifer Walker, and Per Axelsson (2019) Issues in Open Data — Indigenous Data Sovereignty. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The State of Open Data: Histories and Horizons.

  • A chapter in State of Open Data: Histories and Horizons, a book that seeks to take stock of progress made toward open data across sectors, introduces the concept of Indigenous Data Sovereignty and describes how open data is a source of tension for Indigenous peoples. Reviewing the history and current usage of Indigenous data around the world, the chapter notes how Indigenous Data Sovereignty raises “fundamental questions about assumptions of ownership, representation, and control in open data communities” and how it challenges the open data movement’s approach to data ownership, licensing, and data use. It also notes how Indigenous nations are political entities and the ways that multi-layered governance challenges the “open data binary with one government actor, the nation-state.”
  • The authors observe that there is a widespread lack of understanding about IDS within the open data movement, and that open data policy and discussions have been largely framed to address the needs and interests of nation-states, with minimal engagement with Indigenous peoples. The authors provide a critique of the ways in which the Open Data Charter overlooks the issue of IDS in its principles on open by default, citizen engagement, and inclusive development. They note that the Open Data Charter’s commitment to free use, reuse, and redistribution by anyone, at any time, and anywhere, for example, is in direct conflict with the rights of Indigenous Peoples to govern their own data and control how and by whom it is accessed.
  • Opening state data that is unreliable, inaccurate, and designed, collected and processed according to the norms of state agencies poses additional problems for Indigenous peoples. Statistics about Indigenous peoples based on colonialist norms frequently perpetuate a narrative of inequality. Data infrastructures may be distorted by cultural assumptions, such as those about naming conventions, that misrepresent Indigenous people. In addition, the concept of open data has led to instances of cooptation and theft of Indigenous knowledge, when researchers have collected Indigenous knowledge about the environment, digitized it and shared it without Indigenous consent or oversight.
  • Drawing on the experience of Indigenous data networks worldwide, the authors propose three steps forward for the open data community in its relationship to Indigenous peoples. First, it needs to engage with Indigenous peoples as partners and knowledge holders to inform stewardship of Indigenous data. Secondly, IDS networks, with contacts in Indigenous communities and the world of data, should act as intermediaries for this engagement. Finally, they call for a broader adoption of principles on the governance and stewardship of Indigenous data within research and administration.

Research Data Alliance CARE Principles for Indigenous Data Governance (2019)

  • The RDA’s CARE principles propose an additional set of criteria that should be applied to open data in order to ensure that it respects Indigenous rights to self-determination. The authors argue that the existing FAIR principles, which hold that open data should be findable, accessible, interoperable and reusable, focus on data characteristics that facilitate increased sharing while ignoring historical context and power differentials.
  • To supplement FAIR, they propose the addition of CARE: that open data should be for collective benefit, recognize Indigenous peoples’ authority to control their own data, carry a responsibility to demonstrate how they benefit self-determination, and have embedded ethics prioritizing the rights and wellbeing of Indigenous people.
  • The principle of collective benefit asserts that data ecosystems should be designed in ways that Indigenous peoples can derive benefit from them. This includes active government support for Indigenous data use and reuse, using data to reduce information asymmetries between government and Indigenous communities, and the use of any value created from Indigenous data to benefit Indigenous communities. Authority to control recognizes the rights and interests of Indigenous peoples in their knowledge and data, and their right to govern and control how it is collected, accessed, used and stored.
  • Those working with Indigenous data have a responsibility to demonstrate how they are using it to benefit Indigenous communities. This involves fostering relationships of partnership and trust, working to build capability and capacity within Indigenous communities, and grounding data in the experiences, languages and worldviews of those communities. Finally, the rights and wellbeing of Indigenous peoples must be the primary concern. This requires data design and collection practices that do not stigmatize Indigenous people and that align with Indigenous ethical practices, that address imbalances in power and resources, and that are mindful of the potential for future use and potential harms.

Walter, Maggie, Raymond Lovett, Bobby Maher, Bhiamie Williamson, Jacob Prehn, Gawaian Bodkin‐Andrews, and Vanessa Lee. “Indigenous data sovereignty in the era of big data and open data.” Australian Journal of Social Issues 56, no. 2 (2021): 143-156.

  • A new book edited by Maggie Walter, Tahu Kukutai, Stephanie Russo Carroll, and Desi Rodriguez-Lonebear “examines how Indigenous Peoples around the world are demanding greater data sovereignty, and challenging the ways in which governments have historically used Indigenous data to develop policies and programs.” Through 15 articles, the book explores challenges and opportunities facing Indigenous peoples in places such as Aotearoa New Zealand, the Basque Country, and North and South America. These pieces explore various policy issues and methodological approaches from the perspective of Indigenous peoples to support positive social change.

Applications and Case Studies

Carroll, Stephanie Russo, Desi Rodriguez-Lonebear and Andrew Martinez, Indigenous Data Governance: Strategies from United States Native Nations (2019)

  • This article reviews IDS strategies from Native nations in the United States, connecting IDS and IDG to the rebuilding of Native nations and providing case studies of IDG occurring within tribal and non-tribal entities.
  • The article leads with a definition of key terms, including data dependency, “a paradox of scarcity and abundance: extensive data are collected about Indigenous peoples and nations, but rarely by or for Indigenous peoples’ and nations’ purposes.” It proposes IDG as a method by which the aspiration of IDS can be achieved, through a self-reinforcing cycle: governance of data leads to data rebuilding, providing data for governance that in turn leads to nation rebuilding.
  • The article offers three tribal, two non-tribal, and three urban, inter- and supra-tribal case studies of IDG in practice. The National Congress of American Indians Tribal Data Capacity Project, for example, was a pilot project to build tribal data capacity with five US tribes. Its outputs included a successful census conducted by the Pueblo of Laguna and the University of New Mexico, on tribal terms with tribal money for tribal purposes, and resulting in the development of proprietary software that remains the property of the tribe and that can be reused for subsequent collections.
  • The article concludes with a set of recommendations for tribal rights holders and stakeholders. It recommends that tribal rights holders develop tribe-specific data governance principles, policies, and procedures, and generate resources for IDG. Stakeholders are called on to acknowledge, support and promote IDS and embed it in data collection practices by building frameworks specifying how IDS is to be enacted in data processes, investing in intertribal institutions and recruiting and training Indigenous data professionals, among other measures.

Chaney, Christopher Data Sovereignty and the Tribal Law and Order Act (2018)

  • This article surveys the relationship between data sovereignty and the provision of criminal justice services, a key aspect of tribal sovereignty. The Tribal Law and Order Act (TLOA) of 2010 addressed tribal data by requiring federal justice and law enforcement agencies to coordinate and consult with tribes over data collection, and by giving tribal criminal justice agencies that meet federal and state requirements access to national crime databases to enter and retrieve data.
  • TLOA has resulted in broad and extensive opportunities for federally recognized tribes to submit and retrieve data. Subject to federal law, tribes have the right to determine what information they will submit and access, putting the tribe in control of its own data. It also greatly facilitates the administration of tribal law enforcement and justice by enabling access to federal databases on property, such as vehicles and firearms, and on people, including fugitives, sex offenders, and missing persons. The author suggests that TLOA implementation could serve as a model for other federal agencies working towards tribal data sovereignty arrangements.

First Nations Information Governance Centre (FNIGC) First Nations’ Data Sovereignty in Canada (2019).

  • This paper provides an overview of First Nations experiences of Canadian efforts to identify First Nations individuals, communities, and Nations in official statistics and data, and of the development of First Nations Data Sovereignty efforts over the previous two decades.
  • The paper surveys the ways in which early legislation constructed “Indians” and Indian status within Canada counter to First Nations norms, harming traditional gender roles, leadership structures and governance, severing many First Nations women who “married out” from the culture and lands, and forcing First Nations people to choose between “enfranchisement” through education or employment and their Indian status and culture.
  • The paper then surveys the current First Nations statistical context, noting its numerous deficiencies. Sources of information and data, including the national census, were created with little or no Indigenous involvement or input, creating inconsistencies in the accuracy, reliability, usefulness, and comparability of the data. Even where the data is useful, it is not routinely used in planning and advocacy for the benefit of First Nations communities. First Nations are also required to meet onerous reporting requirements in order to access federal funding, but the resulting data — and other data from and about First Nations — are not effectively analyzed, used, or shared with First Nations.
  • The paper provides examples of effective instances of national and regional First Nations data sovereignty using OCAP® principles. These include the First Nations Information Governance Centre’s own survey work on health, childhood, education, labor and employment, but also similar provincial initiatives. The FNIGC is currently at work with regional partners to develop a National Data Governance Strategy to advance First Nations Data Sovereignty.

Garrison, Nanibaa’ et al Genomic Research Through An Indigenous Lens: Understanding the Expectations (2019)

  • This multi-authored study compares research guidelines for genomic research among Indigenous peoples in Canada, New Zealand, Australia, and the United States.
  • It notes that while there is a dearth of genomic research about Indigenous peoples, Indigenous communities have been the subject of western science in ways that have been intrusive, disrespectful and unethical, leading to community harms and mistrust. Lack of community engagement and informed consent for secondary use of data, and past experiences of harmful and negative representation in publications, have reduced the willingness of Indigenous peoples to engage with genetic research.
  • Canada, New Zealand, Australia, and the United States each have guidelines on scientific research among Indigenous peoples. The authors compare the provisions of these guidelines and the Indigenous Research Protection Act, a draft instrument developed by the Indigenous Peoples Council on Biocolonialism with the goal of protecting Indigenous peoples in research, across four principles: community engagement, rights and interests, institutional responsibilities, and ethical/legal oversight. They observe that while many of the policies provide for protection of Indigenous peoples relating to sample collection, secondary uses of data, benefits, and withdrawal from research, there is less consistency regarding cultural rights and interests, particularly in US instruments.
  • The authors examine ways Indigenous peoples have sought to “bridge the gap” between the benefits of genomic research and the protection of Indigenous peoples. Community protocol development, Indigenous-led genomics initiatives, and consent procedures that draw on UNDRIP have increased community engagement in some countries and fostered greater trust. Concrete progress has also been made in initiatives to preserve Indigenous rights and interests over biospecimens, including protocols that allow for the return of samples, biobanking, and Indigenous governance of resulting data.

Gifford, Heather and Kirikowhai Mikaere Te Kete Tū Ātea: Towards claiming Rangitīkei iwi data sovereignty (2019)

  • This article gives an outline of the Te Kete Tū Ātea research project, a four-year, two-phase participatory research initiative by the Rangitīkei Iwi Collective to establish iwi data sovereignty. The first phase resulted in the development of an iwi data needs analysis and comprehensive iwi information framework, which identified potential data sources, gaps in current information, and strategies to address those gaps. The second phase led to the prioritization of a key information gathering domain, economic data, and a statistical evaluation of current iwi data holdings. The project adopted a Kaupapa Māori approach: it was “Māori led, Māori controlled, privileged a Māori worldview, and was framed around questions identified by Māori as of relevance to Māori.”
  • The first phase of the study identified a five-domain framework to guide iwi data gathering. Collectively, these five domains (cultural, social, peoples, environmental and economic) make up Te Kete Tū Ātea, informed by three goal dimensions: kaitiakitanga, strengthening identity and connection, and empowerment and enablement.
  • The study identified challenges in assessing the wellbeing of iwi, including statistical capacity within iwi and the availability of data, but the authors suggest that the approach itself could be borrowed and applied by other iwi nationwide.

Goodluck, Kalen. “Why the U.S. is terrible at collecting Indigenous data.” High Country News, December 14, 2020.

  • In this article, Kalen Goodluck interviews Abigail Echo-Hawk, the Chief Research Officer for the Seattle Indian Health Board and Director of the Urban Indian Health Institute, about the lack of Indigenous representation in health data collection and analysis. Missing and nonstandardized racial and ethnic information has “effectively ‘invisibilized’” Indigenous people, creating systemic issues for Indigenous health and compounding the vulnerability of these groups during the COVID-19 pandemic. Moreover, despite Indigenous legal rights to access federal data about Indigenous health, communities and researchers face obstacles in accessing this information. Ultimately, Echo-Hawk calls for accountability mechanisms to leverage the new administration and open data for community-centered work.

Hasan, Najmul, Yukun Bao, and Shah J. Miah. “Exploring the impact of ICT usage among indigenous people and their quality of life: operationalizing Sen’s capability approach.” Information Technology for Development (2021): 1-21.

  • Najmul Hasan, Yukun Bao, and Shah J. Miah’s article in Information Technology for Development studies how the digital divide, the gap between those who benefit from access to digital technology and those who do not, affects Indigenous people in Bangladesh. Using a structured questionnaire administered to 250 individuals, the researchers try to determine how Indigenous peoples’ freedoms influence their use of information communication technologies (ICTs) and whether these factors and ICT use are interrelated. The researchers find that ICTs have a significant impact on Indigenous peoples in Bangladesh. They further note, among other findings, that the results “suggest development paths for indigenous society, providing political freedom to individuals as a combination of specific factors considered in our study. These include creating awareness about public decision making, focusing empowerment, local and national voting systems.”

Johnson-Jennings, Michelle, Derek Jennings, and Meg Little Indigenous data sovereignty in action: The Food Wisdom Repository (2019)

  • This article arose from the experience of the authors at the Research for Indigenous Community Health (RICH) Center. Observing that while Indigenous health and nutrition information is available, it is dispersed and difficult to access, they proposed the development of a Food Wisdom Repository to gather meaningful data and information on Indigenous health practices and efforts. The result, supported by the Shakopee Mdewakanton Sioux Community, is an online digital repository of wise food practices grounded in Indigenous knowledge and IDS.
  • The project draws on Indigenous worldviews, knowledge and ways of knowing, beliefs, and forms of power. In particular, it is framed around the idea of wise practices — pragmatic, flexible and sustainable practices rooted in a given local context and the wisdom of community members — rather than the objective, hierarchical, hegemonic and acontextual “best practices” of Western science.

Montenegro, Maria Subverting the universality of metadata standards: The TK labels as a tool to promote Indigenous data sovereignty (2019)

  • This paper explores how metadata standards, and in particular the widely used Dublin Core, reinforce colonial legal property frameworks and disenfranchise Indigenous people, and how they could be used (or subverted) to exercise and promote IDS.
  • The author notes that the rights and creator fields of DC are in direct conflict with Indigenous epistemologies and protocols on the access, circulation, and use of traditional Indigenous knowledge (TK). The rights field is embedded in western legal practice designed to recognize and protect new creations or inventions, and requires a designated individual author and original work in order to offer any protection. This emphasis on originality and individuality is at odds with Indigenous knowledge that emphasizes collective and cumulative knowledge acquired over generations. Similarly, both western IP law and the creator field within DC recognize the individual who records the lifestyles, languages and cultural practices of Indigenous people in film, audio, or image as the legal author, rather than the communities from which the content arose. As subjects but not authors, Indigenous people have no control over these recordings of their cultural practices, or how they are stored, accessed or reused. Indeed, they are even legally required to seek permission from the author to reuse these materials that document their lives and culture.
  • Developed in collaboration with Indigenous peoples, the TK labels are a set of digital tags that can be included as associated metadata in various digital information contexts such as CMSs, online catalogs and databases, finding aids and online platforms. These tags are intended to increase awareness of culturally appropriate circulation, access and use of Indigenous cultural materials. Designed to be used where communities are unable to assert legal control over materials, they provide important information about culturally appropriate use and stewardship. A Seasonal tag developed by the Penobscot Nation, for example, proscribes access to some content outside a given time of year, while an Attribution label, the most widely used, allows Indigenous communities to assert that they are the TK holders of the content and should be acknowledged as such.
  • While the TK labels represent a welcome advance in capturing and asserting Indigenous metadata standards, they are voluntary, and therefore only function if non-tribal collecting institutions recognize the IDS of the tribes.

McMahon, Rob, Tim LaHache, and Tim Whiteduck Digital Data Management as Indigenous Resurgence in Kahnawà:ke (2015)

  • This article documents IDG experiences within the Kahnawà:ke Mohawk (Quebec) community as it set up and used ICT systems to manage community data on research, education, finance, health, membership, housing, lands, and resources. The authors’ research followed the implementation of a customized digital data management system, and sought to find out how employees of community service organizations, chiefly in education, conceived of and used data, and what role data management plays in self-government and Indigenous resurgence.
  • The authors describe the initiative as an act of “everyday community resurgence,” but one that was accompanied by significant internal tensions and challenges. They note the need to avoid technological determinism in IDS, since the use of ICTs has the potential to exacerbate the effects of settler colonialism, concentrating and centralizing power.
  • The article describes the rollout, architecture, and governance of the Kahnawà:ke data management system. One of the challenges faced by the community was in data sharing, with a lack of trust between community organizations leading to data hugging and silos. This tension, which has been identified in other research cited by the authors, points to the need for trust-building in order to promote more holistic data sharing and optimal data use.

Oguamanam, Chidi. “Indigenous Peoples, Data Sovereignty, and Self-Determination: Current Realities and Imperatives.” The African Journal of Information and Communication (AJIC), 26, 1-20. https://doi.org/10.23962/10539/30360

  • This paper by University of Ottawa Professor and Centre for International Governance Innovation Senior Fellow Chidi Oguamanam describes the current state of the global Indigenous data sovereignty movement. Describing the conceptual and practical context, the response by the Government of Canada and most members of Canada’s First Nations, and a variety of other responses, Oguamanam describes how the movement relates to larger efforts to promote Indigenous self-determination, finding “fundamental tension between the objectives of Indigenous data sovereignty and those of the open data movement, which does not directly cater for Indigenous peoples’ full control over their data.”

Rainie, Stephanie Carroll et al Data as a Strategic Resource: Self-determination, Governance, and the Data Challenge for Indigenous Nations in the United States (2017)

  • Despite the need of Indigenous nations for data to help identify problems and find solutions, US Indigenous nations encounter a data landscape characterized by “sparse, inconsistent, and irrelevant information complicated by limited access and utility” that does not serve to address tribally defined needs. Because much of this data is collected and controlled by others for their own purposes, mistrust in data collection is high.
  • This article documents two case studies in tribal data sovereignty and data governance, among the Ysleta del Sur Pueblo and Cheyenne River Sioux Tribe. It lays out the data priorities, agendas and challenges faced by each, and the resulting data initiatives, protocols and uses. The article also discusses how this data governance contributed to the tribes’ self-determination.
  • As part of a development strategy, in 2008 the Ysleta del Sur began to collect socioeconomic and demographic data annually from its citizens as part of its enrolment process. Implementing a census approach that incorporated cultural and local knowledge and western epistemologies, the project yielded data about population, poverty rates, household incomes, educational attainment, workforce and unemployment that was more complete than US census data. Strong community engagement yielded a 90 percent response rate, and the results inspired other data initiatives to support community strategic decision making. The socioeconomic data was also used to support successful applications for federal funding.
  • Identifying high levels of poverty and unemployment as a problem, the Cheyenne River Sioux Tribe sought a comprehensive plan to address these problems, for which they needed timelier, more granular, and more culturally and locally relevant data than that available through the federal government. With academic partners, it developed a survey and data collection process to collect baseline demographic and socioeconomic data from a sample of residents. The survey was able to quantify unemployment rates among people living on the reservation, but also captured employment categories missed by federal data collection, such as the arts microenterprise sector. The results were shared back to the community, and used to foster microenterprises and write grant applications.

Walter, Maggie and Michelle Suina Indigenous data, indigenous methodologies and indigenous data sovereignty (2019)

  • In this article, Walter and Suina propose that there is a dearth of Indigenous quantitative methodologies, driven by a longstanding mistrust of positivist research that positions Indigenous peoples within a deficit discourse. What the authors call “quantitative avoidance” leads to lived consequences for Indigenous peoples: since the statistics produced by quantitative methods form the primary evidence base for policy within the colonial societies, failing to engage with them removes Indigenous people from a critical part of the policy debate.
  • The authors make a case for developing Indigenous quantitative methodologies. They cite as an example the Albuquerque Area Southwest Tribal Epidemiology Center (AASTEC), whose mission is to collaborate with the 27 tribes of its health area to provide high-quality, culturally congruent epidemiology, capacity development, program evaluation, and health promotion. In keeping with its commitment to honoring tribal sovereignty, AASTEC works to build capacity that enables tribes to control data design, collection, and management at all stages of the process. This requires not merely adapting western survey instruments, but redesigning them to incorporate the values and definitions of health of the communities they serve.
  • The authors close with three recommendations for communities and stakeholders interested in building Indigenous quantitative methodologies. First, communities need to cultivate technical skills for survey development, data collection, analysis and reporting. Second, they need to build comfort with and understanding of research methods among tribal partners in order to undo decades of mistrust; the authors describe simulation exercises, which they have used successfully with Indigenous and non-Indigenous participants, that help to demonstrate how worldviews shape expectations and perceptions around data. Finally, they should pursue advocacy of IDS and an exchange of ideas that allows successful Indigenous research methodologies to be promulgated.

Selected Readings on Data Portability


By Juliet McMurren, Andrew Young, and Stefaan G. Verhulst

As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, we explore selected literature on data portability.

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Context

Data today exists largely in silos, generating problems and inefficiencies for the individual, business and society at large. These include:

  • difficulty switching (data) between competitive service providers;
  • delays in sharing data for important societal research initiatives;
  • barriers for data innovators to reuse data that could generate insights to inform individuals’ decision making; and
  • obstacles to scaling data donation.

Data portability — the principle that individuals have a right to obtain, copy, and reuse their own personal data and to transfer it from one IT platform or service to another for their own purposes — is positioned as a solution to these problems. When fully implemented, it would make data liquid, giving individuals the ability to access their own data in a usable and transferable format, transfer it from one service provider to another, or donate data for research and enhanced data analysis by those working in the public interest.

Some companies, including Google, Apple, Twitter and Facebook, have sought to advance data portability through initiatives like the Data Transfer Project, an open source software project designed to facilitate data transfers. Newly enacted data protection legislation such as Europe’s General Data Protection Regulation (2018) and the California Consumer Privacy Act (2018) gives individuals a right to data portability. However, despite the legal and technical advances made, many questions about how to scale up data liquidity and portability responsibly and systematically remain. These new data rights have generated complex and as yet unanswered questions about the limits of data ownership, the implications for privacy, security and intellectual property rights, and the practicalities of how, when, and to whom data can be transferred.

In this edition of the GovLab’s Selected Readings series, we examine the emerging literature on data portability to provide a foundation for future work on the value proposition of data portability. Readings are listed in alphabetical order.

Selected readings

Cho, Daegon, Pedro Ferreira, and Rahul Telang, The Impact of Mobile Number Portability on Price and Consumer Welfare (2016)

  • In this paper, the authors analyze how Mobile Number Portability (MNP) — the ability for consumers to maintain their phone number when changing providers, thus reducing switching costs — affected the relationship between switching costs, market price and consumer surplus after it was introduced in most European countries in the early 2000s.
  • Theory holds that when switching costs are high, market leaders will enjoy a substantial advantage and are able to keep prices high. Policy makers will therefore attempt to decrease switching costs to intensify competition and reduce prices to consumers.
  • The study reviewed quarterly data from 47 wireless service providers in 15 EU countries between 1999 and 2006. The data showed that MNP simultaneously decreased market price by over four percent and increased consumer welfare by an average of at least €2.15 per person per quarter. This increase amounted to a total of €880 million per quarter across the 15 EU countries analyzed in this paper and accounted for 15 percent of the increase in consumer surplus observed over this time.

CtrlShift, Data Mobility: The data portability growth opportunity for the UK economy (2018)

  • Commissioned by the UK Department for Digital, Culture, Media and Sport (DCMS), this study was intended to identify the potential of personal data portability for the UK economy.
  • Its scope went beyond the legal right to data portability envisaged by the GDPR, to encompass the current state of personal data portability and mobility, requirements for safe and secure data sharing, and the potential economic benefits through stimulation of innovation, productivity and competition.
  • The report concludes that increased personal data mobility has the potential to be a vital stimulus for the development of the digital economy, driving growth by empowering individuals to make use of their own data and consent to others using it to create new data-driven services and technologies.
  • However, it cautions that there are significant challenges to be overcome, and new risks to be addressed, before the value of personal data can be realized. Much personal data remains locked in organizational silos, and systemic issues related to data security and governance and the uneven sharing of benefits need to be resolved.

Data Guidance and Future of Privacy Forum, Comparing Privacy Laws: GDPR v. CCPA (2018)

  • This paper compares the provisions of the GDPR with those of the California Consumer Privacy Act (2018).
  • Both article 20 of the GDPR and section 1798 of the CCPA recognize a right to data portability. Both also confer on data subjects the right to receive data from controllers free of charge upon request, and oblige controllers to create mechanisms to provide subjects with their data in portable and reusable form so that it can be transmitted to third parties for reuse.
  • In the CCPA, the right to data portability is an extension of the right to access, and only confers on data subjects the right to apply for data collected within the past 12 months and have it delivered to them. The GDPR does not impose a time limit, and allows data to be transferred from one data controller to another, but limits the right to automatically collected personal data provided by the data subject themselves through consent or contract.

Data Transfer Project, Data Transfer Project Overview and Fundamentals (2018)

  • The paper presents an overview of the goals, principles, architecture, and system components of the Data Transfer Project. The intent of the DTP is to increase the number of services offering data portability and provide users with the ability to transfer data directly in and out of participating providers through systems that are easy and intuitive to use, private and secure, reciprocal between services, and focused on user data. The project, which is supported by Microsoft, Google, Twitter and Facebook, is an open-source initiative that encourages the participation of other providers to reduce the infrastructure burden on providers and users.
  • In addition to benefits to innovation, competition, and user choice, the authors point to benefits to security, through allowing users to back up, organize, or archive their data, recover from account hijacking, and retrieve their data from deprecated services.
  • The DTP’s remit was to test concepts and feasibility for the transfer of specific types of user data between online services using a system of adapters to translate proprietary formats into canonical formats that can be used to transfer data while allowing providers to maintain control over the security of their service. While not resolving all formatting or support issues, this approach would allow substantial data portability and encourage ecosystem sustainability.

Deloitte, How to Flourish in an Uncertain Future: Open Banking (2017)

  • This report addresses the innovative and disruptive potential of open banking, in which data is shared between members of the banking ecosystem at the authorization of the customer, with the potential to increase competition and facilitate new products and services. In the resulting marketplace model, customers could use a single banking interface to access products from multiple players, from established banks to newcomers and fintechs.
  • The report’s authors identify significant threats to current banking models. Banks that failed to embrace open banking could be relegated to a secondary role as an infrastructure provider, while third parties — tech companies, fintech, and price comparison websites — take over the customer relationship.
  • The report identifies four overlapping operating models banks could adopt within an open banking model: full service providers, delivering proprietary products through their own interface with little or no third-party integration; utilities, which provide other players with infrastructure without customer-facing services; suppliers, which offer proprietary products through third-party interfaces; and interfaces, which provide distribution services through a marketplace interface. To retain market share, incumbents are likely to need to adopt a combination of these roles, offering their own products and services and those of third parties through their own and others’ interfaces.

Digital Competition Expert Panel, Unlocking Digital Competition (2019)

  • This report captures the findings of the UK Digital Competition Expert Panel, which was tasked in 2018 with considering opportunities and challenges the digital economy might pose for competition and competition policy and to recommend any necessary changes. The panel focused on the impact of big players within the sector, appropriate responses to mergers or anticompetitive practices, and the impact on consumers.
  • The panel found that the digital economy is creating many benefits, but that digital markets are subject to tipping, in which emerging winners can scoop much of the market. This concentration can give rise to substantial costs, especially to consumers, and cannot be solved by competition alone. However, government policy and regulatory solutions have limitations, including the slowness of policy change, uneven enforcement and profound informational asymmetries between companies and government.
  • The panel proposed the creation of a digital markets unit that would be tasked with developing a code of competitive conduct, enabling greater personal data mobility and systems designed with open standards, and advancing access to non-personal data to reduce barriers to market entry.
  • The panel’s model of data mobility goes beyond data portability, which involves consumers being able to request and transfer their own data from one provider to another. Instead, the panel recommended empowering consumers to instigate transfers of data between a business and a third party in order to access price information, compare goods and services, or access tailored advice and recommendations. They point to open banking as an example of how this could function in practice.
  • It also proposed updating merger policy to make it more forward-looking to better protect consumers and innovation and preserve the competitiveness of the market. It recommended the creation of antitrust policy that would enable the implementation of interim measures to limit damage to competition while antitrust cases are in process.

Egan, Erin, Charting a Way Forward: Data Portability and Privacy (2019)

  • This white paper by Facebook’s VP and Chief Privacy Officer, Policy, represents an attempt to advance the conversation about the relationship between data portability, privacy, and data protection. The author sets out five key questions about data portability: what it is, whose and what data should be portable, how privacy should be protected in the context of portability, and where responsibility for data misuse or improper protection should lie.
  • The paper finds that definitions of data portability remain imprecise, particularly with regard to the distinction between data portability and data transfer. In the interest of feasibility and a reasonable operational burden on providers, it proposes time limits on providers’ obligations to make observed data portable.
  • The paper concludes that there are strong arguments both for and against allowing users to port their social graph — the map of connections between that user and other users of the service — but that the key determinant should be a capacity to ensure the privacy of all users involved. Best-practice data portability protocols that would resolve current differences of approach as to what, how and by whom information should be made available would help promote broader portability, as would resolution of liability for misuse or data exposure.

Engels, Barbara, Data portability among online platforms (2016)

  • The article examines the effects on competition and innovation of data portability among online platforms such as search engines, online marketplaces, and social media, and how relations between users, data, and platform services change in an environment of data portability.
  • The paper finds that the benefits to competition and innovation of portability are greatest in two kinds of environments: first, where platforms offer complementary products and can realize synergistic benefits by sharing data; and secondly, where platforms offer substitute or rival products but the risk of anti-competitive behaviour is high, as for search engines.
  • It identifies privacy and security issues raised by data portability. Portability could, for example, allow an identity fraudster to misuse personal data across multiple platforms, compounding the harm they cause.
  • It also suggests that standards for data interoperability could act to reduce innovation in data technology, encouraging data controllers to continue to use outdated technologies in order to comply with inflexible, government-mandated standards.

Graef, Inge, Martin Husovec and Nadezhda Purtova, Data Portability and Data Control: Lessons for an Emerging Concept in EU Law (2018)

  • This paper situates the data portability right conferred by the GDPR within rights-based data protection law. The authors argue that the right to data portability should be seen as a new regulatory tool aimed at stimulating competition and innovation in data-driven markets.
  • The authors note the potential for conflicts between the right to data portability and the intellectual property rights of data holders, suggesting that the framers underestimated the potential impact of such conflicts on copyright, trade secrets and sui generis database law.
  • Given that the right to data portability is being replicated within consumer protection law and the regulation of non-personal data, the authors argue framers of these laws should consider the potential for conflict and the impact of such conflict on incentives to innovate.

Mohsen, Mona Omar and Hassan A. Aziz, The Blue Button Project: Engaging Patients in Healthcare by a Click of a Button (2015)

  • This paper provides a literature review on the Blue Button initiative, an early data portability project which allows Americans to access, view or download their health records in a variety of formats.
  • Originally launched through the Department of Veterans Affairs in 2010, the Blue Button initiative had expanded to more than 500 organizations by 2014, when the Department of Health and Human Services launched the Blue Button Connector to facilitate both patient access and development of new tools.
  • The Blue Button has enabled the development of tools such as the Harvard-developed Growth-Tastic app, which allows parents to check their child’s growth by submitting their downloaded pediatric health data. Pharmacies across the US have also adopted the Blue Button to provide patients with access to their prescription history.

More Than Smart and Mission:data, Got Data? The Value of Energy Data to Customers (2016)

  • This report outlines the public value of the Green Button, a data protocol that provides customers with private and secure access to their energy use data collected by smart meters.
  • The authors outline how the use of the Green Button can help states meet their energy and climate goals by enabling them to structure renewables and other distributed energy resources (DER) such as energy efficiency, demand response, and solar photovoltaics. Access to granular, near real time data can encourage innovation among DER providers, facilitating the development of applications like “virtual energy audits” that identify efficiency opportunities, allowing customers to reduce costs through time-of-use pricing, and enabling the optimization of photovoltaic systems to meet peak demand.
  • Energy efficiency receives the greatest boost from initiatives like the Green Button, with studies showing energy savings of up to 18 percent when customers have access to their meter data. In addition to improving energy conservation, access to meter data could improve the efficiency of appliances by allowing devices to trigger sleep modes in response to data on usage or price. However, at the time of writing, problems with data portability and interoperability were preventing these benefits from being realized, at a cost of tens of millions of dollars.
  • The authors recommend that commissions require utilities to make usage data available to customers or authorized third parties in standardized formats as part of basic utility service, and tariff data to developers for use in smart appliances.

MyData, Understanding Data Operators (2020)

  • MyData is a global movement of data users, activists and developers with a common goal to empower individuals with their personal data to enable them and their communities to develop knowledge, make informed decisions and interact more consciously and efficiently.
  • This introductory paper presents the state of knowledge about data operators, trusted data intermediaries that provide infrastructure for human-centric personal data management and governance, including data sharing and transfer. The operator model allows data controllers to outsource issues of legal compliance with data portability requirements, while offering individual users a transparent and intuitive way to manage the data transfer process.
  • The paper examines use cases from 48 “proto-operators” from 15 countries who fulfil some of the functions of an operator, albeit at an early level of maturity. The paper finds that operators offer management of identity authentication, data transaction permissions, connections between services, value exchange, data model management, personal data transfer and storage, governance support, and logging and accountability. At the heart of these functions is the need for minimum standards of data interoperability.
  • The paper reviews governance frameworks from the general (legislative) to the specific (operators), and explores proto-operator business models. In keeping with an emerging field, business models are currently unclear and potentially unsustainable, and are one of a number of areas, including interoperability requirements and governance frameworks, that must still be developed.

National Science and Technology Council, Smart Disclosure and Consumer Decision Making: Report of the Task Force on Smart Disclosure (2013)

  • This report summarizes the work and findings of the 2011–2013 Task Force on Smart Disclosure: Information and Efficiency, an interagency body tasked with advancing smart disclosure, through which data is made more available and accessible to both consumers and innovators.
  • The Task Force recognized the capacity of smart disclosure to inform consumer choices, empower them through access to useful personal data, enable the creation of new tools, products and services, and promote efficiency and growth. It reviewed federal efforts to promote smart disclosure within sectors and in data types that crosscut sectors, such as location data, consumer feedback, enforcement and compliance data and unique identifiers. It also surveyed specific public-private partnerships on access to data, such as the Blue and Green Button and MyData initiatives in health, energy and education respectively.
  • The Task Force reviewed steps taken by the Federal Government to implement smart disclosure, including adoption of machine readable formats and standards for metadata, use of APIs, and making data available in an unstructured format rather than not releasing it at all. It also reviewed “choice engines” making use of the data to provide services to consumers across a range of sectors.
  • The Task Force recommended that smart disclosure should be a core component of efforts to institutionalize and operationalize open data practices, with agencies proactively identifying, tagging, and planning the release of candidate data. It also recommended that this be supported by a government-wide community of practice.

Nicholas, Gabriel, Taking It With You: Platform Barriers to Entry and the Limits of Data Portability (2020)

  • This paper considers whether, as is often claimed, data portability offers a genuine solution to the lack of competition within the tech sector.
  • It concludes that current regulatory approaches to data portability, which focus on reducing switching costs through technical solutions such as one-off exports and API interoperability, are not sufficient to generate increased competition. This is because they fail to address other barriers to entry, including network effects, unique data access, and economies of scale.
  • The author proposes an alternative approach, which he terms collective portability, which would allow groups of users to coordinate the transfer of their data to a new platform. This model raises questions about how such collectives would make decisions regarding portability, but would enable new entrants to successfully target specific user groups and scale rapidly without having to reach users one by one.

OECD, Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-use across Societies (2019)

  • This background paper to a 2017 expert workshop on risks and benefits of data reuse considers data portability as one strategy within a data openness continuum that also includes open data, market-based B2B contractual agreements, and restricted data-sharing agreements within research and data for social good applications.
  • It considers four rationales offered for data portability. These include empowering individuals towards the “informational self-determination” aspired to by GDPR; increasing competition within digital and other markets through reductions in information asymmetries between individuals and providers, switching costs, and barriers to market entry; and facilitating increased data flows.
  • The report highlights the need for both syntactic and semantic interoperability standards to ensure data can be reused across systems, both of which may be fostered by increased rights to data portability. Data intermediaries have an important role to play in the development of these standards, through initiatives like the Data Transfer Project, a collaboration which brought together Facebook, Google, Microsoft, and Twitter to create an open-source data portability platform.

Personal Data Protection Commission Singapore, Response to Feedback on the Public Consultation on Proposed Data Portability and Data Innovation Provisions (2020)

  • The report summarizes the findings of the 2019 PDPC public consultation on proposals to introduce provisions on data portability and data innovation in Singapore’s Personal Data Protection Act.
  • The proposed provision would oblige organizations to transmit an individual’s data to another organization in a commonly used machine-readable format, upon the individual’s request. The obligation does not extend to data intermediaries or organizations that do not have a presence in Singapore, although data holders may choose to honor those requests.
  • The obligation would apply to electronic data that is either provided by the individual or generated by the individual’s activities in using the organization’s service or product, but not derived data created by the processing of other data by the data holder. Respondents were concerned that including derived data could harm organizations’ competitiveness.
  • Respondents were concerned about how to honour data portability requests where the data of third parties was involved, as in the case of a joint account holder, for example. The PDPC opted for a “balanced, reasonable, and pragmatic approach,” allowing data involving third parties to be ported where it was under the requesting individual’s control, was to be used for domestic and personal purposes, and related only to the organization’s product or service.

Quinn, Paul, Is the GDPR and Its Right to Data Portability a Major Enabler of Citizen Science? (2018)

  • This article explores the potential of data portability to advance citizen science by enabling participants to port their personal data from one research project to another. Citizen science — the collection and contribution of large amounts of data by private individuals for scientific research — has grown rapidly in response to the development of new digital means to capture, store, organize, analyze and share data.
  • The GDPR right to data portability aids citizen science by requiring transfer of data in machine-readable format and allowing data subjects to request its transfer to another data controller. This requirement of interoperability does not amount to compatibility, however, and data thus transferred would probably still require cleaning to be usable, acting as a disincentive to reuse.
  • The GDPR’s limitation of transferability to personal data provided by the data subject excludes some forms of data that might possess significant scientific potential, such as secondary personal data derived from further processing or analysis.
  • The GDPR right to data portability also potentially limits citizen science by restricting the grounds for processing data to which the right applies to data obtained through a subject’s express consent or through the performance of a contract. This limitation excludes other forms of data processing described in the GDPR, such as data processing for preventive or occupational medicine, scientific research, or archiving for reasons of public or scientific interest. It is also not clear whether the GDPR compels data controllers to transfer data outside the European Union.

Wong, Janis and Tristan Henderson, How Portable is Portable? Exercising the GDPR’s Right to Data Portability (2018)

  • This paper presents the results of 230 real-world requests for data portability in order to assess how — and how well — the GDPR right to data portability is being implemented. The authors were interested in establishing the kinds of file formats that were returned in response to requests, and in identifying practical difficulties encountered in making and interpreting requests, over a three-month period beginning on the day the GDPR came into effect.
  • The findings revealed continuing problems around ensuring portability for both data controllers and data subjects. Of the 230 requests, only 163 were successfully completed.
  • Data controllers frequently had difficulty understanding the requirements of GDPR, providing data in incomplete or inappropriate formats: only 40 percent of the files supplied were in a fully compliant format. Additionally, some data controllers were confused between the right to data portability and other rights conferred by the GDPR, such as the right to access or erasure.

The GovLab Selected Readings on Open Data for Developing Economies


By Andrew Young, Stefaan Verhulst, and Juliet McMurren

This edition of the GovLab Selected Readings was developed as part of the Open Data for Developing Economies research project (in collaboration with WebFoundation, USAID and fhi360). Special thanks to Maurice McNaughton, Francois van Schalkwyk, Fernando Perini, Michael Canares and David Opoku for their input on an early draft. Please contact Stefaan Verhulst (stefaan@thegovlab.org) for any additional input or suggestions.

Open data is increasingly seen as a tool for economic and social development. Across sectors and regions, policymakers, NGOs, researchers and practitioners are exploring the potential of open data to improve government effectiveness, create new economic opportunity, empower citizens and solve public problems in developing economies. Open data for development does not exist in a vacuum – rather it is a phenomenon that is relevant to and studied from different vantage points including Data4Development (D4D), Open Government, the United Nations’ Sustainable Development Goals (SDGs), and Open Development. The below selected readings provide a view of the current research and practice on the use of open data for development and its relationship to related interventions.

Annotated Selected Readings List (in alphabetical order)

Open Data and Open Government for Development

Benjamin, Solomon, R. Bhuvaneswari, P. Rajan, Manjunatha, “Bhoomi: ‘E-Governance’, or, An Anti-Politics Machine Necessary to Globalize Bangalore?” CASUM-m Working Paper, January 2007, http://bit.ly/2aD3vZe

  • This paper explores the digitization of land titles and their effect on governance in Bangalore. The paper takes a critical view of digitization and transparency efforts, particularly as best practices that should be replicated in many contexts.
  • The authors point to the potential of centralized open data and land records databases as a means for further entrenching existing power structures. They found that the digitization of land records in Bangalore “led to increased corruption, much more bribes and substantially increased time taken for land transactions,” as well as allowing “very large players in the land markets to capture vast quantities of land when Bangalore experiences a boom in the land market.”
  • They argue for the need “to replace politically neutered concepts like ‘transparency’, ‘efficiency’, ‘governance’, and ‘best practice’ [with] conceptually more rigorous terms that reflect the uneven terrain of power and control that governance embodies.”

McGee, Rosie and Duncan Edwards, “Introduction: Opening Governance – Change, Continuity and Conceptual Ambiguity,” IDS Bulletin, January 24, 2016. http://bit.ly/2aJn1pq.  

  • This introduction to a special issue of the IDS Bulletin frames the research and practice of leveraging opening governance as part of a development agenda.
  • The piece primarily focuses on a number of “critical debates” that “have begun to lay bare how imprecise and overblown the expectations are in the transparency, accountability and openness ‘buzzfield’, and the problems this poses.”
  • A key finding on opening governance’s uptake and impact in the development space relates to political buy-in:
    • “Political will is generally a necessary but insufficient condition for governance processes and relationships to become more open, and is certainly a necessary but insufficient condition for tech-based approaches to open them up. In short, where there is a will, tech-for-T&A may be able to provide a way; where there isn’t a will, it won’t.”

Open Data and Data 4 Development

3rd International Open Data Conference (IODC), “Enabling the Data Revolution: An International Open Data Roadmap,” Conference Report, 2015, http://bit.ly/2asb2ei

  • This report, prepared by Open Data for Development, summarizes the proceedings of the third IODC in Ottawa, ON. It sets out an action plan for “harnessing open data for sustainable development”, with the following five priorities:
    1. Deliver shared principles for open data
    2. Develop and adopt good practices and open standards for data publication
    3. Build capacity to produce and use open data effectively
    4. Strengthen open data innovation networks
    5. Adopt common measurement and evaluation tools
  • The report draws on 70 impact accounts to present cross-sector evidence of “the promise and reality of open data,” and emphasizes the utility of open data in monitoring development goals, and the importance of “joined-up open data infrastructures,” ensuring wide accessibility, and grounding measurement in a clear understanding of citizen need, in order to realize the greatest benefits from open data.
  • Finally, the report sets out a draft International Open Data Charter and Action Plan for International Collaboration.

Hilbert, Martin, “Big Data for Development: A Review of Promises and Challenges,” Development Policy Review, December 13, 2015, http://bit.ly/2aoPtxL.

  • This article presents a conceptual framework based on the analysis of 180 articles on the opportunities and threats of big data for international development.
  • Open data, Hilbert argues, can be an incentive for those outside of government to leverage big data analytics: “If data from the public sector were to be openly available, around a quarter of existing data resources could be liberated for Big Data Analytics.”
  • Hilbert explores the misalignment between “the level of economic well-being and perceived transparency of a country” and the existence of an overarching open data policy. He points to low-income countries that are active in the open data effort, like Kenya, Russia and Brazil, in comparison to “other countries with traditionally high perceived transparency,” which are less active in releasing data, like Chile, Belgium and Sweden.

International Development Research Centre, World Wide Web Foundation, and Berkman Center at Harvard University, “Fostering a Critical Development Perspective on Open Government Data,” Workshop Report, 2012, http://bit.ly/2aJpyQq

  • This paper considers the need for a critical perspective on whether the expectations raised by open data programmes worldwide — as “a suitable remedy for challenges of good governance, economic growth, social inclusion, innovation, and participation” — have been met, and if so, under what circumstances.
  • Given the lack of empirical evidence underlying the implementation of open data initiatives to guide practice and policy formulation, particularly in developing countries, the paper discusses the implementation of a policy-oriented research agenda to ensure open data initiatives in the Global South “challenge democratic deficits, create economic value and foster inclusion.”
  • The report considers theories of the relationship between open data and impact, and the mediating factors affecting whether that impact is achieved. It takes a broad view of impact, including both demand- and supply-side economic impacts, social and environmental impact, and political impact.

Open Data for Development, “Open Data for Development: Building an Inclusive Data Revolution,” Annual Report, 2015, http://bit.ly/2aGbkz5

  • This report — the inaugural annual report for the Open Data for Development program — gives an overview of outcomes from the program for each of OD4D’s five program objectives:
    1. Setting a global open data for sustainable development agenda;
    2. Supporting governments in their open data initiatives;
    3. Scaling data solutions for sustainable development;
    4. Monitoring the availability, use and impact of open data around the world; and
    5. Building the institutional capacity and long-term sustainability of the Open Data for Development network.
  • The report identifies four barriers to impact in developing countries: the lack of capacity and leadership; the lack of evidence of what works; the lack of coordination between actors; and the lack of quality data.

Stuart, Elizabeth, Emma Samman, William Avis, Tom Berliner, “The Data Revolution: Finding the Missing Millions,” Open Data Institute Research Report, April 2015, http://bit.ly/2acnZtE.

  • This report examines the challenge of implementing successful development initiatives when many citizens are not known to their governments as they do not exist in official databases.
  • The authors argue that “good quality, relevant, accessible and timely data will allow willing governments to extend services into communities which until now have been blank spaces in planning processes, and to implement policies more efficiently.”
  • In addition to improvements to national statistical offices, the authors argue that “making better use of the data we already have” by increasing openness to certain datasets held by governments and international organizations could help to improve the situation.
  • They examine a number of open data efforts in developing countries, including Kenya and Mexico.
  • Finally, they argue that “the data revolution could play a role in changing the power dynamic between citizens, governments and the private sector, building on open data and freedom of information movements around the world. It has the potential to enable people to produce, access and understand information about their lives and to use this information to make changes.”

United Nations Independent Expert Advisory Group on a Data Revolution for Sustainable Development. “A World That Counts, Mobilizing the Data Revolution,” 2014, http://bit.ly/2am5K28.

  • This report focuses on the potential benefits and risks data holds for sustainable development. Included in this is a strategic framework for using and managing data for humanitarian purposes. It describes a need for a multinational consensus to be developed to ensure data is shared effectively and efficiently.
  • It suggests that “people who are counted”—i.e., those who are included in data collection processes—have better development outcomes and a better chance for humanitarian response in emergency or conflict situations.
  • In particular, “better and more open data” is described as having the potential to “save money and create economic, social and environmental value” toward sustainable development ends.

The World Bank, “Digital Dividends: World Development Report 2016.” http://bit.ly/2aG9Kx5

  • This report examines “digital dividends,” or the development benefits of using digital technologies.
  • The authors argue that: “To get the most out of the digital revolution, countries also need to work on the “analog complements”—by strengthening regulations that ensure competition among businesses, by adapting workers’ skills to the demands of the new economy, and by ensuring that institutions are accountable.”
  • The “data revolution,” which includes both big data and open data, is listed as one of four “digital enablers.”
  • Open data’s impacts are explored across a number of cases and developing countries and regions, including: Nepal, Mexico, Southern Africa, Kenya, Moldova and the Philippines.
  • Despite a number of success stories, the authors argue that “sustained, impactful, scaled-up examples of big and open data in the developing world are still relatively rare,” and, in particular, “Open data has far to go.” They point to the high correlation between GDP per capita and the readiness, implementation and impact of open data as evidence of the room for improvement.

Open Data and Open Development

Reilly, Katherine and Juan P. Alperin, “Intermediation in Open Development: A Knowledge Stewardship Approach,” Global Media Journal (Canadian Edition), 2016, http://bit.ly/2atWyI8

  • This paper examines the intermediaries that “have emerged to facilitate open data and related knowledge production activities in development processes.”
  • In particular, they study the concept of “knowledge stewardship,” which “demands careful consideration of how—through what arrangements—open resources can best be provided, and how best to maximize the quality, sustainability, buy-in, and uptake of those resources.”
  • The authors describe five models of open data intermediation:
    • Decentralized
    • Arterial
    • Ecosystem
    • Bridging
    • Communities of practice

Reilly, Katherine and Rob McMahon, “Quality of openness: Evaluating the contributions of IDRC’s Information and Networks Program to open development.” International Development Research Centre, January 2015, http://bit.ly/2aD6h0U

  • This report describes the outcomes of IDRC’s Information and Networks (I&N) programme, focusing in particular on those related to the “quality of openness” of initiatives as well as their outcomes.
  • The research program explores “mechanisms that link open initiatives to human activities in ways that generate social innovations of significance to development. These include push factors such as data holders’ understanding of data usage, the preparedness or acceptance of user communities, institutional policies, and wider policies and regulations; as well as pull factors including the awareness, capacity and attitude of users. In other words, openly networked social processes rely on not just quality openness, but also on supportive environments that link open resources and the people who might leverage them to create improvements, whether in governance, education or knowledge production.”

Smith, M. and L. Elder, “Open ICT Ecosystems Transforming the Developing World,” Information Technologies and International Development, 2010, http://bit.ly/2au0qsW.

  • The paper seeks to examine the hypothesis that “open social arrangements, enabled by ICTs, can help to catalyze the development impacts of ICTs. In other words, open ICT ecosystems provide the space for the amplification and transformation of social activities that can be powerful drivers of development.”
  • While the focus is placed on a number of ICT interventions – with open data only directly referenced as it relates to the science community – the lessons learned and overarching framework are applicable to the open data for development space.
  • The authors argue for a new research focus on “the new social activities enabled by different configurations of ICT ecosystems and their connections with particular social outcomes.” They point in particular to “modules of social practices that can be applied to solve similar problems across different development domains,” including “massive participation, collaborative production of content, collaborative innovation, collective information validation, new ‘open’ organizational models, and standards and knowledge transfer.”

Smith, Matthew and Katherine M. A. Reilly (eds), “Open Development: Networked Innovations in International Development,” MIT Press, 2013, http://bit.ly/2atX2hu.

  • This edited volume considers the implications of the emergence of open networked models predicated on digital network technologies for development. In their introduction, the editors emphasize that openness is a means to support development, not an end, which is layered upon existing technological and social structures. While openness is often disruptive, it depends upon some measure of closedness and structure in order to function effectively.
  • Subsequent, separately authored chapters provide case studies of open development drawn from health, biotechnology, and education, and explore some of the political and structural barriers faced by open models.  

van den Broek, Tijs, Marijn Rijken, Sander van Oort, “Towards Open Development Data: A review of open development data from a NGO perspective,” 2012, http://bit.ly/2ap5E8a

  • In this paper, the authors seek to answer the question: “What is the status, potential and required next steps of open development data from the perspective of the NGOs?”
  • They argue that “the take-up of open development data by NGOs has shown limited progress in the last few years,” and offer “several steps to be taken before implementation” to increase the effectiveness of open data’s use by NGOs to improve development efforts:
    • Develop a vision on open development and open data
    • Develop a clear business case
    • Research the benefits and risks of open development data and raise organizational and political awareness and support
    • Develop an appealing business model for data intermediaries and end-users
    • Balance data quality and timeliness
    • Dealing with the data obesity
    • Enrich quantitative data to overcome a quantitative bias
    • Monitor implementation and share best practices.

Open Data and Development Goals

Berdou, Evangelia, “Mediating Voices and Communicating Realities: Using Information Crowdsourcing Tools, Open Data Initiatives and Digital Media to Support and Protect the Vulnerable and Marginalised,” Institute of Development Studies, 2011, http://bit.ly/2aqbycg.

  • This report examines how “open source information crowdsourcing platforms like Ushahidi, and open mapping and data initiatives like OpenStreetMap, are enabling citizens in developing countries to generate and disseminate information critical for their lives and livelihoods.”
  • The author focuses in particular on:
    • “the role of the open source social entrepreneur as a new development actor
    • the complexity of the architectures of participation supported by these platforms and the need to consider them in relation to the decision-making processes that they aim to support and the roles in which they cast citizens
    • the possibilities for cross-fertilisation of ideas and the development of new practices between development practitioners and technology actors committed to working with communities to improve lives and livelihoods.”
  • While the use of ICTs and open data offers numerous potential benefits for supporting and protecting the vulnerable and marginalised, the author calls for greater attention to:
    • challenges emerging from efforts to sustain participation and govern the new information commons in under-resourced and politically contested spaces
    • complications and risks emerging from the desire to share information freely in such contexts
    • gaps between information provision, transparency and accountability, and the slow materialisation of projects’ wider social benefits

Canares, Michael, Satyarupa Shekhar, “Open Data and Sub-national Governments: Lessons from Developing Countries,” 2015, http://bit.ly/2au2gu2

  • This synthesis paper seeks to gain a greater understanding of open data’s effects on local contexts – “where data is collected and stored, where there is strong feasibility that data will be published, and where data can generate the most use and impact” – through the examination of nine papers developed as part of the Open Data in Developing Countries research project.
  • The authors point to three central findings:
    • “There is substantial effort on the part of sub-national governments to proactively disclose data, however, the design delimits citizen participation, and eventually, use.”
    • “Context demands different roles for intermediaries and different types of initiatives to create an enabling environment for open data.”
    • “Data quality will remain a critical challenge for sub-national governments in developing countries and it will temper potential impact that open data will be able to generate.”

Davies, Tim, “Open Data in Developing Countries – Emerging Insights from Phase I,” ODDC, 2014, http://bit.ly/2aX55UW

  • This report synthesizes findings from the Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) research network and its study of open data initiatives in 13 countries.
  • Davies provides 15 initial insights across the supply, mediation, and use of open data, including:
    • Open data initiatives can create new spaces for civil society to pursue government accountability and effectiveness;
    • Intermediaries are vital to both the supply and the use of open data; and
    • Digital divides create data divides in both the supply and use of data.

Davies, Tim, Duncan Edwards, “Emerging Implications of Open and Linked Data for Knowledge Sharing in Development,” IDS Bulletin, 2012, http://bit.ly/2aLKFyI

  • This article explores “issues that development sector knowledge intermediaries may need to engage with to ensure the socio-technical innovations of open and linked data work in the interests of greater diversity and better development practice.”
  • The authors explore a number of case studies where open and linked data was used in a development context, including:
    • Open research: IDS and R4D meta-data
    • Open aid: International Aid Transparency Initiative
    • Open linked statistics: Young Lives
  • Based on lessons learned from these cases, the authors argue that “openness must serve the interests of marginalised and poor people. This is pertinent at three levels:
    • practices in the publication and communication of data
    • capacities for, and approaches to, the use of data
    • development and emergent structuring of open data ecosystems.”

Davies, Tim, Fernando Perini, and Jose Alonso, “Researching the Emerging Impacts of Open Data,” ODDC, 2013, http://bit.ly/2aqb6uP

  • This research report offers a conceptual framework for open data, with a particular focus on open data in developing countries.
  • The conceptual framework comprises three central elements:
    • Open Data
      • About government
      • About companies & markets
      • About citizens
    • Domains of governance
      • Political domains
      • Economic domains
      • Social domains
    • Emerging Outcomes
      • Transparency & accountability
      • Innovation & economic growth
      • Inclusion & empowerment
  • The authors describe three central theories of change related to open data’s impacts:
    • Open data will bring about greater transparency in government, which in turn brings about greater accountability of key actors to make decisions and apply rules in the public interest;
    • Open data will enable non-state innovators to improve public services or build innovative products and services with social and economic value; open data will shift certain decision making from the state into the market, making it more efficient;
    • Open data will remove power imbalances that resulted from asymmetric information, and will bring new stakeholders into policy debates, giving marginalised groups a greater say in the creation and application of rules and policy.

Montano, Elise and Diogo Silva, “Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC): ODDC1 Follow-up Outcome Evaluation Report,” ODDC, 2016, http://bit.ly/2au65z7.

  • This report summarizes the findings of a two-and-a-half-year research-driven project sponsored by the World Wide Web Foundation to explore how open data improves governance in developing countries, and to build capacity in these countries to engage with open data. The research was conducted through 17 subgrants to partners from 12 countries.
  • Upon evaluation in 2014, partners reported increased capacity and expertise in dealing with open data; empowerment in influencing local and regional open data trends, particularly among CSOs; and increased understanding of open data among policy makers with whom the partners were in contact.

Smith, Fiona, William Gerry, Emma Truswell, “Supporting Sustainable Development with Open Data,” Open Data Institute, 2015, http://bit.ly/2aJwxsF

  • This report describes the potential benefits, challenges and next steps for leveraging open data to advance the Sustainable Development Goals.
  • The authors argue that the greatest potential impacts of open data on development are:
    • More effectively target aid money and improve development programmes
    • Track development progress and prevent corruption
    • Contribute to innovation, job creation and economic growth.
  • They note, however, that many challenges to such impact exist, including:
    • A weak enabling environment for open data publishing
    • Poor data quality
    • A mismatch between the demand for open data and the supply of appropriate datasets
    • A ‘digital divide’ between rich and poor, affecting both the supply and use of data
    • A general lack of quantifiable data and metrics.
  • The report articulates a number of ways that “governments, donors and (international) NGOs – with the support of researchers, civil society and industry – can apply open data to help make the SDGs a reality:
    • Reach global consensus around principles and standards, namely being ‘open by default’, using the Open Government Partnership’s Open Data Working Group as a global forum for discussion.
    • Embed open data into funding agreements, ensuring that relevant, high-quality data is collected to report against the SDGs. Funders should mandate that data relating to performance of services, and data produced as a result of funded activity, be released as open data.
    • Build a global partnership for sustainable open data, so that groups across the public and private sectors can work together to build sustainable supply and demand for data in the developing world.”

The World Bank, “Open Data for Sustainable Development,” Policy Note, August 2015, http://bit.ly/2aGjaJ4

  • This report from the World Bank seeks to describe open data’s potential for achieving the Sustainable Development Goals, and makes a number of recommendations toward that end.
  • The authors describe four key benefits of open data use for developing countries:
    • Foster economic growth and job creation
    • Improve efficiency, effectiveness and coverage of public services
    • Increase transparency, accountability, and citizen participation
    • Facilitate better information sharing within government
  • The paper concludes with a number of recommendations for improving open data programs, including:
    • Support Open Data use through legal and licensing frameworks.
    • Make data available for free online.
    • Publish data inventories for the government’s data resources.
    • Create feedback channels to government from current and potential data users.
    • Prioritize the datasets that users want.

Open Data and Developing Countries (National Case Studies)

Beghin, Nathalie and Carmela Zigoni, “Measuring Open Data’s Impact on Brazilian National and Sub-National Budget Transparency Websites and Its Impacts on People’s Rights,” 2014, http://bit.ly/2au3LaQ.

  • This report examines the impact of a Brazilian law requiring government entities to “provide real-time information on their budgets and spending through electronic means.” The authors explore “whether the national and state capitals are in fact using principles and practices of open data in their disclosures, and has evaluated the emerging impacts of open budget data disclosed through the national transparency portal.”
  • The report leveraged a “quantitative survey of budget and financial disclosures, and qualitative research with key stakeholders” to explore the “role of technical platforms and intermediaries in supporting the use of budget data by groups working in pursuit of social change and human rights.”
  • The survey found that:
    • The information provided is complete
    • In general, the data are not primary
    • Most governments do not provide timely information
    • Access to information is not ensured to all individuals
    • Advances were observed in terms of the availability of machine-processable data
    • Access is free, without discriminating users
    • The minority presents data in non-proprietary format
    • It is not known whether the data are under license

Boyera, S., C. Iglesias, “Open Data in Developing Countries: State of the Art,” Partnership for Open Data, 2014, http://bit.ly/2acBMR7

  • This report provides a summary of the State-of-the-Art study developed by SBC4D for the Partnership for Open Data (POD).
  • A series of interviews and responses to an online questionnaire yielded a number of findings, including:
    • “The number of actors interested in Open Data in Developing Countries is growing quickly. The study has identified 160+ organizations. It is important to note that a majority of them are just engaging in the domain and have little past experience. Most of these actors are focused on OD as an objective not a tool or means to increase impact or outcome.
    • Local actors are strong advocates of public data release. Lots of them are also promoting the re-use of existing data (through e.g. the organization of training, hackathons and alike). However, the study has not identified many actors practically using OD in their work or engaged in releasing their own data.
    • Traditional development sectors (health, education, agriculture, energy, transport) are not yet the target of many initiatives, and are clearly underdeveloped in terms of use-cases.
    • There is very little connection between horizontal (e.g. national OD initiatives) and vertical (sector-specific initiatives on e.g. extractive industry, or disaster management) activities”

Canares, M.P., J. de Guia, M. Narca, J. Arawiran, “Opening the Gates: Will Open Data Initiatives Make Local Governments in the Philippines More Transparent?” Open LGU Research Project, 2014, http://bit.ly/2au3Ond

  • This paper seeks to determine the impacts of the Department of Interior and Local Government of the Philippines’ Full Disclosure Policy, affecting financial and procurement data, on both data providers and data users.
  • The paper uncovered two key findings:
    • “On the supply side, incentivising openness is a critical aspect in ensuring that local governments have the interest to disclose financial data. While at this stage, local governments are still on compliance behaviour, it encourages the once reluctant LGUs to disclose financial information in the use of public funds, especially when technology and institutional arrangements are in place. However, LGUs do not make an effort to inform the public that information is available online and has not made data accessible in such a way that it can allow the public to perform computations and analysis. Currently, no data standards have been made yet by the Philippine national government in terms of format and level of detail.”
    • “On the demand side, there is limited awareness on the part of the public, and more particularly the intermediaries (e.g. business groups, civil society organizations, research institutions), on the availability of data, and thus, its limited use. As most of these data are financial in nature, it requires a certain degree of competence and expertise so that they will be able to make use of the data in demanding from government better services and accountability.”
  • The authors argue that “openness is not just about governments putting meaningful government data out into the public domain, but also about making the public meaningfully engage with governments through the use of open government data.” In order to do that, policies should “require observance of open government data standards and a capacity building process of ensuring that the public, to whom the data is intended, are aware and able to use the data in ensuring more transparent and accountable governance.”

Canares, M., M. Narca, and D. Marcial, “Enhancing Citizen Engagement Through Open Government Data,” ODDC, 2015, http://bit.ly/2aJMhfS

  • This research paper seeks to gain a greater understanding of how civil society organizations can increase or initiate their use of open data. The study is based on research conducted in “two provinces in the Philippines where civil society organizations in Negros Oriental province were trained, and in the Bohol province were mentored on accessing and using open data.”
  • The authors seek to answer three central research questions:
    • What do CSOs know about open government data? What do they know about government data that their local governments are publishing in the web?
    • What do CSOs have in terms of skills that would enable them to engage meaningfully with open government data?
    • How best can capacity building be delivered to civil society organizations to ensure that they learn to access and use open government data to improve governance?
  • They provide a number of key lessons, including:
    • Baseline condition should inform capacity building approach
    • Data use is dependent on data supply
    • Open data requires accessible and stable internet connection
    • Open data skills are important but insufficient
    • Outcomes, and not just outputs, prove capacity improvements

Chattapadhyay, Sumandro, “Opening Government Data through Mediation: Exploring the Roles, Practices and Strategies of Data Intermediary Organisations in India,” ODDC, 2014, http://bit.ly/2au3F37

  • This report seeks to gain a greater understanding of the current practice following the Government of India’s 2012 National Data Sharing and Accessibility Policy.
  • Chattapadhyay examines the open government data practices of “various (non-governmental) ‘data intermediary organisations’ on the one hand, and implementation challenges faced by managers of the Open Government Data Platform of India on the other.”
  • The report’s objectives are:
    • To undertake a provisional mapping of government data related activities across different sectors to understand the nature of the “open data community” in India,
    • To enrich government data/information policy discussion in India by gathering evidence and experience of (non-governmental) data intermediaries regarding their actual practices of accessing and sharing government data, and their utilisation of the provisions of NDSAP and RTI Act, and
    • To critically reflect on the nature of open data practices in India.

Chiliswa, Zacharia, “Open Government Data for Effective Public Participation: Findings of a Case Study Research Investigating The Kenya’s Open Data Initiative in Urban Slums and Rural Settlements,” ODDC, April 2014, http://bit.ly/2au8E4s

  • This research report is the product of a study of two urban slums and a rural settlement in Nairobi, Mombasa and Isiolo County, respectively, aimed at gaining a better understanding of the awareness and use of Kenya’s open data.
  • The study had four organizing objectives:
    • “Investigate the impact of the Kenyan Government’s open data initiative and to see whether, and if so how, it is assisting marginalized communities and groups in accessing key social services and information such as health and education;
    • Understand the way people use the information provided by the Open Data Initiative;
    • Identify people’s trust in the information and how it can assist their day-to-day lives;
    • Examine ways in which the public wish for the open data initiative to improve, particularly in relation to governance and service delivery.”
  • The study uncovered four central findings about Kenya’s open data initiative:
    • “There is a mismatch between the data citizens want to have and the data the Kenya portal and other intermediaries have provided.
    • Most people go to local information intermediaries instead of going directly to the government data portals and that there are few connections between these intermediaries and the wider open data sources.
    • Currently the rural communities are much less likely to seek out government information.
    • The kinds of data needed to support service delivery in Kenya may be different from those needed in other places in the world.”

Lwanga-Ntale, Charles, Beatrice Mugambe, Bernard Sabiti, Peace Nganwa, “Understanding how open data could impact resource allocation for poverty eradication in Kenya and Uganda,” ODDC, 2014, http://bit.ly/2aHqYKi

  • This paper draws on case studies from Uganda and Kenya to examine an open data movement seeking to address “age-old” issues including “transparency, accountability, equity, and the relevance, effectiveness and efficiency of governance.”
  • The authors focus both on the role “emerging open data processes in the two countries may be playing in promoting citizen/public engagement and the allocation of resources,” and the “possible negative impacts that may emerge due to the ‘digital divide’ between those who have access to data (and technology) and those who do not.”
  • They offer a number of recommendations to the governments of Uganda and Kenya that could be more broadly applicable, including:
    • Promote sector and cross sector specific initiatives that enable collaboration and transparency through different e-transformation strategies across government sectors and agencies.
    • Develop and champion the capacity to drive transformation across government and to advance skills in its institutions and civil service.

Sapkota, Krishna, “Exploring the emerging impacts of open aid data and budget data in Nepal,” Freedom Forum, August 2014, http://bit.ly/2ap0z5G

  • This research report seeks to answer five key questions regarding the opening of aid and budget data in Nepal:
    • What is the context for open aid and budget data in Nepal?
    • What sorts of budget and aid information is being made available in Nepal?
    • What is the governance of open aid and budget data in Nepal?
    • How are relevant stakeholders making use of open aid and budget data in Nepal?
    • What are the emerging impacts of open aid and budget data in Nepal?
  • The study uncovered a number of findings, including:
    • “Information and data can play an important role in addressing key social issues, and that whilst some aid and budget data is increasingly available, including in open data formats, there is not yet a sustainable supply of open data direct from official sources that meet the needs of the different stakeholders we consulted.”
    • “Expectations amongst government, civil society, media and private sector actors that open data could be a useful resource in improving governance, and we found some evidence of media making use of data to drive stories more when they had the right skills, incentives and support.”
    • “The context of Nepal also highlights that a more critical perspective may be needed on the introduction of open data, understanding the specific opportunities and challenges for open data supply and use in a country that is currently undergoing a period of constitutional development, institution building and deepening democracy.”

Srivastava, Nidhi, Veena Agarwal, Anmol Soni, Souvik Bhattacharjya, Bibhu P. Nayak, Harsha Meenawat, Tarun Gopalakrishnan, “Open government data for regulation of energy resources in India,” ODDC, 2014, http://bit.ly/2au9oXf

  • This research paper examines “the availability, accessibility and use of open data in the extractive energy industries sector in India.”
  • The authors describe a number of challenges being faced by:
    • Data suppliers and intermediaries:
      • Lack of clarity on mandate
      • Agency specific issues
      • Resource challenges
      • Privacy issues of commercial data and contractual constraints
      • Formats for data collection
      • Challenges in providing timely data
      • Recovery of costs and pricing of data
    • Data users:
      • Data available but inaccessible
      • Data accessible but not usable
      • Timeliness of data
  • They make a number of recommendations for addressing these challenges focusing on:
    • Policy measures
    • Improving data quality
    • Improving effectiveness of data portal

van Schalkwyk, François, Michael Cañares, Sumandro Chattapadhyay and Alexander Andrason, “Open Data Intermediaries in Developing Countries,” ODDC, 2015, http://bit.ly/2aJztWi

  • This paper seeks to provide “a more socially nuanced approach to open data intermediaries,” moving beyond the traditional approach wherein data intermediaries are “presented as single and simple linkages between open data supply and use.”
  • The study’s analysis draws on cases from the Emerging Impacts of Open Data in Developing Countries (ODDC) project.
  • The authors provide a working definition of open data intermediaries: An open data intermediary is an agent:
    • positioned at some point in a data supply chain that incorporates an open dataset,
    • positioned between two agents in the supply chain, and
    • facilitates the use of open data that may otherwise not have been the case.
  • One of the study’s key findings is that, “Intermediation does not only consist of a single agent facilitating the flow of data in an open data supply chain; multiple intermediaries may operate in an open data supply chain, and the presence of multiple intermediaries may increase the probability of use (and impact) because no single intermediary is likely to possess all the types of capital required to unlock the full value of the transaction between the provider and the user in each of the fields in play.”

van Schalkwyk, François, Michelle Willmers and Tobias Schonwetter, “Embedding Open Data Practice,” ODDC, 2015, http://bit.ly/2aHt5xu

  • This research paper was developed as part of the ODDC Phase 2 project and seeks to address the “insufficient attention paid to the institutional dynamics within governments and how these may be impeding open data practice.”
  • The study focuses in particular on open data initiatives in South Africa and Kenya, leveraging a conceptual framework to allow for meaningful comparison between the two countries.
  • Focusing on South Africa and Kenya, as well as Africa as a whole, the authors seek to address four central research questions:
    • Is open data practice being embedded in African governments?
    • What are the possible indicators of open data practice being embedded?
    • What do the indicators reveal about resistance to or compliance with pressures to adopt open data practice?
    • What are different effects of multiple institutional domains that may be at play in government as an organisation?

van Schalkwyk, François, Michelle Willmers, and Laura Czerniewicz, “Case Study: Open Data in the Governance of South African Higher Education,” ODDC, 2014, http://bit.ly/2amgIFb

  • This research report uses the South African Centre for Higher Education Transformation (CHET) open data platform as a case study to examine “the supply of and demand for open data as well as the roles of intermediaries in the South African higher education governance ecosystem.”
  • The report’s findings include:
    • “There are concerns at both government and university levels about how data will be used and (mis)interpreted, and this may constrain future data supply. Education both at the level of supply (DHET) and at the level of use by the media in particular on how to improve the interpretability of data could go some way in countering current levels of mistrust. Similar initiatives may be necessary to address uneven levels of data use and trust apparent across university executives and councils.”
    • “Open data intermediaries increase the accessibility and utility of data. While there is a rich publicly-funded dataset on South African higher education, the data remains largely inaccessible and unusable to universities and researchers in higher education studies. Despite these constraints, the findings show that intermediaries in the ecosystem are playing a valuable role in making the data both available and useable.”
    • “Open data intermediaries provide both supply-side as well as demand-side value. CHET’s work on higher education performance indicators was intended not only to contribute to government’s steering mechanisms, but also to contribute to the governance capacity of South African universities. The findings support the use of CHET’s open data to build capacity within universities. Further research is required to confirm the use of CHET data in state-steering of the South African higher education system, although there is some evidence of CHET’s data being referenced in national policy documents.”

Verhulst, Stefaan and Andrew Young, “Open Data Impact: When Demand and Supply Meet,” The GovLab, 2016, http://bit.ly/1LHkQPO

Additional Resource:

World Bank Readiness Assessment Tool