Creating a Machine Learning Commons for Global Development


Blog by Hamed Alemohammad: “Advances in sensor technology, cloud computing, and machine learning (ML) continue to converge to accelerate innovation in the field of remote sensing. However, fundamental tools and technologies still need to be developed to drive further breakthroughs and to ensure that the Global Development Community (GDC) reaps the same benefits that the commercial marketplace is experiencing. This process requires us to take a collaborative approach.

Data collaborative innovation (that is, a group of actors from different data domains working together toward common goals) might hold the key to finding solutions for some of the global challenges that the world faces. That is why Radiant.Earth is investing in new technologies such as Cloud Optimized GeoTiffs, Spatial Temporal Asset Catalogues (STAC), and ML. Our approach to advance ML for global development begins with creating open libraries of labeled images and algorithms. This initiative and others require, and in fact will thrive as a result of, using a data collaborative approach.

“Data is only as valuable as the decisions it enables.”

This quote by Ion Stoica, professor of computer science at the University of California, Berkeley, may best describe the challenge facing those of us who work with geospatial information:

How can we extract greater insights and value from the unending tsunami of data that is before us, allowing for more informed and timely decision making?…(More)”.

UK can lead the way on ethical AI, says Lords Committee


Lords Select Committee: “The UK is in a strong position to be a world leader in the development of artificial intelligence (AI). This position, coupled with the wider adoption of AI, could deliver a major boost to the economy for years to come. The best way to do this is to put ethics at the centre of AI’s development and use, concludes a report by the House of Lords Select Committee on Artificial Intelligence, AI in the UK: ready, willing and able?, published today….

One of the recommendations of the report is for a cross-sector AI Code to be established, which can be adopted nationally, and internationally. The Committee’s suggested five principles for such a code are:

  1. Artificial intelligence should be developed for the common good and benefit of humanity.
  2. Artificial intelligence should operate on principles of intelligibility and fairness.
  3. Artificial intelligence should not be used to diminish the data rights or privacy of individuals, families or communities.
  4. All citizens should have the right to be educated to enable them to flourish mentally, emotionally and economically alongside artificial intelligence.
  5. The autonomous power to hurt, destroy or deceive human beings should never be vested in artificial intelligence.

Other conclusions from the report include:

  • Many jobs will be enhanced by AI, many will disappear and many new, as yet unknown jobs, will be created. Significant Government investment in skills and training will be necessary to mitigate the negative effects of AI. Retraining will become a lifelong necessity.
  • Individuals need to be able to have greater personal control over their data, and the way in which it is used. The ways in which data is gathered and accessed need to change, so that everyone can have fair and reasonable access to data, while citizens and consumers can protect their privacy and personal agency. This means using established concepts, such as open data, ethics advisory boards and data protection legislation, and developing new frameworks and mechanisms, such as data portability and data trusts.
  • The monopolisation of data by big technology companies must be avoided, and greater competition is required. The Government, with the Competition and Markets Authority, must review the use of data by large technology companies operating in the UK.
  • The prejudices of the past must not be unwittingly built into automated systems. The Government should incentivise the development of new approaches to the auditing of datasets used in AI, and also to encourage greater diversity in the training and recruitment of AI specialists.
  • Transparency in AI is needed. The industry, through the AI Council, should establish a voluntary mechanism to inform consumers when AI is being used to make significant or sensitive decisions.
  • At earlier stages of education, children need to be adequately prepared for working with, and using, AI. The ethical design and use of AI should become an integral part of the curriculum.
  • The Government should be bold and use targeted procurement to provide a boost to AI development and deployment. It could encourage the development of solutions to public policy challenges through speculative investment. There have been impressive advances in AI for healthcare, which the NHS should capitalise on.
  • It is not currently clear whether existing liability law will be sufficient when AI systems malfunction or cause harm to users, and clarity in this area is needed. The Committee recommend that the Law Commission investigate this issue.
  • The Government needs to draw up a national policy framework, in lockstep with the Industrial Strategy, to ensure the coordination and successful delivery of AI policy in the UK….(More)”.

The Potential and Practice of Data Collaboratives for Migration


Essay by Stefaan Verhulst and Andrew Young in the Stanford Social Innovation Review: “According to recent United Nations estimates, there are globally about 258 million international migrants, meaning people who live in a country other than the one in which they were born; this represents an increase of 49 percent since 2000. Of those, 26 million people have been forcibly displaced across borders, having migrated either as refugees or asylum seekers. An additional 40 million or so people are internally displaced due to conflict and violence, and millions more are displaced each year because of natural disasters. It is sobering, then, to consider that, according to many observers, global warming is likely to make the situation worse.

Migration flows of all kinds—for work, family reunification, or political or environmental reasons—create a range of both opportunities and challenges for nation states and international actors. But the issues associated with refugees and asylum seekers are particularly complex. Despite the high stakes and increased attention to the issue, our understanding of the full dimensions and root causes of refugee movements remains limited. Refugee flows arise in response to not only push factors like wars and economic insecurity, but also powerful pull factors in recipient countries, including economic opportunities, and perceived goods like greater tolerance and rule of law. In addition, more objectively measurable variables like border barriers, topography, and even the weather, play an important role in determining the number and pattern of refugee flows. These push and pull factors interact in complex and often unpredictable ways. Further complicating matters, some experts argue that push-pull research on migration is dogged by a number of conceptual and methodological limitations.

To mitigate negative impacts and anticipate opportunities arising from high levels of global migration, we need a better understanding of the various factors contributing to the international movement of people and how they work together.

Data—specifically, the widely dispersed data sets that exist across governments, the private sector, and civil society—can help alleviate today’s information shortcoming. Several recent initiatives show the potential of using data to address some of the underlying informational gaps. In particular, there is an important role for a new form of data-driven problem-solving and policymaking—what we call “data collaboratives.” Data collaboratives offer the potential for inter-sectoral collaboration, and for the merging and augmentation of otherwise siloed data sets. While public and private actors are increasingly experimenting with various types of data in a variety of sectors and geographies—including sharing disease data to accelerate disease treatments and leveraging private bus data to improve urban planning—we are only beginning to understand the potential of data collaboration in the context of migration and refugee issues….(More)”.

 


Selected Readings on Data Responsibility, Refugees and Migration


By Kezia Paladina, Alexandra Shaw, Michelle Winowatan, Stefaan Verhulst, and Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of Data Collaboration for Migration was originally published in 2018.

Special thanks to Paul Currion whose data responsibility literature review gave us a head start when developing the below. (Check out his article listed below on Refugee Identity.)

The collection below is also meant to complement our article in the Stanford Social Innovation Review on Data Collaboration for Migration where we emphasize the need for a Data Responsibility Framework moving forward.

From climate change to politics to finance, there is growing recognition that some of the most intractable problems of our era are information problems. In recent years, the ongoing refugee crisis has increased the call for new data-driven approaches to address the many challenges and opportunities arising from migration. While data – including data from the private sector – holds significant potential value for informing analysis and targeted international and humanitarian response to (forced) migration, decision-makers often lack an actionable understanding of if, when and how data could be collected, processed, stored, analyzed, used, and shared in a responsible manner.

Data responsibility – including the responsibility to protect data and shield its subjects from harms, and the responsibility to leverage and share data when it can provide public value – is an emerging field seeking to go beyond just privacy concerns. The forced migration arena has a number of particularly important issues impacting responsible data approaches, including the risks of leveraging data regarding individuals fleeing a hostile or repressive government.

In this edition of the GovLab’s Selected Readings series, we examine the emerging literature on the data responsibility approaches in the refugee and forced migration space – part of an ongoing series focused on Data Responsibility. The below reading list features annotated readings related to the Policy and Practice of data responsibility for refugees, and the specific responsibility challenges regarding Identity and Biometrics.

Data Responsibility and Refugees – Policy and Practice

International Organization for Migration (IOM) (2010) IOM Data Protection Manual. Geneva: IOM.

  • This IOM manual includes 13 data protection principles related to the following activities: lawful and fair collection, specified and legitimate purpose, data quality, consent, transfer to third parties, confidentiality, access and transparency, data security, retention of personal data, application of the principles, ownership of personal data, oversight, compliance and internal remedies (and exceptions).
  • For each principle, the IOM manual features targeted data protection guidelines, and templates and checklists are included to help foster practical application.

Norwegian Refugee Council (NRC) Internal Displacement Monitoring Centre / OCHA (eds.) (2008) Guidance on Profiling Internally Displaced Persons. Geneva: Inter-Agency Standing Committee.

  • This NRC document contains guidelines on gathering better data on Internally Displaced Persons (IDPs), based on country context.
  • An IDP profile is defined as the number of displaced persons, their location, the causes and patterns of displacement, and humanitarian needs, among others.
  • It further states that collecting IDP data is challenging and that the current condition of IDP data is hampering assistance programs.
  • Chapter I of the document explores the rationale for IDP profiling. Chapter II describes the who aspect of profiling: who IDPs are and common pitfalls in distinguishing them from other population groups. Chapter III describes the different methodologies that can be used in different contexts and suggests some of the advantages and disadvantages of each, what kind of information is needed, and when it is appropriate to profile.

United Nations High Commissioner for Refugees (UNHCR). Model agreement on the sharing of personal data with Governments in the context of hand-over of the refugee status determination process. Geneva: UNHCR.

  • This document from UNHCR provides a template of agreement guiding the sharing of data between a national government and UNHCR. The model agreement’s guidance is aimed at protecting the privacy and confidentiality of individual data while promoting improvements to service delivery for refugees.

United Nations High Commissioner for Refugees (UNHCR) (2015). Policy on the Protection of Personal Data of Persons of Concern to UNHCR. Geneva: UNHCR.

  • This policy outlines the rules and principles regarding the processing of personal data of persons engaged by UNHCR with the purpose of ensuring that the practice is consistent with UNGA’s regulation of computerized personal data files that was established to protect individuals’ data and privacy.
  • UNHCR requires its personnel to apply the following principles when processing personal data: (i) legitimate and fair processing; (ii) purpose specification; (iii) necessity and proportionality; (iv) accuracy; (v) respect for the rights of the data subject; (vi) confidentiality; (vii) security; (viii) accountability and supervision.

United Nations High Commissioner for Refugees (UNHCR) (2015) Privacy Impact Assessment of UNHCR Cash Based Interventions.

  • This impact assessment focuses on privacy issues related to financial assistance for refugees in the form of cash transfers. For international organizations like UNHCR to determine eligibility for cash assistance, data “aggregation, profiling, and social sorting techniques,” are often needed, leading to a need for a responsible data approach.
  • This Privacy Impact Assessment (PIA) aims to identify the privacy risks posed by the program and seeks to enhance safeguards that can mitigate those risks.
  • Key issues raised in the PIA involve the challenge of ensuring that individuals’ data will not be used for purposes other than those initially specified.

Data Responsibility in Identity and Biometrics

Bohlin, A. (2008) “Protection at the Cost of Privacy? A Study of the Biometric Registration of Refugees.” Lund: Faculty of Law of the University of Lund.

  • This 2008 study focuses on the systematic biometric registration of refugees conducted by UNHCR in refugee camps around the world, to understand whether enhancing the registration mechanism of refugees contributes to their protection and guarantee of human rights, or whether refugee registration exposes people to invasions of privacy.
  • Bohlin found that, at the time, UNHCR had failed to put proper safeguards in place for data dissemination, exposing refugees’ data to the risk of being misused. She goes on to suggest data protection regulations that could be put in place in order to protect refugees’ privacy.

Currion, Paul. (2018) “The Refugee Identity.” Medium.

  • Developed as part of a DFID-funded initiative, this essay considers Data Requirements for Service Delivery within Refugee Camps, with a particular focus on refugee identity.
  • Among other findings, Currion finds that since “the digitisation of aid has already begun…aid agencies must therefore pay more attention to the way in which identity systems affect the lives and livelihoods of the forcibly displaced, both positively and negatively.”
  • Currion argues that a Responsible Data approach, as opposed to a process defined by a Data Minimization principle, provides “useful guidelines,” but notes that data responsibility “still needs to be translated into organisational policy, then into institutional processes, and finally into operational practice.”

Farraj, A. (2010) “Refugees and the Biometric Future: The Impact of Biometrics on Refugees and Asylum Seekers.” Colum. Hum. Rts. L. Rev. 42 (2010): 891.

  • This article argues that biometrics help refugees and asylum seekers establish their identity, which is important for ensuring the protection of their rights and service delivery.
  • However, Farraj also describes several risks related to biometrics, such as misidentification and misuse of data, leading to a need for proper approaches for the collection, storage, and utilization of the biometric information by government, international organizations, or other parties.

GSMA (2017) Landscape Report: Mobile Money, Humanitarian Cash Transfers and Displaced Populations. London: GSMA.

  • This paper from GSMA seeks to evaluate how mobile technology can be helpful in refugee registration, cross-organizational data sharing, and service delivery processes.
  • One of its assessments is that the use of mobile money in a humanitarian context depends on the supporting regulatory environment that contributes to unlocking the true potential of mobile money. The examples include extension of SIM dormancy period to anticipate infrequent cash disbursements, ensuring that persons without identification are able to use the mobile money services, and so on.
  • Additionally, GSMA argues that mobile money will be most successful when there is an ecosystem to support other financial services such as remittances, airtime top-ups, savings, and bill payments. These services will be especially helpful in including displaced populations in development.

GSMA (2017) Refugees and Identity: Considerations for mobile-enabled registration and aid delivery. London: GSMA.

  • This paper emphasizes the importance of registration in the context of humanitarian emergency, because being registered and having a document that proves this registration is key in acquiring services and assistance.
  • Studying cases of Kenya and Iraq, the report concludes by providing three recommendations to improve mobile data collection and registration processes: 1) establish more flexible KYC for mobile money where refugees are not able to meet existing requirements; 2) encourage interoperability and data sharing to avoid fragmented and duplicative registration management; and 3) build partnership and collaboration among governments, humanitarian organizations, and multinational corporations.

Jacobsen, Katja Lindskov (2015) “Experimentation in Humanitarian Locations: UNHCR and Biometric Registration of Afghan Refugees.” Security Dialogue, Vol 46 No. 2: 144–164.

  • In this article, Jacobsen studies the biometric registration of Afghan refugees, and considers how “humanitarian refugee biometrics produces digital refugees at risk of exposure to new forms of intrusion and insecurity.”

Jacobsen, Katja Lindskov (2017) “On Humanitarian Refugee Biometrics and New Forms of Intervention.” Journal of Intervention and Statebuilding, 1–23.

  • This article traces the evolution of the use of biometrics at the Office of the United Nations High Commissioner for Refugees (UNHCR) – moving from a few early pilot projects (in the early-to-mid-2000s) to the emergence of a policy in which biometric registration is considered a ‘strategic decision’.

Manby, Bronwen (2016) “Identification in the Context of Forced Displacement.” Washington DC: World Bank Group. Accessed August 21, 2017.

  • In this paper, Manby describes the consequences of not having an identity in a situation of forced displacement. It prevents displaced populations from getting various services and creates a higher chance of exploitation. It also lowers the effectiveness of humanitarian actions, as lacking identity prevents humanitarian organizations from delivering their services to the displaced populations.
  • Lack of identity can be both the consequence and cause of forced displacement. People who have no identity can be considered illegal and risk being deported. At the same time, conflicts that lead to displacement can also result in loss of ID during travel.
  • The paper identifies different stakeholders and their interest in the case of identity and forced displacement, and finds that the biggest challenge for providing identity to refugees is the politics of identification and nationality.
  • Manby concludes that in order to address this challenge, there needs to be more effective coordination among governments, international organizations, and the private sector to come up with an alternative of providing identification and services to the displaced persons. She also argues that it is essential to ensure that national identification becomes a universal practice for states.

McClure, D. and Menchi, B. (2015). Challenges and the State of Play of Interoperability in Cash Transfer Programming. Geneva: UNHCR/World Vision International.

  • This report reviews the elements that contribute to the interoperability design for Cash Transfer Programming (CTP). The design framework offered here maps out these various features and also looks at the state of the problem and the state of play through a variety of use cases.
  • The study considers the current state of play and provides insights about the ways to address the multi-dimensionality of interoperability measures in increasingly complex ecosystems.     

NRC / International Human Rights Clinic (2016). Securing Status: Syrian refugees and the documentation of legal status, identity, and family relationships in Jordan.

  • This report examines Syrian refugees’ attempts to obtain identity cards and other forms of legally recognized documentation (mainly, Ministry of Interior Service Cards, or “new MoI cards”) in Jordan through the state’s Urban Verification Exercise (“UVE”). These MoI cards are significant because they allow Syrians to live outside of refugee camps and move freely about Jordan.
  • The text reviews the acquirement processes and the subsequent challenges and consequences that refugees face when unable to obtain documentation. Refugees can encounter issues ranging from lack of access to basic services to arrest, detention, forced relocation to camps and refoulement.  
  • Seventy-two Syrian refugee families in Jordan were interviewed in 2016 for this report and their experiences with obtaining MoI cards varied widely.

Office of Internal Oversight Services (2015). Audit of the operations in Jordan for the Office of the United Nations High Commissioner for Refugees. Report 2015/049. New York: UN.

  • This report documents the January 1, 2012 – March 31, 2014 audit of Jordanian operations, which is intended to ensure the effectiveness of the UNHCR Representation in the state.
  • The main goals of the Regional Response Plan for Syrian refugees included relieving the pressure on Jordanian services and resources while still maintaining protection for refugees.
  • The audit results concluded that the Representation was initially unsatisfactory, and the OIOS suggested several recommendations according to the two key controls which the Representation acknowledged. Those recommendations included:
    • Project management:
      • Providing training to staff involved in financial verification of partners supervise management
      • Revising standard operating procedure on cash based interventions
      • Establishing ways to ensure that appropriate criteria for payment of all types of costs to partners’ staff are included in partnership agreements
    • Regulatory framework:
      • Preparing annual need-based procurement plan and establishing adequate management oversight processes
      • Creating procedures for the assessment of renovation work in progress and issuing written change orders
      • Protecting data and ensuring timely consultation with the UNHCR Division of Financial and Administrative Management

UNHCR/WFP (2015). Joint Inspection of the Biometrics Identification System for Food Distribution in Kenya. Geneva: UNHCR/WFP.

  • This report outlines the partnership between the WFP and UNHCR in its effort to promote its biometric identification checking system to support food distribution in the Dadaab and Kakuma refugee camps in Kenya.
  • Both entities conducted a joint inspection mission in March 2015, and the system was considered an effective tool and a model for other country operations.
  • Still, 11 recommendations are proposed and responded to in this text to further improve the efficiency of the biometric system, including real-time evaluation of impact, need for automatic alerts, documentation of best practices, among others.

Making Better Use of Health Care Data


Benson S. Hsu, MD and Emily Griese in Harvard Business Review: “At Sanford Health, a $4.5 billion rural integrated health care system, we deliver care to over 2.5 million people in 300 communities across 250,000 square miles. In the process, we collect and store vast quantities of patient data – everything from admission, diagnostic, treatment and discharge data to online interactions between patients and providers, as well as data on providers themselves. All this data clearly represents a rich resource with the potential to improve care, but until recently was underutilized. The question was, how best to leverage it.

While we have a mature data infrastructure including a centralized data and analytics team, a standalone virtual data warehouse linking all data silos, and strict enterprise-wide data governance, we reasoned that the best way forward would be to collaborate with other institutions that had additional and complementary data capabilities and expertise.

We reached out to potential academic partners who were leading the way in data science, from university departments of math, science, and computer informatics to business and medical schools and invited them to collaborate with us on projects that could improve health care quality and lower costs. In exchange, Sanford created contracts that gave these partners access to data whose use had previously been constrained by concerns about data privacy and competitive-use agreements. With this access, academic partners are advancing their own research while providing real-world insights into care delivery.

The resulting Sanford Data Collaborative, now in its second year, has attracted regional and national partners and is already beginning to deliver data-driven innovations that are improving care delivery, patient engagement, and care access. Here we describe three that hold particular promise.

  • Developing Prescriptive Algorithms…
  • Augmenting Patient Engagement…
  • Improving Access to Care…(More)”.

Data Collaboratives can transform the way civil society organisations find solutions


Stefaan G. Verhulst at Disrupt & Innovate: “The need for innovation is clear: The twenty-first century is shaping up to be one of the most challenging in recent history. From climate change to income inequality to geopolitical upheaval and terrorism: the difficulties confronting International Civil Society Organisations (ICSOs) are unprecedented not only in their variety but also in their complexity. At the same time, today’s practices and tools used by ICSOs seem stale and outdated. Increasingly, it is clear, we need not only new solutions but new methods for arriving at solutions.

Data will likely become more central to meeting these challenges. We live in a quantified era. It is estimated that 90% of the world’s data was generated in just the last two years. We know that this data can help us understand the world in new ways and help us meet the challenges mentioned above. However, we need new data collaboration methods to help us extract the insights from that data.

UNTAPPED DATA POTENTIAL

For all of data’s potential to address public challenges, the truth remains that most data generated today is in fact collected by the private sector – including ICSOs who are often collecting a vast amount of data – such as, for instance, the International Committee of the Red Cross, which generates various (often sensitive) data related to humanitarian activities. This data, typically ensconced in tightly held databases toward maintaining competitive advantage or protecting from harmful intrusion, contains tremendous possible insights and avenues for innovation in how we solve public problems. But because of access restrictions and often limited data science capacity, its vast potential often goes untapped.

DATA COLLABORATIVES AS A SOLUTION

Data Collaboratives offer a way around this limitation. They represent an emerging public-private partnership model, in which participants from different areas — including the private sector, government, and civil society — come together to exchange data and pool analytical expertise.

While still an emerging practice, examples of such partnerships now exist around the world, across sectors and public policy domains. Importantly several ICSOs have started to collaborate with others around their own data and that of the private and public sector. For example:

  • Several civil society organisations, academics, and donor agencies are partnering in the Health Data Collaborative to improve the global data infrastructure necessary to make smarter global and local health decisions and to track progress against the Sustainable Development Goals (SDGs).
  • Additionally, the UN Office for the Coordination of Humanitarian Affairs (UNOCHA) built Humanitarian Data Exchange (HDX), a platform for sharing humanitarian data from and for ICSOs – including Caritas, InterAction and others – donor agencies, national and international bodies, and other humanitarian organisations.

These are a few examples of Data Collaboratives that ICSOs are participating in. Yet, the potential for collaboration goes beyond these examples. Likewise, so do the concerns regarding data protection and privacy….(More)”.

How the Data That Internet Companies Collect Can Be Used for the Public Good


Stefaan G. Verhulst and Andrew Young at Harvard Business Review: “…In particular, the vast streams of data generated through social media platforms, when analyzed responsibly, can offer insights into societal patterns and behaviors. These types of insights are hard to generate with existing social science methods. All this information poses its own problems, of complexity and noise, of risks to privacy and security, but it also represents tremendous potential for mobilizing new forms of intelligence.

In a recent report, we examine ways to harness this potential while limiting and addressing the challenges. Developed in collaboration with Facebook, the report seeks to understand how public and private organizations can join forces to use social media data, through data collaboratives, to mitigate and perhaps solve some of our most intractable policy dilemmas.

Data Collaboratives: Public-Private Partnerships for Our Data Age 

For all of data’s potential to address public challenges, most data generated today is collected by the private sector. Typically ensconced in corporate databases, and tightly held in order to maintain competitive advantage, this data contains tremendous possible insights and avenues for policy innovation. But because the analytical expertise brought to bear on it is narrow, and limited by private ownership and access restrictions, its vast potential often goes untapped.

Data collaboratives offer a way around this limitation. They represent an emerging public-private partnership model, in which participants from different areas, including the private sector, government, and civil society, can come together to exchange data and pool analytical expertise in order to create new public value. While still an emerging practice, examples of such partnerships now exist around the world, across sectors and public policy domains….

Professionalizing the Responsible Use of Private Data for Public Good

For all its promise, the practice of data collaboratives remains ad hoc and limited. In part, this is a result of the lack of a well-defined, professionalized concept of data stewardship within corporations. Today, each attempt to establish a cross-sector partnership built on the analysis of social media data requires significant and time-consuming efforts, and businesses rarely have personnel tasked with undertaking such efforts and making relevant decisions.

As a consequence, the process of establishing data collaboratives and leveraging privately held data for evidence-based policy making and service delivery is onerous, generally one-off, not informed by best practices or any shared knowledge base, and prone to dissolution when the champions involved move on to other functions.

By establishing data stewardship as a corporate function, recognized within corporations as a valued responsibility, and by creating the methods and tools needed for responsible data-sharing, the practice of data collaboratives can become regularized, predictable, and de-risked.

If early efforts toward this end — from initiatives such as Facebook’s Data for Good efforts in the social media space and MasterCard’s Data Philanthropy approach around finance data — are meaningfully scaled and expanded, data stewards across the private sector can act as change agents responsible for determining what data to share and when, how to protect data, and how to act on insights gathered from the data.

Still, many companies (and others) continue to balk at the prospect of sharing “their” data, which is an understandable response given the reflex to guard corporate interests. But our research has indicated that many benefits can accrue not only to data recipients but also to those who share it. Data collaboration is not a zero-sum game.

With support from the Hewlett Foundation, we are embarking on a two-year project toward professionalizing data stewardship (and the use of data collaboratives) and establishing well-defined data responsibility approaches. We invite others to join us in working to transform this practice into a widespread, impactful means of leveraging private-sector assets, including social media data, to create positive public-sector outcomes around the world….(More)”.

 

Selected Readings on Data, Gender, and Mobility


By Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data, gender, and mobility was originally published in 2017.

This edition of the Selected Readings was developed as part of an ongoing project at the GovLab, supported by Data2X, in collaboration with UNICEF, DigitalGlobe, IDS (UDD/Telefonica R&D), and the ISI Foundation, to establish a data collaborative to analyze unequal access to urban transportation for women and girls in Chile. We thank all our partners for their suggestions to the below curation – in particular Leo Ferres at IDS who got us started with this collection; Ciro Cattuto and Michele Tizzoni from the ISI Foundation; and Bapu Vaitla at Data2X for their pointers to the growing data and mobility literature.

Introduction

Daily mobility is key for gender equity. Access to transportation contributes to women’s agency and independence. The ability to move from place to place safely and efficiently can allow women to access education, work, and the public domain more generally. Yet, mobility is not just a means to access various opportunities. It is also a means to enter the public domain.

Women’s mobility is a multi-layered challenge
Women’s daily mobility, however, is often hampered by social, cultural, infrastructural, and technical barriers. Cultural bias, for instance, limits women’s mobility by confining them to areas close to home, owing to society’s double standard that expects women to be homemakers. From an infrastructural perspective, public transportation mostly accommodates home-to-work trips, when in reality women often make more complex trips with stops at, for example, the market, school, or healthcare provider – sometimes called “trip chaining.” From a safety perspective, women tend to avoid making trips in certain areas and/or at certain times, due to a constant risk of being sexually harassed in public places. Women are also pushed toward more expensive transportation – such as taking a cab instead of a bus or train – based on safety concerns.

The growing importance of (new sources of) data
Researchers are increasingly experimenting with ways to address these interdependent problems through the analysis of diverse datasets, often collected by private sector businesses and other non-governmental entities. Gender-disaggregated mobile phone records, geospatial data, satellite imagery, and social media data, to name a few, are providing evidence-based insight into gender and mobility concerns. Such data collaboratives – the exchange of data across sectors to create public value – can help governments, international organizations, and other public sector entities in the move toward more inclusive urban and transportation planning, and the promotion of gender equity.
The curated set of readings below focuses on the following areas:

  1. Insights on how data can inform gender empowerment initiatives,
  2. Emergent research into the capacity of new data sources – like call detail records (CDRs) and satellite imagery – to increase our understanding of human mobility patterns, and
  3. Publications exploring data-driven policy for gender equity in mobility.

Readings are listed in alphabetical order.

We selected the readings based upon their focus (gender and/or mobility related); scope and representativeness (going beyond one project or context); type of data used (such as CDRs and satellite imagery); and date of publication.

Annotated Reading List

Data and Gender

Blumenstock, Joshua, and Nathan Eagle. Mobile Divides: Gender, Socioeconomic Status, and Mobile Phone Use in Rwanda. ACM Press, 2010.

  • Using traditional survey and mobile phone operator data, this study analyzes gender and socioeconomic divides in mobile phone use in Rwanda, finding that mobile phone use is significantly more prevalent among men and the higher class.
  • The study also shows the differences in the way men and women use phones, for example: women are more likely to use a shared phone than men.
  • The authors frame their findings around gender and economic inequality in the country to the end of providing pointers for government action.

Bosco, Claudio, et al. Mapping Indicators of Female Welfare at High Spatial Resolution. WorldPop and Flowminder, 2015.

  • This report focuses on early adolescence in girls, which often comes with higher risk of violence, fewer economic opportunities, and restrictions on mobility. Significant data gaps and methodological and ethical issues surrounding data collection for girls also make it harder for policymakers to create evidence-based policy to address those issues.
  • The authors analyze geolocated household survey data, using statistical models and validation techniques, and create high-resolution maps of various sex-disaggregated indicators, such as nutrition level, access to contraception, and literacy, to better inform local policy making processes.
  • Further, it identifies the gender data gap and issues surrounding gender data collection, and provides arguments for why having comprehensive data can help create better policy and contribute to the achievement of the Sustainable Development Goals (SDGs).

Buvinic, Mayra, Rebecca Furst-Nichols, and Gayatri Koolwal. Mapping Gender Data Gaps. Data2X, 2014.

  • This study identifies gaps in gender data in developing countries on health, education, economic opportunities, political participation, and human security issues.
  • It recommends ways to close the gender data gap through censuses and micro-level surveys and through service and administrative records, and emphasizes how “big data” in particular can supply the missing data needed to measure progress in the well-being of women and girls. The authors argue that identifying these gaps is key to advancing gender equality and women’s empowerment, one of the SDGs.

Catalyzing Inclusive Financial System: Chile’s Commitment to Women’s Data. Data2X, 2014.

  • This article analyzes global and national data in the banking sector to fill the gap of sex-disaggregated data in Chile. The purpose of the study is to describe the difference in spending behavior and priorities between women and men, identify the challenges for women in accessing financial services, and create policies that promote women’s inclusion in Chile.

Ready to Measure: Twenty Indicators for Monitoring SDG Gender Targets. Open Data Watch and Data2X, 2016.

  • Using readily available data, this study identifies 20 SDG indicators related to gender issues that can serve as a baseline measurement for advancing gender equality, such as percentage of women aged 20-24 who were married or in a union before age 18 (child marriage), proportion of seats held by women in national parliament, and share of women among mobile telephone owners, among others.

Ready to Measure Phase II: Indicators Available to Monitor SDG Gender Targets. Open Data Watch and Data2X, 2017.

  • The Phase II paper is an extension of the Ready to Measure Phase I above. Where Phase I identifies the readily available data to measure women and girls’ well-being, Phase II provides information on how to access this data and summarizes insights from it.
  • Phase II elaborates the insights about data gathered from ready-to-measure indicators and finds that although underlying data to measure indicators of women and girls’ well-being is readily available in most cases, it is typically not sex-disaggregated.
  • Over one in five – 53 out of 232 – SDG indicators specifically refer to women and girls. However, further analysis from this study reveals that at least 34 more indicators should be disaggregated by sex. For instance, there should be 15 more sex-disaggregated indicators for SDG number 3: “Ensure healthy lives and promote well-being for all at all ages.”
  • The report recommends that national statistical agencies take the lead and exert additional effort to fill the current gender data gap for each of the SDGs, using tools such as statistical modeling.
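
The indicator counts quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and it assumes the 34 additional indicators are distinct from the 53 that already refer to women and girls.

```python
# Figures quoted in the annotation above.
TOTAL_SDG_INDICATORS = 232
GENDER_SPECIFIC = 53            # indicators that refer to women and girls
SHOULD_BE_DISAGGREGATED = 34    # additional indicators the study flags

# Share of indicators that are gender-specific today.
current_share = GENDER_SPECIFIC / TOTAL_SDG_INDICATORS

# Share if the flagged indicators were also disaggregated by sex
# (assumes no overlap between the two groups).
expanded_share = (GENDER_SPECIFIC + SHOULD_BE_DISAGGREGATED) / TOTAL_SDG_INDICATORS

print(f"Current share:  {current_share:.1%}")
print(f"Expanded share: {expanded_share:.1%}")
```

Under these assumptions the current share is a little under 23 percent, consistent with "over one in five", and the expanded share would be 37.5 percent.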

Reed, Philip J., Muhammad Raza Khan, and Joshua Blumenstock. Observing gender dynamics and disparities with mobile phone metadata. International Conference on Information and Communication Technologies and Development (ICTD), 2016.

  • The study analyzes mobile phone logs of millions of Pakistani residents to explore whether mobile phone usage behavior differs between men and women and to determine the extent to which gender inequality is reflected in mobile phone usage.
  • It utilizes mobile phone data to analyze the pattern of usage behavior between genders, and socioeconomic and demographic data obtained from census and advocacy groups to assess the state of gender equality in each region in Pakistan.
  • One of its findings is a strong positive correlation between proportion of female mobile phone users and education score.

Stehlé, Juliette, et al. Gender homophily from spatial behavior in a primary school: A sociometric study. 2013.

  • This paper seeks to understand homophily, a human behavior characterized by interaction with peers who have similarities in “physical attributes to tastes or political opinions”. Further, it seeks to identify the magnitude of influence a type of homophily has on social structures.
  • Focusing on gender interaction among primary school aged children in France, this paper collects data from wearable devices worn by 200 children over a period of 2 days and measures the physical proximity and duration of the interactions among those children in the playground.
  • It finds that interaction patterns are significantly determined by the grade and class structure of the school: children belonging to the same class interact the most, and lower grades usually do not interact with higher grades.
  • From a gender lens, this study finds that mixed-gender interactions are shorter than same-gender interactions. In addition, interactions among girls last longer than interactions among boys. These findings indicate that the children in this school tend to have stronger relationships within their own gender, or what the study calls gender homophily. It further finds that gender homophily is apparent in all classes.

Data and Mobility

Bengtsson, Linus, et al. Using Mobile Phone Data to Predict the Spatial Spread of Cholera. Flowminder, 2015.

  • This study seeks to predict the spread of the 2010 cholera epidemic in Haiti using data from 2.9 million anonymous mobile phone SIM cards and reported cases of cholera from the Haitian Directorate of Health, where 78 study areas were analyzed in the period of October 16 – December 16, 2010.
  • From this dataset, the study creates a mobility matrix that indicates mobile phone movement from one study area to another and combines that with the number of reported cases of cholera in the study areas to calculate the infectious pressure level of those areas.
  • Its main finding is that the outbreak risk of a study area correlates positively with the infectious pressure level, where an infectious pressure of over 22 results in an outbreak within 7 days. Further, it finds that the infectious pressure level can inform the sensitivity and specificity of the outbreak prediction.
  • It hopes to improve infectious disease containment by identifying the areas at highest risk of outbreaks.
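
To make the idea of combining a mobility matrix with case counts concrete, here is a minimal sketch. It is not the study's actual model: the pressure formula (incoming movement from each other area weighted by that area's reported cases), the function names, and the toy numbers are all assumptions for illustration. Only the threshold of 22 comes from the summary above.

```python
from typing import List

# Threshold quoted in the annotation above; everything else is hypothetical.
OUTBREAK_THRESHOLD = 22.0


def infectious_pressure(mobility: List[List[float]], cases: List[float]) -> List[float]:
    """Assumed formula: pressure on area j is the sum, over every other
    area i, of (movement from i to j) * (reported cases in i)."""
    n = len(cases)
    return [
        sum(mobility[i][j] * cases[i] for i in range(n) if i != j)
        for j in range(n)
    ]


def areas_at_risk(pressure: List[float], threshold: float = OUTBREAK_THRESHOLD) -> List[int]:
    """Return the indices of areas whose pressure exceeds the threshold."""
    return [j for j, p in enumerate(pressure) if p > threshold]


# Toy data for three study areas: mobility[i][j] is the share of phones
# seen moving from area i to area j; cases[i] is reported cases in area i.
mobility = [
    [0.0, 0.5, 0.1],
    [0.2, 0.0, 0.0],
    [0.0, 0.3, 0.0],
]
cases = [50.0, 10.0, 0.0]

pressure = infectious_pressure(mobility, cases)
print(pressure)
print(areas_at_risk(pressure))
```

In this toy setup only the second area (index 1) crosses the threshold, because it receives heavy movement from the area with the most reported cases.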

Calabrese, Francesco, et al. Understanding Individual Mobility Patterns from Urban Sensing Data: A Mobile Phone Trace Example. SENSEable City Lab, MIT, 2012.

  • This study compares mobile phone data and odometer readings from annual safety inspections to characterize individual mobility and vehicular mobility in the Boston Metropolitan Area, measured by the average daily total trip length of mobile phone users and average daily Vehicular Kilometers Traveled (VKT).
  • The study found that, “accessibility to work and non-work destinations are the two most important factors in explaining the regional variations in individual and vehicular mobility, while the impacts of populations density and land use mix on both mobility measures are insignificant.” Further, “a well-connected street network is negatively associated with daily vehicular total trip length.”
  • This study demonstrates the potential for mobile phone data to provide useful and updatable information on individual mobility patterns to inform transportation and mobility research.

Campos-Cordobés, Sergio, et al. “Chapter 5 – Big Data in Road Transport and Mobility Research.” Intelligent Vehicles. Edited by Felipe Jiménez. Butterworth-Heinemann, 2018.

  • This study outlines a number of techniques and data sources – such as geolocation information, mobile phone data, and social network observation – that could be leveraged to predict human mobility.
  • The authors also provide a number of examples of real-world applications of big data to address transportation and mobility problems, such as transport demand modeling, short-term traffic prediction, and route planning.

Lin, Miao, and Wen-Jing Hsu. Mining GPS Data for Mobility Patterns: A Survey. Pervasive and Mobile Computing vol. 12, 2014.

  • This study surveys the current field of research using high resolution positioning data (GPS) to capture mobility patterns.
  • The survey focuses on analyses related to frequently visited locations, modes of transportation, trajectory patterns, and place-based activities. The authors find “high regularity” in human mobility patterns despite high levels of variation among the mobility areas covered by individuals.

Phithakkitnukoon, Santi, Zbigniew Smoreda, and Patrick Olivier. Socio-Geography of Human Mobility: A Study Using Longitudinal Mobile Phone Data. PLoS ONE, 2012.

  • This study used a year’s call logs and location data of approximately one million mobile phone users in Portugal to analyze the association between individuals’ mobility and their social networks.
  • It measures and analyzes travel scope (locations visited) and geo-social radius (distance from friends, family, and acquaintances) to determine the association.
  • It finds that 80% of places visited are within 20 km of an individual’s nearest social ties’ location, rising to 90% within a 45 km radius. Further, as population density increases, distance between individuals and their social networks decreases.
  • The findings in this study demonstrate how mobile phone data can provide insights into “the socio-geography of human mobility”.
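
The "share of visited places within a given distance of the nearest social tie" measure can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, distances use the standard haversine great-circle formula, and the coordinates are made-up points in Portugal.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def share_within(visited: List[Point], ties: List[Point], radius_km: float) -> float:
    """Fraction of visited places lying within radius_km of the nearest tie."""
    if not visited or not ties:
        return 0.0
    near = sum(
        1 for v in visited
        if min(haversine_km(v, t) for t in ties) <= radius_km
    )
    return near / len(visited)


# Illustrative data: one social tie and three visited places.
ties = [(38.72, -9.14)]
visited = [(38.72, -9.14), (38.80, -9.38), (41.15, -8.61)]

print(share_within(visited, ties, 20))
print(share_within(visited, ties, 45))
```

With real call-log data, each user would have many visited places and many ties, and the shares would be aggregated across users before comparing them to the 20 km and 45 km figures reported above.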

Semanjski, Ivana, and Sidharta Gautama. Crowdsourcing Mobility Insights – Reflection of Attitude Based Segments on High Resolution Mobility Behaviour Data. vol. 71, Transportation Research, 2016.

  • Using cellphone data, this study maps attitudinal segments that explain how age, gender, occupation, household size, income, and car ownership influence an individual’s mobility patterns. This type of segment analysis is seen as particularly useful for targeted messaging.
  • The authors argue that these time- and space-specific insights could also provide value for government officials and policymakers, by, for example, allowing for evidence-based transportation pricing options and public sector advertising campaign placement.

Silveira, Lucas M., et al. MobHet: Predicting Human Mobility using Heterogeneous Data Sources. vol. 95, Computer Communications, 2016.

  • This study explores the potential of using data from multiple sources (e.g., Twitter and Foursquare), in addition to GPS data, to provide a more accurate prediction of human mobility. This heterogeneous data captures the popularity of different locations, the frequency of visits to those locations, and the relationships among people who are moving around the target area. The authors’ initial experimentation finds that combining these data sources identifies human mobility patterns more accurately.

Wilson, Robin, et al. Rapid and Near Real-Time Assessments of Population Displacement Using Mobile Phone Data Following Disasters: The 2015 Nepal Earthquake. PLOS Current Disasters, 2016.

  • Utilizing call detail records of 12 million mobile phone users in Nepal, this study seeks spatio-temporal details of the population after the earthquake on April 25, 2015.
  • It seeks to address the problem of slow and ineffective disaster response by capturing near real-time displacement patterns from mobile phone call detail records, in order to inform humanitarian agencies on where to distribute their assistance. The preliminary results of this study were available nine days after the earthquake.
  • This project relies on cooperation, established before the earthquake, with a mobile phone operator that supplied de-identified data from 12 million users.
  • The study finds that shortly after the earthquake there was an anomalous population movement out of the Kathmandu Valley, the most impacted area, to surrounding areas. The study estimates 390,000 people above normal had left the valley.

Data, Gender and Mobility

Althoff, Tim, et al. “Large-Scale Physical Activity Data Reveal Worldwide Activity Inequality.” Nature, 2017.

  • This study’s analysis of worldwide physical activity is built on a dataset containing 68 million days of physical activity of 717,527 people collected through their smartphone accelerometers.
  • The authors find a significant reduction in female activity levels in cities with high activity inequality, where high activity inequality is associated with low city walkability – walkability indicators include pedestrian facilities (city block length, intersection density, etc.) and amenities (shops, parks, etc.).
  • Further, they find that high activity inequality is associated with high levels of inactivity-related health problems, like obesity.

Borker, Girija. “Safety First: Street Harassment and Women’s Educational Choices in India.” Stop Street Harassment, 2017.

  • Using data collected from SafetiPin, an application that allows users to mark an area on a map as safe or not, and Safecity, another application that lets users share their experience of harassment in public places, the researcher analyzes the safety of travel routes surrounding different colleges in India and their effect on women’s college choices.
  • The study finds that women are willing to go to a lower ranked college in order to avoid higher risk of street harassment. Women who choose the best college from their set of options spend an average of $250 more each year to access safer modes of transportation.

Frias-Martinez, Vanessa, Enrique Frias-Martinez, and Nuria Oliver. A Gender-Centric Analysis of Calling Behavior in a Developing Economy Using Call Detail Records. Association for the Advancement of Artificial Intelligence, 2010.

  • Using encrypted Call Detail Records (CDRs) of 10,000 participants in a developing economy, this study analyzes behavioral, social, and mobility variables to determine the gender of a mobile phone user, and finds that behavioral and social variables in mobile phone use differ between women and men.
  • It finds that women have higher phone usage than men in terms of number of calls made, call duration, and call expenses. Women also have bigger social networks, meaning that the number of unique phone numbers they contact or are contacted by is larger. It finds no statistically significant difference between men and women in the distance traveled between calls.
  • Frias-Martinez et al. recommend taking these findings into consideration when designing a cellphone-based service.

Psylla, Ioanna, Piotr Sapiezynski, Enys Mones, Sune Lehmann. “The role of gender in social network organization.” PLoS ONE 12, December 20, 2017.

  • Using a large dataset of high resolution data collected through mobile phones, as well as detailed questionnaires, this report studies gender differences in a large cohort. The researchers consider mobility behavior and individual personality traits among a group of more than 800 university students.
  • Analyzing mobility data, they find both that women visit more unique locations over time, and that they have more homogeneous time distribution over their visited locations than men, indicating the time commitment of women is more widely spread across places.

Vaitla, Bapu. Big Data and the Well-Being of Women and Girls: Applications on the Social Scientific Frontier. Data2X, Apr. 2017.

  • In this study, the researchers use geospatial data, credit card and cell phone information, and social media posts to identify problems–such as malnutrition, education, access to healthcare, mental health–facing women and girls in developing countries.
  • From the credit card and cell phone data in particular, the report finds that analyzing patterns of women’s spending and mobility can provide useful insight into Latin American women’s “economic lifestyles.”
  • Based on this analysis, Vaitla recommends that various untraditional big data be used to fill gaps in conventional data sources to address the common issues of invisibility of women and girls’ data in institutional databases.

Solving Public Problems with Data


Dinorah Cantú-Pedraza and Sam DeJohn at The GovLab: “….To serve the goal of more data-driven and evidence-based governing, The GovLab at NYU Tandon School of Engineering this week launched “Solving Public Problems with Data,” a new online course developed with support from the Laura and John Arnold Foundation.

This online lecture series helps those working for the public sector, or simply in the public interest, learn to use data to improve decision-making. Through real-world examples and case studies — captured in 10 video lectures from leading experts in the field — the new course outlines the fundamental principles of data science and explores ways practitioners can develop a data analytical mindset. Lectures in the series include:

  1. Introduction to evidence-based decision-making (Quentin Palfrey, formerly of MIT)
  2. Data analytical thinking and methods, Part I (Julia Lane, NYU)
  3. Machine learning (Gideon Mann, Bloomberg LP)
  4. Discovering and collecting data (Carter Hewgley, Johns Hopkins University)
  5. Platforms and where to store data (Arnaud Sahuguet, Cornell Tech)
  6. Data analytical thinking and methods, Part II (Daniel Goroff, Alfred P. Sloan Foundation)
  7. Barriers to building a data practice (Beth Blauer, Johns Hopkins University and GovEx)
  8. Data collaboratives (Stefaan G. Verhulst, The GovLab)
  9. Strengthening a data analytic culture (Amen Ra Mashariki, ESRI)
  10. Data governance and sharing (Beth Simone Noveck, NYU Tandon/The GovLab)

The goal of the lecture series is to enable participants to define and leverage the value of data to achieve improved outcomes and equities, reduced cost and increased efficiency in how public policies and services are created. No prior experience with computer science or statistics is necessary or assumed. In fact, the course is designed precisely to serve public professionals seeking an introduction to data science….(More)”.

Understanding Corporate Data Sharing Decisions: Practices, Challenges, and Opportunities for Sharing Corporate Data with Researchers


Leslie Harris at the Future of Privacy Forum: “Data has become the currency of the modern economy. A recent study projects the global volume of data to grow from about 0.8 zettabytes (ZB) in 2009 to more than 35 ZB in 2020, most of it generated within the last two years and held by the corporate sector.

As data collection and storage become cheaper and computing power increases, so does the value of data to the corporate bottom line. Powerful data science techniques, including machine learning and deep learning, make it possible to search, extract and analyze enormous sets of data from many sources in order to uncover novel insights and engage in predictive analysis. Breakthrough computational techniques allow complex analysis of encrypted data, making it possible for researchers to protect individual privacy, while extracting valuable insights.

At the same time, these newfound data sources hold significant promise for advancing scholarship, supporting evidence-based policymaking and more robust government statistics, and shaping more impactful social interventions. But because most of this data is held by the private sector, it is rarely available for these purposes, posing what many have argued is a serious impediment to scientific progress.

A variety of reasons have been posited for the reluctance of the corporate sector to share data for academic research. Some have suggested that the private sector doesn’t realize the value of their data for broader social and scientific advancement. Others suggest that companies have no “chief mission” or public obligation to share. But most observers describe the challenge as complex and multifaceted. Companies face a variety of commercial, legal, ethical, and reputational risks that serve as disincentives to sharing data for academic research, with privacy – particularly the risk of reidentification – an intractable concern. For companies, striking the right balance between the commercial and societal value of their data, the privacy interests of their customers, and the interests of academics presents a formidable dilemma.

To be sure, there is evidence that some companies are beginning to share for academic research. For example, a number of pharmaceutical companies are now sharing clinical trial data with researchers, and a number of individual companies have taken steps to make data available as well. What is more, companies are also increasingly providing open or shared data for other important “public good” activities, including international development, humanitarian assistance and better public decision-making. Some are contributing to data collaboratives that pool data from different sources to address societal concerns. Yet, it is still not clear whether and to what extent this “new era of data openness” will accelerate data sharing for academic research.

Today, the Future of Privacy Forum released a new study, Understanding Corporate Data Sharing Decisions: Practices, Challenges, and Opportunities for Sharing Corporate Data with Researchers. In this report, we aim to contribute to the literature by seeking the “ground truth” from the corporate sector about the challenges they encounter when they consider making data available for academic research. We hope that the impressions and insights gained from this first look at the issue will help formulate further research questions, inform the dialogue between key stakeholders, and identify constructive next steps and areas for further action and investment….(More)”.