How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals


Paper by Eric Wu et al: “Medical artificial-intelligence (AI) algorithms are being increasingly proposed for the assessment and care of patients. Although the academic community has started to develop reporting guidelines for AI clinical trials, there are no established best practices for evaluating commercially available algorithms to ensure their reliability and safety. The path to safe and robust clinical AI requires that important regulatory questions be addressed. Are medical devices able to demonstrate performance that can be generalized to the entire intended population? Are commonly faced shortcomings of AI (overfitting to training data, vulnerability to data shifts, and bias against underrepresented patient subgroups) adequately quantified and addressed?

In the USA, the US Food and Drug Administration (FDA) is responsible for approving commercially marketed medical AI devices. The FDA releases publicly available information on approved devices in the form of a summary document that generally contains information about the device description, indications for use, and performance data of the device’s evaluation study. The FDA has recently called for improvement of test-data quality, improvement of trust and transparency with users, monitoring of algorithmic performance and bias on the intended population, and testing with clinicians in the loop. To understand the extent to which these concerns are addressed in practice, we have created an annotated database of FDA-approved medical AI devices and systematically analyzed how these devices were evaluated before approval. Additionally, we have conducted a case study of pneumothorax-triage devices and found that evaluating deep-learning models at a single site alone, which is often done, can mask weaknesses in the models and lead to worse performance across sites.

Fig. 1: Breakdown of 130 FDA-approved medical AI devices by body area.

Devices are categorized by risk level (square, high risk; circle, low risk). Blue indicates that a multi-site evaluation was reported; otherwise, symbols are gray. Red outline indicates a prospective study (key, right margin). Numbers in key indicate the number of devices with each characteristic….(More)”.

The Use of Mobility Data for Responding to the COVID-19 Pandemic


New Report, Repository and set of Case Studies commissioned by the Open Data Institute: “…The GovLab and Cuebiq firstly assembled a repository of mobility data collaboratives related to Covid-19. They then selected five of these to analyse further, and produced case studies on each of the collaboratives (which you can find below in the ‘Key outputs’ section).

After analysing these initiatives, Cuebiq and The GovLab then developed a synthesis report, which contains sections focused on:

  • Mobility data – what it is and how it can be used
  • Current practice – insights from five case studies
  • Prescriptive analysis – recommendations for the future

Findings and recommendations

Based on this analysis, the authors of the report recommend nine actions which have the potential to enable more effective, sustainable and responsible re-use of mobility data through data collaboration to support decision making regarding pandemic prevention, monitoring, and response:

  1. Developing and clarifying a governance framework to enable the trusted, transparent, and accountable reuse of privately held data in the public interest under a clear regulatory framework
  2. Building capacity of organisations in the public and private sector to reuse and act on data through investments in training, education, and reskilling of relevant authorities; especially driving support for institutions in the Global South
  3. Establishing data stewards in organisations who can coordinate and collaborate with counterparts on using data in the public’s interest and acting on it.
  4. Establishing dedicated and sustainable CSR (Corporate Social Responsibility) programs on data in organisations to coordinate and collaborate with counterparts on using and acting upon data in the public’s interest.
  5. Building a network of data stewards to coordinate and streamline efforts while promoting greater transparency; as well as exchange best practices and lessons learned.
  6. Engaging citizens about how their data is being used so that they can clearly articulate how they want their data to be responsibly used, shared, and protected.
  7. Promoting technological innovation through collaboration between funders (eg governments and foundations) and researchers (eg data scientists) to develop and deploy useful, privacy-preserving technologies.
  8. Unlocking funds from a variety of sources to ensure projects are sustainable and can operate long term.
  9. Increasing research and spurring evidence gathering by publishing easily accessible research and creating dedicated centres to develop best practices.

This research begins to demonstrate the value that a handful of new data-sharing initiatives have had in the ongoing response to Covid-19. The pandemic isn’t yet over, and we will need to continue to assess and evaluate how data has been shared – both successfully and unsuccessfully – and who has benefited or been harmed in the process. More research is needed to highlight the lessons from this emergency that can be applied to future crises….(More)”.

A Victory for Scientific Pragmatism


Essay by Arturo Casadevall, Michael J. Joyner, and Nigel Paneth: “…The convalescent plasma controversy highlights the need to better educate physicians on the knowledge problem in medicine: How do we know what we know, and how do we acquire new knowledge? The usual practice guidelines doctors rely on for the treatment of disease were not available for the treatment of Covid-19 early in the pandemic, since these are usually issued by professional societies only after definitive information is available from RCTs, a luxury we did not have. The convalescent plasma experience supports Devorah Goldman’s plea to consider all available information when making therapeutic decisions.

Fortunately, the availability of rapid communication through pre-print studies, social media, and online conferences has allowed physicians to learn quickly. The experience suggests the value of providing more instruction in medical schools, postgraduate education, and continuing medical education on how best to evaluate evidence, especially preliminary and seemingly contradictory evidence. Just as physicians learn to use clinical judgment in treating individual patients, they must learn how to weigh evidence in treating populations of patients. We also need greater nimbleness and more flexibility from regulators and practice-guideline groups in emergency situations such as pandemics. They should issue interim recommendations that synthesize the best available evidence, as the American Association of Blood Bankers has done for plasma, recognizing that these recommendations may change as new evidence accumulates. Similarly, we all need to make greater efforts to educate the public to understand that all knowledge in medicine and science is provisional, subject to change as new and better studies emerge. Updating and revising recommendations as knowledge advances is not a weakness but a foundational strength of good medicine….(More)”.

Open data in action: initiatives during the initial stage of the COVID-19 pandemic


Report by OECD and The GovLab: “The COVID-19 pandemic has increased the demand for access to timely, relevant, and quality data. This demand has been driven by several needs: taking informed policy actions quickly, improving communication on the current state of play, carrying out scientific analysis of a dynamic threat, understanding its social and economic impact, and enabling civil society oversight and reporting.


This report…assesses how open government data (OGD) was used to react and respond to the COVID-19 pandemic during the initial stage of the crisis (March-July 2020) based on initiatives collected through an open call for evidence. It also seeks to transform lessons learned into considerations for policy makers on how to improve OGD policies to better prepare for future shocks…(More)”.

Hospitals Hide Pricing Data From Search Results


Tom McGinty, Anna Wilde Mathews, and Melanie Evans at the Wall Street Journal: “Hospitals that have published their previously confidential prices to comply with a new federal rule have also blocked that information from web searches with special coding embedded on their websites, according to a Wall Street Journal examination.

The information must be disclosed under a federal rule aimed at making the $1 trillion sector more consumer friendly. But hundreds of hospitals embedded code in their websites that prevented Alphabet Inc.’s Google and other search engines from displaying pages with the price lists, according to the Journal examination of more than 3,100 sites.

The code keeps pages from appearing in searches, such as those related to a hospital’s name and prices, computer-science experts said. The prices are often accessible other ways, such as through links that can require clicking through multiple layers of pages.

“It’s technically there, but good luck finding it,” said Chirag Shah, an associate professor at the University of Washington who studies human interactions with computers. “It’s one thing not to optimize your site for searchability, it’s another thing to tag it so it can’t be searched. It’s a clear indication of intentionality.”…(More)”.
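
The article does not say which code the hospitals embedded. One common mechanism for keeping a page out of search results is a robots `noindex` meta tag, and the sketch below assumes that mechanism purely for illustration. The class and function names (`RobotsMetaParser`, `blocks_indexing`) are hypothetical and do not come from the Journal's analysis.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> (or "googlebot") tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in {"robots", "googlebot"}:
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def blocks_indexing(html_text):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(blocks_indexing(page))
```

A check like this only covers meta tags in the page markup. Pages can also be excluded through HTTP headers or a site's robots.txt file, which this sketch does not inspect.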

An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time


Paper by Nicole E. Kogan et al: “We propose that several digital data sources may provide earlier indication of epidemic spread than traditional COVID-19 metrics such as confirmed cases or deaths. Six such sources are examined here: (i) Google Trends patterns for a suite of COVID-19–related terms; (ii) COVID-19–related Twitter activity; (iii) COVID-19–related clinician searches from UpToDate; (iv) predictions by the global epidemic and mobility model (GLEAM), a state-of-the-art metapopulation mechanistic model; (v) anonymized and aggregated human mobility data from smartphones; and (vi) Kinsa smart thermometer measurements.

We first evaluate each of these “proxies” of COVID-19 activity for their lead or lag relative to traditional measures of COVID-19 activity: confirmed cases, deaths attributed, and ILI. We then propose the use of a metric combining these data sources into a multiproxy estimate of the probability of an impending COVID-19 outbreak. Last, we develop probabilistic estimates of when such a COVID-19 outbreak will occur on the basis of multiproxy variability. These outbreak-timing predictions are made for two separate time periods: the first, a “training” period, from 1 March to 31 May 2020, and the second, a “validation” period, from 1 June to 30 September 2020. Consistent predictive behavior among proxies in both of these subsequent and nonoverlapping time periods would increase the confidence that they may capture future changes in the trajectory of COVID-19 activity….(More)”.
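
To make the idea of a proxy's "lead" concrete, the sketch below shifts a proxy series against a target series and picks the shift with the highest Pearson correlation. This is a simple cross-correlation stand-in, not the method used in the paper, and the function names and toy series are assumptions made for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lead(proxy, target, max_lead):
    """Return (lead, correlation) for the lead at which proxy best matches later target values."""
    best = (0, pearson(proxy, target))
    for lead in range(1, max_lead + 1):
        # Compare proxy at time t with target at time t + lead.
        r = pearson(proxy[:-lead], target[lead:])
        if r > best[1]:
            best = (lead, r)
    return best


# Toy data: the target repeats the proxy's rise and fall three steps later.
proxy = [1, 2, 4, 8, 16, 8, 4, 2, 1, 1, 1, 1]
target = [1, 1, 1] + proxy[:-3]
lead, r = best_lead(proxy, target, max_lead=5)
print(lead)
```

A positive lead means the proxy tends to move before the target does, which is the property an early-warning signal needs.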

The Ethics and Laws of Medical Big Data


Chapter by Hrefna Gunnarsdottir et al: “The COVID-19 pandemic has highlighted that leveraging medical big data can help to better predict and control outbreaks from the outset. However, there are still challenges to overcome in the 21st century to efficiently use medical big data, promote innovation and public health activities and, at the same time, adequately protect individuals’ privacy. The metaphor that property is a “bundle of sticks”, each representing a different right, applies equally to medical big data. Understanding medical big data in this way raises a number of questions, including: Who has the right to make money off its buying and selling, or is it inalienable? When does medical big data become sufficiently stripped of identifiers that the rights of an individual concerning the data disappear? How have different regimes such as the General Data Protection Regulation in Europe and the Health Insurance Portability and Accountability Act in the US answered these questions differently? In this chapter, we will discuss three topics: (1) privacy and data sharing, (2) informed consent, and (3) ownership. We will identify and examine ethical and legal challenges and make suggestions on how to address them. In our discussion of each of the topics, we will also give examples related to the use of medical big data during the COVID-19 pandemic, though the issues we raise extend far beyond it….(More)”.

Policy 2.0 in the Pandemic World: What Worked, What Didn’t, and Why


Blog by David Osimo: “…So how, then, did these new tools perform when confronted with the once-in-a-lifetime crisis of a vast global pandemic?

It turns out, some things worked. Others didn’t. And the question of how these new policymaking tools functioned in the heat of battle is already generating valuable ammunition for future crises.

So what worked?

Policy modelling – an analytical framework designed to anticipate the impact of decisions by simulating the interaction of multiple agents in a system rather than just the independent actions of atomised and rational humans – took centre stage in the pandemic and emerged with reinforced importance in policymaking. Notably, it helped governments predict how and when to introduce lockdowns or open up. But even there uptake was limited. A recent survey showed that of the 28 models used in different countries to fight the pandemic were traditional, and not the modern “agent-based models” or “system dynamics” supposed to deal best with uncertainty. Meanwhile, the concepts of system science were becoming prominent and widely communicated. It quickly became clear in the course of the crisis that social distancing was more a method to reduce the systemic pressure on the health services than a way to avoid individual contagion (the so-called “flatten the curve” project).

Open government data has long promised to allow citizens and businesses to build new services at scale and make government accountable. The pandemic largely confirmed how important this data could be to allow citizens to analyse things independently. Hundreds of analysts from all walks of life and disciplines used social media to discuss their analysis and predictions, many becoming household names and go-to people in countries and regions. Yes, this led to noise and a so-called “infodemic,” but overall it served as a fundamental tool to increase confidence and consensus behind the policy measures and to make governments accountable for their actions. For instance, one Catalan analyst demonstrated that vaccines were not provided during weekends and forced the government to change its stance. Yet it is also clear that not all went well, most notably on the supply side. Governments published data of low quality, either in PDF, with delays or with missing data due to spreadsheet abuse.

In most cases, there was little demand for sophisticated data publishing solutions such as “linked” or “FAIR” data, although particularly significant was the uptake of these kinds of solutions when it came time to share crucial research data. Experts argue that the trend towards open science has accelerated dramatically and irreversibly in the last year, as shown by the portal https://www.covid19dataportal.org/ which allowed sharing of high quality data for scientific research….

But other new policy tools proved less easy to use and ultimately ineffective. Collaborative governance, for one, promised to leverage the knowledge of thousands of citizens to improve public policies and services. In practice, methodologies aiming at involving citizens in decision making and service design were of little use. Decisions related to lockdown and opening up were taken in closed committees in top down mode. Individual exceptions certainly exist: Milan, one of the cities worst hit by the pandemic, launched a co-created strategy for opening up after the lockdown, receiving almost 3000 contributions to the consultation. But overall, such initiatives had limited impact and visibility. With regard to co-design of public services, in times of emergency there was no time for prototyping or focus groups. Services such as emergency financial relief had to be launched in a hurry and “just work.”

Citizen science promised to make every citizen a consensual data source for monitoring complex phenomena in real time through apps and Internet-of-Things sensors. In the pandemic, there were initially great expectations on digital contact tracing apps to allow for real time monitoring of contagions, most notably through bluetooth connections in the phone. However, they were mostly a disappointment. Citizens were reluctant to install them. And contact tracing soon appeared to be much more complicated – and human intensive – than originally thought. The huge debate between technology and privacy was followed by very limited impact. Much ado about nothing.

Behavioural economics (commonly known as nudge theory) is probably the most visible failure of the pandemic. It promised to move beyond traditional carrots (public funding) and sticks (regulation) in delivering policy objectives by adopting an experimental method to influence or “nudge” human behaviour towards desired outcomes. The reality is that soft nudges proved an ineffective alternative to hard lockdown choices. What makes it uniquely negative is that such methods took centre stage in the initial phase of the pandemic and particularly informed the United Kingdom’s lax approach in the first months on the basis of a hypothetical and unproven “behavioural fatigue.” This attracted heavy criticism towards the excessive reliance on nudges by the United Kingdom government, a legacy of Prime Minister David Cameron’s administration. The origin of such criticisms seems to lie not in the method’s shortcomings per se, which enjoyed success previously in more specific cases, but in the backlash from excessive expectations and promises, epitomised in the quote of a prominent behavioural economist: “It’s no longer a matter of supposition as it was in 2010 […] we can now say with a high degree of confidence these models give you best policy.”

Three factors emerge as the key determinants behind success and failure: maturity, institutions and leadership….(More)”.

Machine Learning Shows Social Media Greatly Affects COVID-19 Beliefs


Jessica Kent at HealthITAnalytics: “Using machine learning, researchers found that people’s biases about COVID-19 and its treatments are exacerbated when they read tweets from other users, a study published in JMIR showed.

The analysis also revealed that scientific events, like scientific publications, and non-scientific events, like speeches from politicians, equally influence health belief trends on social media.

The rapid spread of COVID-19 has resulted in an explosion of accurate and inaccurate information related to the pandemic – mainly across social media platforms, researchers noted.

“In the pandemic, social media has contributed to much of the information and misinformation and bias of the public’s attitude toward the disease, treatment and policy,” said corresponding study author Yuan Luo, chief Artificial Intelligence officer at the Institute for Augmented Intelligence in Medicine at Northwestern University Feinberg School of Medicine.

“Our study helps people to realize and re-think the personal decisions that they make when facing the pandemic. The study sends an ‘alert’ to the audience that the information they encounter daily might be right or wrong, and guide them to pick the information endorsed by solid scientific evidence. We also wanted to provide useful insight for scientists or healthcare providers, so that they can more effectively broadcast their voice to targeted audiences.”…(More)”.

Selected Readings on Data, Gender, and Mobility


By Michelle Winowatan, Uma Kalkar, Andrew Young, and Stefaan Verhulst

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data, gender, and mobility was originally published in 2017, and updated in 2021.

This edition of the Selected Readings was developed as part of an ongoing project at the GovLab, supported by Data2X, in collaboration with UNICEF, DigitalGlobe, IDS (UDD/Telefonica R&D), and the ISI Foundation, to establish a data collaborative to analyze unequal access to urban transportation for women and girls in Chile. We thank all our partners for their suggestions to the below curation – in particular Leo Ferres at IDS who got us started with this collection; Ciro Cattuto and Michele Tizzoni from the ISI Foundation; and Bapu Vaitla at Data2X for their pointers to the growing data and mobility literature.

Introduction

Daily mobility is key for gender equity. Access to transportation contributes to women’s agency and independence. The ability to move from place to place safely and efficiently can allow women to access education, work, and the public domain more generally. Yet, mobility is not just a means to access various opportunities. It is also a means to enter the public domain.

Women’s mobility is a multi-layered challenge

Women’s daily mobility, however, is often hampered by social, cultural, infrastructural, and technical barriers. Cultural bias, for instance, limits women’s mobility: women are confined to areas close to their homes because of society’s double standard that expects women to be homemakers. From an infrastructural perspective, public transportation mostly accommodates only home-to-work trips, when in reality women often make more complex trips with multiple stops, for example, at the market, school, or healthcare provider – sometimes called “trip chaining.” From a safety perspective, women tend to avoid making trips in certain areas and/or at certain times due to a constant risk of being sexually harassed in public places. Women are also pushed toward more expensive transportation – such as taking a cab instead of a bus or train – based on safety concerns.

The growing importance of (new sources of) data

Researchers are increasingly experimenting with ways to address these interdependent problems through the analysis of diverse datasets, often collected by private sector businesses and other non-governmental entities. Gender-disaggregated mobile phone records, geospatial data, satellite imagery, and social media data, to name a few, are providing evidence-based insight into gender and mobility concerns. Such data collaboratives – the exchange of data across sectors to create public value – can help governments, international organizations, and other public sector entities in the move toward more inclusive urban and transportation planning, and the promotion of gender equity.

The curated set of readings below focuses on the following areas:

  1. Insights on how data can inform gender empowerment initiatives,
  2. Emergent research into the capacity of new data sources – like call detail records (CDRs) and satellite imagery – to increase our understanding of human mobility patterns, and,
  3. Publications exploring data-driven policy for gender equity in mobility.

Readings are listed in alphabetical order.

We selected the readings based upon their focus (gender and/or mobility related); scope and representativeness (going beyond one project or context); type of data used (such as CDRs and satellite imagery); and date of publication.

Annotated Reading List

Data and Gender

Blumenstock, Joshua, and Nathan Eagle. Mobile Divides: Gender, Socioeconomic Status, and Mobile Phone Use in Rwanda. ACM Press, 2010.

  • Using traditional survey and mobile phone operator data, this study analyzes gender and socioeconomic divides in mobile phone use in Rwanda, finding that mobile phone use is significantly more prevalent among men and higher socioeconomic classes.
  • The study also shows the differences in the way men and women use phones, for example: women are more likely to use a shared phone than men.
  • The authors frame their findings around gender and economic inequality in the country to the end of providing pointers for government action.

Bosco, Claudio, et al. Mapping Indicators of Female Welfare at High Spatial Resolution. WorldPop and Flowminder, 2015.

  • This report focuses on early adolescence in girls, which often comes with higher risk of violence, fewer economic opportunities, and restrictions on mobility. Significant data gaps and methodological and ethical issues surrounding data collection for girls also make it harder for policymakers to create evidence-based policy to address those issues.
  • The authors analyze geolocated household survey data, using statistical models and validation techniques, and create high-resolution maps of various sex-disaggregated indicators, such as nutrition level, access to contraception, and literacy, to better inform local policy making processes.
  • Further, the report identifies the gender data gap and issues surrounding gender data collection, and provides arguments for why having comprehensive data can help create better policy and contribute to the achievement of the Sustainable Development Goals (SDGs).

Buvinic, Mayra, Rebecca Furst-Nichols, and Gayatri Koolwal. Mapping Gender Data Gaps. Data2X, 2014.

  • This study identifies gaps in gender data in developing countries on health, education, economic opportunities, political participation, and human security issues.
  • It recommends ways to close the gender data gap through censuses and micro-level surveys and through service and administrative records, and emphasizes how “big data” in particular can supply the missing data needed to measure progress in women’s and girls’ well-being. The authors argue that identifying these gaps is key to achieving SDG 5: advancing gender equality and women’s empowerment.

Catalyzing Inclusive Financial Systems: Chile’s Commitment to Women’s Data. Data2X, 2014.

  • This article analyzes global and national data in the banking sector to fill the gap of sex-disaggregated data in Chile. The purpose of the study is to describe the difference in spending behavior and priorities between women and men, identify the challenges for women in accessing financial services, and create policies that promote women’s inclusion in Chile.

Ready to Measure: Twenty Indicators for Monitoring SDG Gender Targets. Open Data Watch and Data2X, 2016.

  • Using readily available data, this study identifies 20 SDG indicators related to gender issues that can serve as a baseline measurement for advancing gender equality, such as percentage of women aged 20-24 who were married or in a union before age 18 (child marriage), proportion of seats held by women in national parliament, and share of women among mobile telephone owners, among others.

Ready to Measure Phase II: Indicators Available to Monitor SDG Gender Targets. Open Data Watch and Data2X, 2017.

  • The Phase II paper is an extension of the Ready to Measure Phase I above. Where Phase I identifies the readily available data to measure women’s and girls’ well-being, Phase II provides information on how to access this data and summarizes insights extracted from it.
  • Phase II elaborates the insights about data gathered from ready to measure indicators and finds that although underlying data to measure indicators of women and girls’ wellbeing is readily available in most cases, it is typically not sex-disaggregated.
  • Over one in five – 53 out of 232 – SDG indicators specifically refer to women and girls. However, further analysis from this study reveals that at least 34 more indicators should be disaggregated by sex. For instance, there should be 15 more sex-disaggregated indicators for SDG number 3: “Ensure healthy lives and promote well-being for all at all ages.”
  • The report recommends that national statistical agencies take the lead and exert additional effort to fill the current gender data gap for each of the SDGs, using tools such as statistical models.

Reed, Philip J., Muhammad Raza Khan, and Joshua Blumenstock. Observing gender dynamics and disparities with mobile phone metadata. International Conference on Information and Communication Technologies and Development (ICTD), 2016.

  • The study analyzes mobile phone logs of millions of Pakistani residents to explore whether mobile phone usage behavior differs between men and women and to determine the extent to which gender inequality is reflected in mobile phone usage.
  • It utilizes mobile phone data to analyze the pattern of usage behavior between genders, and socioeconomic and demographic data obtained from census and advocacy groups to assess the state of gender equality in each region in Pakistan.
  • One of its findings is a strong positive correlation between the proportion of female mobile phone users and education score.

Stehlé, Juliette, et al. Gender homophily from spatial behavior in a primary school: A sociometric study. 2013.

  • This paper seeks to understand homophily, a human behavior that characterizes interactions with peers who have similarities in “physical attributes to tastes or political opinions”. Further, it seeks to identify the magnitude of influence, a type of homophily applied to social structures.
  • Focusing on gender interaction among primary school aged children in France, this paper collects data from wearable devices worn by 200 children over a period of 2 days and measures the physical proximity and duration of the interactions among those children in the playground.
  • It finds that interaction patterns are significantly determined by grade and class structure of the school. This means that children belonging to the same class have most interactions, and that lower grades usually do not interact with higher grades.
  • From a gender lens, this study finds that mixed-gender interactions are shorter than same-gender interactions. In addition, interactions among girls last longer than interactions among boys. These findings indicate that the children in this school tend to have stronger relationships within their own gender, or what the study calls gender homophily. It further finds that gender homophily is apparent in all classes.

Strengthening Gender Measures and Data in the COVID-19 Era: An Urgent Need for Change. Paris 21, 2021.

  • COVID-19 has exacerbated gender disparities, especially with regard to women’s livelihoods, unpaid labor, mental health, and risk of gender-based violence. Gaps in gender data impede robust, data-driven, and effective policies to quantify, analyse, and respond to these issues.
  • Without this information, the full effects of the COVID-19 pandemic cannot be understood. This report calls on National Statistical Systems, survey managers, funders, multilateral agencies, researchers, and policymakers to collect gender-intentional and disaggregated data that is standardized and comparable to address key areas of concern for women and girls. Additionally, it seeks to link non-traditional data sources, such as social media and news media, with existing frameworks to fill in knowledge gaps. Moreover, this information must be rendered accessible for all stakeholders to maximize the potential of the information. Post-pandemic, conscious collection and collation of gendered data is vital to preempt policy problems.

The Sex, Gender and COVID-19 Project: The COVID-19 Sex-Disaggregated Data Tracker. 2021.

  • This data tracker, produced by Global Health 50/50, the African Population and Health Research Center, and the International Center for Research on Women, tracks which countries and datasets have reported sex-disaggregated data on COVID-19 testing, confirmed cases, hospitalizations, and deaths.

Data and Mobility

Bengtsson, Linus, et al. Using Mobile Phone Data to Predict the Spatial Spread of Cholera. Flowminder, 2015.

  • This study seeks to predict the spread of the 2010 cholera epidemic in Haiti using data from 2.9 million anonymous mobile phone SIM cards and reported cases of cholera from the Haitian Directorate of Health, analyzing 78 study areas over the period October 16 – December 16, 2010.
  • From this dataset, the study creates a mobility matrix that indicates mobile phone movement from one study area to another and combines that with the number of reported cases of cholera in the study areas to calculate the infectious pressure level of those areas.
  • The main finding is that a study area’s outbreak risk correlates positively with its infectious pressure level: an infectious pressure above 22 results in an outbreak within 7 days. Further, the study finds that the infectious pressure level can inform the sensitivity and specificity of the outbreak prediction.
  • It hopes to improve infectious disease containment by identifying the areas at highest risk of outbreaks.
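
The combination of a mobility matrix with reported case counts described above can be sketched roughly as follows. This is a minimal illustration, not the authors’ model: the function names, the toy numbers, and the simple "movements times cases" weighting are all assumptions, and the threshold of 22 is taken from the summary above without reproducing the study’s scaling.

```python
def infectious_pressure(mobility, cases):
    """Return one pressure value per study area.

    mobility[j][i] is the (hypothetical) number of SIM cards observed
    moving from area j to area i; cases[j] is the number of reported
    cholera cases in area j. Pressure on area i is assumed here to be
    the sum of incoming movements weighted by cases at the origin.
    """
    n = len(cases)
    return [
        sum(mobility[j][i] * cases[j] for j in range(n) if j != i)
        for i in range(n)
    ]


def areas_at_risk(pressure, threshold):
    """Indices of areas whose pressure exceeds the threshold."""
    return [i for i, p in enumerate(pressure) if p > threshold]


# Toy data for three study areas (illustrative values only).
mobility = [
    [0, 5, 1],
    [2, 0, 3],
    [0, 4, 0],
]
cases = [10, 0, 2]

pressure = infectious_pressure(mobility, cases)
flagged = areas_at_risk(pressure, threshold=22)
print(pressure, flagged)
```

In this toy example only the second area is flagged, because it receives many movements from the area with the most reported cases.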

Calabrese, Francesco, et al. Understanding Individual Mobility Patterns from Urban Sensing Data: A Mobile Phone Trace Example. SENSEable City Lab, MIT, 2012.

  • This study compares mobile phone data and odometer readings from annual safety inspections to characterize individual mobility and vehicular mobility in the Boston Metropolitan Area, measured by the average daily total trip length of mobile phone users and average daily Vehicular Kilometers Traveled (VKT).
  • The study found that, “accessibility to work and non-work destinations are the two most important factors in explaining the regional variations in individual and vehicular mobility, while the impacts of populations density and land use mix on both mobility measures are insignificant.” Further, “a well-connected street network is negatively associated with daily vehicular total trip length.”
  • This study demonstrates the potential for mobile phone data to provide useful and updatable information on individual mobility patterns to inform transportation and mobility research.

Campos-Cordobés, Sergio, et al. Chapter 5 – Big Data in Road Transport and Mobility Research. Intelligent Vehicles. Edited by Felipe Jiménez. Butterworth-Heinemann, 2018.

  • This study outlines a number of techniques and data sources – such as geolocation information, mobile phone data, and social network observation – that could be leveraged to predict human mobility.
  • The authors also provide a number of examples of real-world applications of big data to address transportation and mobility problems, such as transport demand modeling, short-term traffic prediction, and route planning.

Gauvin, Laetitia et al. Gender gaps in urban mobility. Humanities and Information Science. Humanities & Social Sciences Communications vol. 7, issue 11, 2020.

  • This article discusses how urbanization affects women’s mobility and, through it, their ability to realize their rights. It points out the historic lack of gender-disaggregated data for urban planning, which has led to transportation designs that do not best accommodate the needs of women.
  • Taking the large and growing metropolitan area of Santiago, Chile, as a case study of urban mobility through a gendered lens, the article examines the mobility traces from Call Detail Records (CDRs) of an anonymized cohort of mobile phone users, sorted by gender, over 3 months. It then maps differences between men and women with regard to socio-demographic indicators and mobility across the city and through the Santiago transportation network structure, and identifies points of interest frequented by either sex to inform gendered mobility needs in urban areas.

Lin, Miao, and Wen-Jing Hsu. Mining GPS Data for Mobility Patterns: A Survey. Pervasive and Mobile Computing vol. 12, 2014.

  • This study surveys the current field of research using high resolution positioning data (GPS) to capture mobility patterns.
  • The survey focuses on analyses related to frequently visited locations, modes of transportation, trajectory patterns, and place-based activities. The authors find “high regularity” in human mobility patterns despite high levels of variation among the mobility areas covered by individuals.

Phithakkitnukoon, Santi, Zbigniew Smoreda, and Patrick Olivier. Socio-Geography of Human Mobility: A Study Using Longitudinal Mobile Phone Data. PLoS ONE, 2012.

  • This study used a year’s call logs and location data of approximately one million mobile phone users in Portugal to analyze the association between individuals’ mobility and their social networks.
  • It measures and analyzes travel scope (locations visited) and geo-social radius (distance from friends, family, and acquaintances) to determine the association.
  • It finds that 80% of places visited are within 20 km of the location of an individual’s nearest social ties, rising to 90% within a 45 km radius. Further, as population density increases, the distance between individuals and their social networks decreases.
  • The findings of this study demonstrate how mobile phone data can provide insights into “the socio-geography of human mobility”.
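
The "share of places visited within a given distance of social ties" measure can be illustrated with a short sketch. It assumes each visited place and each social tie is reduced to a single latitude/longitude pair and uses the haversine great-circle distance; the function names and coordinates are hypothetical and not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def share_near_ties(visited, ties, radius_km):
    """Fraction of visited places within radius_km of the nearest social tie."""
    near = sum(
        1 for place in visited
        if min(haversine_km(place, tie) for tie in ties) <= radius_km
    )
    return near / len(visited)


# Toy example: one social tie and three visited places (made-up coordinates).
ties = [(0.0, 0.0)]
visited = [(0.0, 0.0), (0.0, 0.1), (0.0, 1.0)]
print(share_near_ties(visited, ties, radius_km=20))
```

Evaluating the same function at several radii (for example 20 km and 45 km) would give the kind of cumulative share the study reports.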

Semanjski, Ivana, and Sidharta Gautama. Crowdsourcing Mobility Insights – Reflection of Attitude Based Segments on High Resolution Mobility Behaviour Data. vol. 71, Transportation Research, 2016.

  • Using cellphone data, this study maps attitudinal segments that explain how age, gender, occupation, household size, income, and car ownership influence an individual’s mobility patterns. This type of segment analysis is seen as particularly useful for targeted messaging.
  • The authors argue that these time- and space-specific insights could also provide value for government officials and policymakers, by, for example, allowing for evidence-based transportation pricing options and public sector advertising campaign placement.

Silveira, Lucas M., et al. MobHet: Predicting Human Mobility using Heterogeneous Data Sources. vol. 95, Computer Communications, 2016.

  • This study explores the potential of using data from multiple sources (e.g., Twitter and Foursquare), in addition to GPS data, to provide a more accurate prediction of human mobility. This heterogeneous data captures the popularity of different locations, the frequency of visits to those locations, and the relationships among people who are moving around the target area. The authors’ initial experimentation finds that combining these data sources identifies human mobility patterns more accurately.

Wilson, Robin, et al. Rapid and Near Real-Time Assessments of Population Displacement Using Mobile Phone Data Following Disasters: The 2015 Nepal Earthquake. PLOS Currents Disasters, 2016.

  • Using call detail records of 12 million mobile phone users in Nepal, this study seeks to capture spatio-temporal details of population movement after the earthquake of April 25, 2015.
  • It seeks to address the problem of slow and ineffective disaster response by capturing near real-time displacement patterns from mobile phone call detail records, in order to inform humanitarian agencies on where to distribute their assistance. The preliminary results of this study were available nine days after the earthquake.
  • This project relies on the foundational cooperation with mobile phone operators, who supplied the de-identified data from 12 million users before the earthquake.
  • The study finds that shortly after the earthquake there was an anomalous population movement out of the Kathmandu Valley, the most impacted area, to surrounding areas. The study estimates that 390,000 more people than normal had left the valley.

Data, Gender and Mobility

Althoff, Tim, et al. Large-Scale Physical Activity Data Reveal Worldwide Activity Inequality. Nature, 2017.

  • This study’s analysis of worldwide physical activity is built on a dataset containing 68 million days of physical activity of 717,527 people collected through their smartphone accelerometers.
  • The authors find a significant reduction in female activity levels in cities with high activity inequality, where high activity inequality is associated with low city walkability. Walkability indicators include pedestrian facilities (city block length, intersection density, etc.) and amenities (shops, parks, etc.).
  • Further, they find that high activity inequality is associated with high levels of inactivity-related health problems, like obesity.

Borker, Girija. Safety First: Street Harassment and Women’s Educational Choices in India. Stop Street Harassment, 2017.

  • Using data collected from SafetiPin, an application that allows users to mark an area on a map as safe or not, and Safecity, another application that lets users share their experience of harassment in public places, Borker analyzes the safety of travel routes surrounding different colleges in India and their effect on women’s college choices.
  • The study finds that women are willing to attend a lower-ranked college in order to avoid a higher risk of street harassment. Women who choose the best college from their set of options spend an average of $250 more each year to access safer modes of transportation.

Frias-Martinez, Vanessa, Enrique Frias-Martinez, and Nuria Oliver. A Gender-Centric Analysis of Calling Behavior in a Developing Economy Using Call Detail Records. Association for the Advancement of Artificial Intelligence, 2010.

  • Using encrypted Call Detail Records (CDRs) of 10,000 participants in a developing economy, this study analyzes behavioral, social, and mobility variables to determine the gender of a mobile phone user, and finds differences between women and men in the behavioral and social variables of mobile phone use.
  • It finds that women use their phones more than men in terms of number of calls made, call duration, and call expenses. Women also have larger social networks, meaning that the number of unique phone numbers they contact or are contacted by is larger. It finds no statistically significant difference between men and women in the distance traveled between calls.
  • Frias-Martinez et al. recommend taking these findings into consideration when designing cellphone-based services.
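
As a rough illustration of the "social network size" variable described above (the number of unique phone numbers a user contacts or is contacted by), the following sketch counts unique contacts from a list of call records. The record layout of (caller, callee) pairs and the identifiers are assumptions made for the example, not the study’s actual data format.

```python
from collections import defaultdict


def contact_counts(records):
    """Map each user to the number of unique numbers they called or were called by.

    records is an iterable of (caller, callee) pairs; repeated calls
    between the same two numbers count only once.
    """
    contacts = defaultdict(set)
    for caller, callee in records:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return {user: len(peers) for user, peers in contacts.items()}


# Toy call records with anonymized placeholder identifiers.
records = [("a", "b"), ("a", "b"), ("c", "a"), ("b", "d")]
print(contact_counts(records))
```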

Psylla, Ioanna, Piotr Sapiezynski, Enys Mones, Sune Lehmann. The role of gender in social network organization. PLoS ONE 12, December 20, 2017.

  • Using a large dataset of high resolution data collected through mobile phones, as well as detailed questionnaires, this report studies gender differences in a large cohort. The researchers consider mobility behavior and individual personality traits among a group of more than 800 university students.
  • Analyzing mobility data, they find both that women visit more unique locations over time, and that they have more homogeneous time distribution over their visited locations than men, indicating the time commitment of women is more widely spread across places.
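
One common way to quantify how evenly a person’s time is spread across visited locations is the Shannon entropy of their time shares. The sketch below uses that measure as an assumption for illustration; it is not necessarily the statistic used in the paper, and the location names and hours are made up.

```python
import math


def time_entropy_bits(hours_by_location):
    """Shannon entropy (in bits) of the share of time spent at each location.

    Higher values mean time is spread more evenly across more places;
    0 means all time is spent in a single place.
    """
    total = sum(hours_by_location.values())
    shares = [h / total for h in hours_by_location.values() if h > 0]
    return sum(-p * math.log2(p) for p in shares)


# Two hypothetical people who each log 24 hours.
even = {"home": 6, "campus": 6, "gym": 6, "cafe": 6}
concentrated = {"home": 21, "campus": 1, "gym": 1, "cafe": 1}
print(time_entropy_bits(even), time_entropy_bits(concentrated))
```

Under this measure, the person with evenly split time has the higher entropy, which corresponds to the "more homogeneous time distribution" described above.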

The Landscape of Big Data and Gender. Data2X, February, 2021.

  • Against the backdrop of COVID-19, this report reaffirms that big data initiatives to study mobility, health, and social norms through gendered lenses have greatly progressed. More private companies and think tanks have launched data collection and sharing efforts to spur innovative projects to address COVID-19 complications.
  • However, big data work on economic opportunity, security, and civic action has lagged behind. Big data collection on these topics is complicated by the lack of sex-disaggregated datasets, gender disparities in technology access, and the lack of gender tags in big data.
  • Large technology firms, especially social networks like Facebook, LinkedIn, Uber, and more, create a large amount of gender-organized data. The report found that users and data-holding companies are willing to share this information for public policy reasons so long as it provides value and is protected. To this end, Data2X, alongside its partners, champions the use of data collaboratives to use gender-sorted information for social good.

Vaitla, Bapu. Big Data and the Well Being of Women and Girls: Applications on the Social Scientific Frontier. Data2X, Apr. 2017.

  • In this study, the researchers use geospatial data, credit card and cell phone information, and social media posts to identify problems–such as malnutrition, education, access to healthcare, mental health–facing women and girls in developing countries.
  • From the credit card and cell phone data in particular, the report finds that analyzing patterns of women’s spending and mobility can provide useful insight into Latin American women’s “economic lifestyles.”
  • Based on this analysis, Vaitla recommends that a variety of non-traditional big data sources be used to fill gaps in conventional data sources and to address the common invisibility of women and girls in institutional databases.