Analyzing Big Data on a Shoestring Budget

Article by Toshiko Kaneda and Lori S. Ashford: “Big data has opened a new world for demographers and public health scientists to explore, to gain insights into social and health phenomena using the myriad digital traces we leave behind in our daily lives. But is analyzing big data practical and affordable? Researchers and organizations who have not made the leap might wonder: Do we need a lot more funding? Supercomputers? Armies of data scientists?

Three studies, presented recently in a PRB Demography Talk, show the feasibility of conducting research on a proverbial shoestring—using big data that are publicly, freely available to anyone with a personal computer and Wi-Fi connection.

Study 1: Can Google data help measure health care access more accurately?

The first study, presented by Luis Gabriel Cuervo of the Universitat Autònoma de Barcelona and the AMORE project, used Google mobility data to assess the effect of traffic congestion on people’s ability to access health services in Cali, Colombia, a city of 2.3 million. The study aimed to improve how health care accessibility is measured and communicated, to inform urban and health services planning.

Cuervo assembled a multidisciplinary research team, including mobility experts, to examine travel times from where people live to urgent and frequently used health services. The team used Google’s Distance Matrix API, which provides travel times and distances between origins and destinations, accounting for changing traffic conditions. The data are generated from Google Maps on people’s cell phones.
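As a rough illustration of the kind of request involved, the sketch below builds a Distance Matrix API URL and reads traffic-aware travel times from a response. The endpoint, parameter names, and the `duration_in_traffic` field follow Google's public documentation as understood here and should be checked against it. The function names, coordinates, and API key are hypothetical and are not taken from the study.

```python
from urllib.parse import urlencode

# Public JSON endpoint for the Distance Matrix API (assumed from Google's docs).
BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def build_request_url(origins, destinations, api_key, departure_time="now"):
    """Build a request URL for driving travel times under traffic.

    origins and destinations are lists of (lat, lng) tuples.
    """
    params = {
        "origins": "|".join(f"{lat},{lng}" for lat, lng in origins),
        "destinations": "|".join(f"{lat},{lng}" for lat, lng in destinations),
        "mode": "driving",
        # A departure time is needed for traffic-aware durations.
        "departure_time": departure_time,
        "key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def travel_minutes(response):
    """Return a matrix (one row per origin) of travel times in minutes.

    Prefers duration_in_traffic when present, falls back to duration,
    and yields None for any element whose status is not "OK".
    """
    matrix = []
    for row in response["rows"]:
        minutes = []
        for element in row["elements"]:
            if element.get("status") != "OK":
                minutes.append(None)
                continue
            duration = element.get("duration_in_traffic", element["duration"])
            minutes.append(duration["value"] / 60)  # value is in seconds
        matrix.append(minutes)
    return matrix
```

Repeating such requests at different hours and days would give the repeated travel-time measurements that a study like this relies on; actual calls require a valid API key and are billed by Google.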

Combining this information with census and health services data, the study measured travel times repeatedly and revealed significant inequality by sociodemographic characteristics. On typical days, 60% of the city’s population lived more than 15 minutes by car from emergency care, with those in the poorest neighborhoods facing the longest travel times and a greater impact from traffic congestion.
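The headline figure above is a population-weighted share: the fraction of residents whose travel time exceeds a threshold. A minimal sketch of that calculation follows, assuming one travel time and one population count per neighborhood. The function name and the neighborhood values are hypothetical and are not data from the study.

```python
def share_beyond_threshold(neighborhoods, threshold_minutes=15):
    """Return the share of total population living beyond the threshold.

    Each neighborhood is a dict with "population" and "minutes" keys.
    """
    total = sum(n["population"] for n in neighborhoods)
    if total <= 0:
        raise ValueError("total population must be positive")
    beyond = sum(
        n["population"] for n in neighborhoods if n["minutes"] > threshold_minutes
    )
    return beyond / total


# Hypothetical neighborhoods for illustration only.
example_neighborhoods = [
    {"name": "A", "population": 50_000, "minutes": 9.5},
    {"name": "B", "population": 30_000, "minutes": 18.0},
    {"name": "C", "population": 20_000, "minutes": 27.5},
]

share = share_beyond_threshold(example_neighborhoods)
print(f"Share living more than 15 minutes from care: {share:.0%}")
```

Running the same calculation separately for subgroups, such as neighborhoods grouped by income level, is one way to surface the kind of inequality the study reports.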

Studies 2 and 3: Can Google data help predict changes in birth rates and examine excess deaths from COVID-19-related shutdowns?

In another study, Joshua Wilde from the Max Planck Institute for Demographic Research (MPIDR) and Portland State University asked: Can Google search data predict whether COVID-related shutdowns will lead to a baby boom or bust? In 2020, early in the pandemic, Wilde and team constructed a forecasting model based on volumes of Google searches with keywords related to conception, pregnancy, childbirth, and economic stability. Their thinking was that if searches increased sharply for keywords such as “pregnancy test” and “missed period,” one might expect higher birth rates seven to nine months later. On the other hand, prior research had associated unemployment with lower birth rates, so if unemployment-related searches climbed, one might expect a baby bust….(More)”.
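The reasoning behind the seven-to-nine-month window can be illustrated with a simple lagged correlation: compare search volume in month t with births in month t + lag, and see which lag lines up best. This is only a sketch of the underlying idea, not the forecasting model Wilde and team built. The function names are hypothetical, and the inputs are assumed to be two equal-length lists of monthly values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(searches, births, lag):
    """Correlate searches in month t with births in month t + lag."""
    if len(searches) != len(births):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(searches) - 1:
        raise ValueError("lag must leave at least two overlapping months")
    if lag == 0:
        return pearson(searches, births)
    return pearson(searches[:-lag], births[lag:])


def best_lag(searches, births, lags=range(7, 10)):
    """Return the lag (in months) with the highest correlation."""
    return max(lags, key=lambda lag: lagged_correlation(searches, births, lag))
```

With real data, `searches` might be a monthly index of a keyword's search volume and `births` the monthly birth counts for the same region. A correlation alone does not make a forecast, but it indicates whether a search term carries a usable leading signal.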