Analyzing Big Data on a Shoestring Budget


Article by Toshiko Kaneda and Lori S. Ashford: “Big data has opened a new world for demographers and public health scientists to explore, to gain insights into social and health phenomena using the myriad digital traces we leave behind in our daily lives. But is analyzing big data practical and affordable? Researchers and organizations who have not made the leap might wonder: Do we need a lot more funding? Supercomputers? Armies of data scientists?

Three studies, presented recently in a PRB Demography Talk, show the feasibility of conducting research on a proverbial shoestring—using big data that are publicly, freely available to anyone with a personal computer and Wi-Fi connection.

Study 1: Can Google data help measure health care access more accurately?

The first study, presented by Luis Gabriel Cuervo of the Universitat Autònoma de Barcelona and the AMORE project, used Google mobility data to assess the effect of traffic congestion on people’s ability to access health services in Cali, Colombia, a city of 2.3 million. The study aimed to improve how health care accessibility is measured and communicated, to inform urban and health services planning.

Cuervo assembled a multidisciplinary research team, including mobility experts, to examine travel times from where people live to urgent and frequently used health services. The team used Google’s Distance Matrix API, which provides travel times and distance between origins and destinations, accounting for changing traffic conditions. The data are generated from Google Maps on people’s cell phones.

Combining this information with census and health services data, the study measured travel times repeatedly and revealed significant inequality by sociodemographic characteristics. On typical days, 60% of the city’s population lived more than 15 minutes by car from emergency care, with those in the poorest neighborhoods facing the longest travel times and a greater impact from traffic congestion.
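The headline figure above, the share of residents living more than 15 minutes from emergency care, is a population-weighted share. The Python sketch below shows one way such a share could be computed; it is not the AMORE team's code. The neighborhood records, the field names, and the `share_beyond` helper are hypothetical, and the travel times are made-up stand-ins for values that a service such as the Distance Matrix API might return.

```python
# Hypothetical records: one row per neighborhood, with made-up numbers.
# travel_min stands in for a measured driving time (minutes) to the
# nearest emergency service; it is not data from the study.
neighborhoods = [
    {"name": "A", "population": 1000, "travel_min": 10},
    {"name": "B", "population": 2000, "travel_min": 20},
    {"name": "C", "population": 1000, "travel_min": 30},
]


def share_beyond(rows, threshold_min=15):
    """Return the fraction of total population whose travel time exceeds threshold_min."""
    total = sum(row["population"] for row in rows)
    if total == 0:
        raise ValueError("total population must be positive")
    far = sum(row["population"] for row in rows if row["travel_min"] > threshold_min)
    return far / total


share = share_beyond(neighborhoods)
print(f"{share:.0%} of residents live more than 15 minutes away")
```

Running the same calculation once with free-flow travel times and once with peak-hour travel times would be one simple way to express the effect of congestion, under the assumption that both sets of times are available for every neighborhood.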

Studies 2 and 3: Can Google data help predict changes in birth rates and examine excess deaths from COVID-19 related shutdowns?

In another study, Joshua Wilde from the Max Planck Institute for Demographic Research (MPIDR) and Portland State University asked, can Google search data predict whether COVID-related shutdowns will lead to a baby boom or bust? In 2020, early in the pandemic, Wilde and team constructed a forecasting model based on volumes of Google searches with keywords related to conception, pregnancy, childbirth, and economic stability. Their thinking was that if searches increased sharply for keywords such as “pregnancy test” and “missed period,” one might expect higher birth rates seven to nine months later. On the other hand, prior research had associated unemployment with lower birth rates—so if unemployment-related searches climbed, one might expect a baby bust….(More)”.

The Right To Be Free From Automation


Essay by Ziyaad Bhorat: “Is it possible to free ourselves from automation? The idea sounds fanciful, if not outright absurd. Industrial and technological development have reached a planetary level, and automation, as the general substitution or augmentation of human work with artificial tools capable of completing tasks on their own, is the bedrock of all the technologies designed to save, assist and connect us. 

From industrial lathes to OpenAI’s ChatGPT, automation is one of the most groundbreaking achievements in the history of humanity. As a consequence of the human ingenuity and imagination involved in automating our tools, the sky is quite literally no longer a limit. 

But in thinking about our relationship to automation in contemporary life, my unease has grown. And I’m not alone — America’s Blueprint for an AI Bill of Rights and the European Union’s GDPR both express skepticism of automated tools and systems: The “use of technology, data and automated systems in ways that threaten the rights of the American public”; the “right not to be subject to a decision based solely on automated processing.” 

If we look a little deeper, we find this uneasy language in other places where people have been guarding three important abilities against automated technologies. Historically, we have found these abilities so important that we now include them in various contemporary rights frameworks: the right to work, the right to know and understand the source of the things we consume, and the right to make our own decisions. Whether we like it or not, therefore, communities and individuals are already asserting the importance of protecting people from the ubiquity of automated tools and systems.

Consider the case of one of South Africa’s largest retailers, Pick n Pay, which in 2016 tried to introduce self-checkout technology in its retail stores. In post-Apartheid South Africa, trade unions are immensely powerful and unemployment persistently high, so any retail firm that wants to introduce technology that might affect the demand for labor faces huge challenges. After the country’s largest union federation threatened to boycott the new Pick n Pay machines, the company scrapped its pilot. 

As the sociologist Christopher Andrews writes in “The Overworked Consumer,” self-checkout technology is by no means a universally good thing. Firms that introduce it need to deal with new forms of theft, maintenance and bottleneck, while customers end up doing more work themselves. These issues are in addition to the ill fortunes of displaced workers…(More)”.

How Democracy Can Win


Essay by Samantha Power: “…At the core of democratic theory and practice is respect for the dignity of the individual. But among the biggest errors many democracies have made since the Cold War is to view individual dignity primarily through the prism of political freedom without being sufficiently attentive to the indignity of corruption, inequality, and a lack of economic opportunity.

This was not a universal blind spot: a number of political figures, advocates, and individuals working at the grassroots level to advance democratic progress presciently argued that economic inequality could fuel the rise of populist leaders and autocratic governments that pledged to improve living standards even as they eroded freedoms. But too often, the activists, lawyers, and other members of civil society who worked to strengthen democratic institutions and protect civil liberties looked to labor movements, economists, and policymakers to address economic dislocation, wealth inequality, and declining wages rather than building coalitions to tackle these intersecting problems.

Democracy suffered as a result. Over the past two decades, as economic inequality rose, polls showed that people in rich and poor countries alike began to lose faith in democracy and worry that young people would end up worse off than they were, giving populists and ethnonationalists an opening to exploit grievances and gain a political foothold on every continent.

Moving forward, we must look at all economic programming that respects democratic norms as a form of democracy assistance. When we help democratic leaders provide vaccines to their people, bring down inflation or high food prices, send children to school, or reopen markets after a natural disaster, we are demonstrating—in a way that a free press or vibrant civil society cannot always do—that democracy delivers. And we are making it less likely that autocratic forces will take advantage of people’s economic hardship.

Nowhere is that task more important today than in societies that have managed to elect democratic reformers or throw off autocratic or antidemocratic rule through peaceful mass protests or successful political movements. These democratic bright spots are incredibly fragile. Unless reformers solidify their democratic and economic gains quickly, populations understandably grow impatient, especially if they feel that the risks they took to upend the old order have not yielded tangible dividends in their own lives. Such discontent allows opponents of democratic rule—often aided by external autocratic regimes—to wrest back control, reversing reforms and snuffing out dreams of rights-regarding self-government…(More)”.

Your Data Is Diminishing Your Freedom


Interview by David Marchese: “It’s no secret — even if it hasn’t yet been clearly or widely articulated — that our lives and our data are increasingly intertwined, almost indistinguishable. To be able to function in modern society is to submit to demands for ID numbers, for financial information, for filling out digital fields and drop-down boxes with our demographic details. Such submission, in all senses of the word, can push our lives in very particular and often troubling directions. It’s only recently, though, that I’ve seen someone try to work through the deeper implications of what happens when our data — and the formats it’s required to fit — become an inextricable part of our existence, like a new limb or organ to which we must adapt. ‘‘I don’t want to claim we are only data and nothing but data,’’ says Colin Koopman, chairman of the philosophy department at the University of Oregon and the author of ‘‘How We Became Our Data.’’ ‘‘My claim is you are your data, too.’’ Which at the very least means we should be thinking about this transformation beyond the most obvious data-security concerns. ‘‘We’re strikingly lackadaisical,’’ says Koopman, who is working on a follow-up book, tentatively titled ‘‘Data Equals,’’ ‘‘about how much attention we give to: What are these data showing? What assumptions are built into configuring data in a given way? What inequalities are baked into these data systems? We need to be doing more work on this.’’

Can you explain more what it means to say that we have become our data? Because a natural reaction to that might be, well, no, I’m my mind, I’m my body, I’m not numbers in a database — even if I understand that those numbers in that database have real bearing on my life.

The claim that we are data can also be taken as a claim that we live our lives through our data in addition to living our lives through our bodies, through our minds, through whatever else. I like to take a historical perspective on this. If you wind the clock back a couple hundred years or go to certain communities, the pushback wouldn’t be, ‘‘I’m my body,’’ the pushback would be, ‘‘I’m my soul.’’ We have these evolving perceptions of our self. I don’t want to deny anybody that, yeah, you are your soul. My claim is that your data has become something that is increasingly inescapable and certainly inescapable in the sense of being obligatory for your average person living out their life. There’s so much of our lives that are woven through or made possible by various data points that we accumulate around ourselves — and that’s interesting and concerning. It now becomes possible to say: ‘‘These data points are essential to who I am. I need to tend to them, and I feel overwhelmed by them. I feel like it’s being manipulated beyond my control.’’ A lot of people have that relationship to their credit score, for example. It’s both very important to them and very mysterious…(More)”.

The Sensitive Politics Of Information For Digital States


Essay by Federica Carugati, Cyanne E. Loyle and Jessica Steinberg: “In 2020, Vice revealed that the U.S. military had signed a contract with Babel Street, a Virginia-based company that created a product called Locate X, which collects location data from users across a variety of digital applications. Some of these apps are seemingly innocuous: one for following storms, a Muslim dating app and a level for DIY home repair. Less innocuously, these reports indicate that the U.S. government is outsourcing some of its counterterrorism and counterinsurgency information-gathering activities to a private company.

While states have always collected information about citizens and their activities, advances in digital technologies — including new kinds of data and infrastructure — have fundamentally altered their ability to access, gather and analyze information. Bargaining with and relying on non-state actors like private companies creates tradeoffs between a state’s effectiveness and legitimacy. Those tradeoffs might be unacceptable to citizens, undermining our very understanding of what states do and how we should interact with them …(More)”

The Statistics That Come Out of Nowhere


Article by Ray Fisman, Andrew Gelman, and Matthew C. Stephenson: “This winter, the university where one of us works sent out an email urging employees to wear a hat on particularly cold days because “most body heat is lost through the top of the head.” Many people we know have childhood memories of a specific figure—perhaps 50 percent or, by some accounts, 80 percent of the heat you lose is through your head. But neither figure is scientific: One is flawed, and the other is patently wrong. A 2004 New York Times column debunking the claim traced its origin to a U.S. military study from the 1950s in which people dressed in neck-high Arctic-survival suits were sent out into the cold. Participants lost about half of their heat through the only part of their body that was exposed to the elements. Exaggeration by generations of parents got us up to 80 percent. (According to a hypothermia expert cited by the Times, a more accurate figure is 10 percent.)

This rather trivial piece of medical folklore is an example of a more serious problem: Through endless repetition, numbers of dubious origin take on the veneer of scientific fact, in many cases in the context of vital public-policy debates. Unreliable numbers are always just an internet search away, and serious people and institutions depend on and repeat seemingly precise quantitative measurements that turn out to have no reliable support…(More)”.

The big idea: should governments run more experiments?


Article by Stian Westlake: “…Conceived in haste in the early days of the pandemic, Recovery (which stands for Randomised Evaluation of Covid-19 Therapy) sought to find drugs to help treat people seriously ill with the novel disease. It brought together epidemiologists, statisticians and health workers to test a range of promising existing drugs at massive scale across the NHS.

The secret of Recovery’s success is that it was a series of large, fast, randomised experiments, designed to be as easy as possible for doctors and nurses to administer in the midst of a medical emergency. And it worked wonders: within three months, it had demonstrated that dexamethasone, a cheap and widely available steroid, reduced Covid deaths by a fifth to a third. In the months that followed, Recovery identified four more effective drugs, and along the way showed that various popular treatments, including hydroxychloroquine, President Trump’s tonic of choice, were useless. All in all, it is thought that Recovery saved a million lives around the world, and it’s still going.

But Recovery’s incredible success should prompt us to ask a more challenging question: why don’t we do this more often? The question of which drugs to use was far from the only unknown we had to navigate in the early days of the pandemic. Consider the decision to delay second doses of the vaccine, when to close schools, or the right regime for Covid testing. In each case, the UK took a calculated risk and hoped for the best. But as the Royal Statistical Society pointed out at the time, it would have been cheap and quick to undertake trials so we could know for sure what the right choice was, and then double down on it.

There is a growing movement to apply randomised trials not just in healthcare but in other things government does…(More)”.

LocalView, a database of public meetings for the study of local politics and policy-making in the United States


Paper by Soubhik Barari and Tyler Simko: “Despite the fundamental importance of American local governments for service provision in areas like education and public health, local policy-making remains difficult and expensive to study at scale due to a lack of centralized data. This article introduces LocalView, the largest existing dataset of real-time local government public meetings–the central policy-making process in local government. In sum, the dataset currently covers 139,616 videos and their corresponding textual and audio transcripts of local government meetings publicly uploaded to YouTube–the world’s largest public video-sharing website–from 1,012 places and 2,861 distinct governments across the United States between 2006–2022. The data are processed, downloaded, cleaned, and publicly disseminated (at localview.net) for analysis across places and over time. We validate this dataset using a variety of methods and demonstrate how it can be used to map local governments’ attention to policy areas of interest. Finally, we discuss how LocalView may be used by journalists, academics, and other users for understanding how local communities deliberate crucial policy questions on topics including climate change, public health, and immigration…(More)”.

It’s Time to Rethink the Idea of “Indigenous” 


Essay by Manvir Singh: “Identity evolves. Social categories shrink or expand, become stiffer or more elastic, more specific or more abstract. What it means to be white or Black, Indian or American, able-bodied or not shifts as we tussle over language, as new groups take on those labels and others strip them away.

On August 3, 1989, the Indigenous identity evolved. Moringe ole Parkipuny, a Maasai activist and a former member of the Tanzanian Parliament, spoke before the U.N. Working Group on Indigenous Populations, in Geneva—the first African ever to do so. “Our cultures and way of life are viewed as outmoded, inimical to national pride, and a hindrance to progress,” he said. As a result, pastoralists like the Maasai, along with hunter-gatherers, “suffer from common problems which characterize the plight of indigenous peoples throughout the world. The most fundamental rights to maintain our specific cultural identity and the land that constitutes the foundation of our existence as a people are not respected by the state and fellow citizens who belong to the mainstream population.”

Parkipuny’s speech was the culmination of an astonishing ascent. Born in a remote village near Tanzania’s Rift Valley, he attended school after British authorities demanded that each family “contribute” a son to be educated. His grandfather urged him to flunk out, but he refused. “I already had a sense of how Maasai were being treated,” he told the anthropologist Dorothy Hodgson in 2005. “I decided I must go on.” He eventually earned an M.A. in development studies from the University of Dar es Salaam.

In his master’s thesis, Parkipuny condemned the Masai Range Project, a twenty-million-dollar scheme funded by the U.S. Agency for International Development to boost livestock productivity. Naturally, then, U.S.A.I.D. was resistant when the Tanzanian government hired him to join the project. In the end, he was sent to the United States to learn about “proper ranches.” He travelled around until, one day, a Navajo man invited him to visit the Navajo Nation, the reservation in the Southwest.

“I stayed with them for two weeks, and then with the Hopi for two weeks,” he told Hodgson. “It was my first introduction to the indigenous world. I was struck by the similarities of our problems.” The disrepair of the roads reminded him of the poor condition of cattle trails in Maasailand…

By the time Parkipuny showed up in Geneva, the concept of “indigenous” had already undergone major transformations. The word—from the Latin indigena, meaning “native” or “sprung from the land”—has been used in English since at least 1588, when a diplomat referred to Samoyed peoples in Siberia as “Indigenæ, or people bred upon that very soyle.” Like “native,” “indigenous” was used not just for people but for flora and fauna as well, suffusing the term with an air of wildness and detaching it from history and civilization. The racial flavor intensified during the colonial period until, again like “native,” “indigenous” served as a partition, distinguishing white settlers—and, in many cases, their slaves—from the non-Europeans who occupied lands before them…. When Parkipuny showed up in Geneva, activists were consciously remodelling indigeneity to encompass marginalized peoples worldwide, including, with Parkipuny’s help, in Africa.

Today, nearly half a billion people qualify as Indigenous…(More)”.

When Ideology Drives Social Science


Article by Michael Jindra and Arthur Sakamoto: “Last summer in these pages, Mordechai Levy-Eichel and Daniel Scheinerman uncovered a major flaw in Richard Jean So’s Redlining Culture: A Data History of Racial Inequality and Postwar Fiction, one that rendered the book’s conclusion null and void. Unfortunately, what they found was not an isolated incident. In complex areas like the study of racial inequality, a fundamentalism has taken hold that discourages sound methodology and the use of reliable evidence about the roots of social problems.

We are not talking about mere differences in interpretation of results, which are common. We are talking about mistakes so clear that they should cause research to be seriously questioned or even disregarded. A great deal of research — we will focus on examinations of Asian American class mobility — rigs its statistical methods in order to arrive at ideologically preferred conclusions.

Most sophisticated quantitative work in sociology involves multivariate research, often in a search for causes of social problems. This work might ask how a particular independent variable (e.g., education level) “causes” an outcome or dependent variable (e.g., income). Or it could study the reverse: How does parental income influence children’s education?

Human behavior is too complicated to be explained by only one variable, so social scientists typically try to “control” for various causes simultaneously. If you are trying to test for a particular cause, you want to isolate that cause and hold all other possible causes constant. One can control for a given variable using what is called multiple regression, a statistical tool that parcels out the separate net effects of several variables simultaneously.

If you want to determine whether income causes better education outcomes, you’d want to compare everyone from a two-parent family, since family status might be another causal factor, for instance. You’d also want to see the effect of family status by comparing everyone with similar incomes. And so on for other variables.

The problem is that there are potentially so many variables that a researcher inevitably leaves some out…(More)”.