When Ideology Drives Social Science


Article by Michael Jindra and Arthur Sakamoto: Last summer in these pages, Mordechai Levy-Eichel and Daniel Scheinerman uncovered a major flaw in Richard Jean So’s Redlining Culture: A Data History of Racial Inequality and Postwar Fiction, one that rendered the book’s conclusion null and void. Unfortunately, what they found was not an isolated incident. In complex areas like the study of racial inequality, a fundamentalism has taken hold that discourages sound methodology and the use of reliable evidence about the roots of social problems.

We are not talking about mere differences in interpretation of results, which are common. We are talking about mistakes so clear that they should cause research to be seriously questioned or even disregarded. A great deal of research — we will focus on examinations of Asian American class mobility — rigs its statistical methods in order to arrive at ideologically preferred conclusions.

Most sophisticated quantitative work in sociology involves multivariate research, often in a search for causes of social problems. This work might ask how a particular independent variable (e.g., education level) “causes” an outcome or dependent variable (e.g., income). Or it could study the reverse: How does parental income influence children’s education?

Human behavior is too complicated to be explained by only one variable, so social scientists typically try to “control” for various causes simultaneously. If you are trying to test for a particular cause, you want to isolate that cause and hold all other possible causes constant. One can control for a given variable using what is called multiple regression, a statistical tool that parcels out the separate net effects of several variables simultaneously.

If you want to determine whether income causes better education outcomes, for instance, you’d want to compare only people from two-parent families, since family status might be another causal factor. Likewise, to see the effect of family status, you’d compare people with similar incomes. And so on for other variables.
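
To make the mechanics concrete, here is a minimal sketch of such a regression on simulated data; the variable names, the simulated numbers, and the use of Python’s statsmodels library are illustrative assumptions, not anything drawn from the article.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: variable names and effect sizes are invented for illustration.
rng = np.random.default_rng(0)
n = 1_000
two_parent = rng.binomial(1, 0.7, n)                   # family status (the control)
income = 30 + 20 * two_parent + rng.normal(0, 10, n)   # parental income, $1,000s
education = 10 + 0.08 * income + 1.5 * two_parent + rng.normal(0, 2, n)  # years of schooling

df = pd.DataFrame({"education": education, "income": income, "two_parent": two_parent})

# Multiple regression: the income coefficient is its estimated net effect on
# education, holding family status constant.
print(smf.ols("education ~ income + two_parent", data=df).fit().params)
```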

The problem is that there are potentially so many variables that a researcher inevitably leaves some out…(More)”.
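
A quick way to see why this matters: refitting the same simulated data without the family-status control biases the income coefficient upward, because income now absorbs part of the omitted variable’s effect. Again, this is only an illustrative sketch.

```python
import numpy as np
import statsmodels.api as sm

# Same simulated setup as the sketch above; the only change is dropping the control.
rng = np.random.default_rng(0)
n = 10_000
two_parent = rng.binomial(1, 0.7, n)
income = 30 + 20 * two_parent + rng.normal(0, 10, n)
education = 10 + 0.08 * income + 1.5 * two_parent + rng.normal(0, 2, n)

with_control = sm.OLS(education, sm.add_constant(np.column_stack([income, two_parent]))).fit()
without_control = sm.OLS(education, sm.add_constant(income)).fit()

print(with_control.params[1])     # ~0.08, close to the true effect
print(without_control.params[1])  # noticeably larger: omitted-variable bias
```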

Suspicion Machines


Lighthouse Reports: “Governments all over the world are experimenting with predictive algorithms in ways that are largely invisible to the public. What limited reporting there has been on this topic has largely focused on predictive policing and risk assessments in criminal justice systems. But there is an area where even more far-reaching experiments are underway on vulnerable populations with almost no scrutiny.

Fraud detection systems, ranging from complex machine learning models to crude spreadsheets, are widely deployed in welfare states. The scores they generate have potentially life-changing consequences for millions of people. Until now, public authorities have typically resisted calls for transparency, either by claiming that disclosure would increase the risk of fraud or by citing the need to protect proprietary technology.
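
As a purely hypothetical sketch of what “generating a score” can mean mechanically, a handful of weighted case attributes can be reduced to a single number that decides whether a file is flagged. The features, weights, and threshold below are invented and do not describe any system Lighthouse Reports investigated.

```python
import math
from dataclasses import dataclass

# Hypothetical risk-scoring sketch; every feature, weight, and cut-off is invented.
@dataclass
class CaseFile:
    months_on_benefits: int
    address_changes: int
    income_discrepancy: float  # gap between declared and registered income

def risk_score(case: CaseFile) -> float:
    """Map a case file to a pseudo-probability of being flagged as 'high risk'."""
    z = (-3.0
         + 0.02 * case.months_on_benefits
         + 0.40 * case.address_changes
         + 1.50 * case.income_discrepancy)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

case = CaseFile(months_on_benefits=36, address_changes=3, income_discrepancy=0.8)
flagged_for_investigation = risk_score(case) > 0.5  # an arbitrary threshold
print(risk_score(case), flagged_for_investigation)
```

The point of the sketch is only that a few opaque weights, however they are produced, can determine whose life gets turned inside out.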

The sales pitch for these systems promises that they will recover millions of euros defrauded from the public purse. And the caricature of the benefit cheat is a modern take on the classic trope of the undeserving poor and much of the public debate in Europe — which has the most generous welfare states — is intensely politically charged.

The true extent of welfare fraud is routinely exaggerated by consulting firms, which are often also the algorithm vendors, talking it up to nearly 5 percent of benefits spending, while some national auditors’ offices estimate it at between 0.2 and 0.4 percent of spending. Distinguishing between honest mistakes and deliberate fraud in complex public systems is messy and hard.

When opaque technologies are deployed in search of political scapegoats, the potential for harm among some of the poorest and most marginalised communities is significant.

Hundreds of thousands of people are being scored by these systems based on data mining operations where there has been scant public consultation. The consequences of being flagged by the “suspicion machine” can be drastic, with fraud controllers empowered to turn the lives of suspects inside out…(More)”.

The Expanding Use of Technology to Manage Migration


Report by Marti Flacks, Erol Yayboke, Lauren Burke, and Anastasia Strouboulis: “Seeking to manage growing flows of migrants, the United States and European Union have dramatically expanded their engagement with migration origin and transit countries. This increasingly includes supporting the deployment of sophisticated technology to understand, monitor, and influence the movement of people across borders, expanding the spheres of interest to include the movement of people long before they reach U.S. and European borders.

This report from the CSIS Human Rights Initiative and CSIS Project on Fragility and Mobility examines two case studies of migration—one from Central America toward the United States and one from West and North Africa toward Europe—to map the use and export of migration management technologies and the associated human rights risks. Authors Marti Flacks, Erol Yayboke, Lauren Burke, and Anastasia Strouboulis provide recommendations for origin, transit, and destination governments on how to incorporate human rights considerations into their decisionmaking on the use of technology to manage migration…(More)”.

Toward a 21st Century National Data Infrastructure: Mobilizing Information for the Common Good


Report by National Academies of Sciences, Engineering, and Medicine: “Historically, the U.S. national data infrastructure has relied on the operations of the federal statistical system and the data assets that it holds. Throughout the 20th century, federal statistical agencies aggregated survey responses of households and businesses to produce information about the nation and diverse subpopulations. The statistics created from such surveys provide most of what people know about the well-being of society, including health, education, employment, safety, housing, and food security. The surveys also contribute to an infrastructure for empirical social- and economic-sciences research. Research using survey-response data, with strict privacy protections, led to important discoveries about the causes and consequences of major societal challenges and also informed policymakers. Like other infrastructure, people can easily take these essential statistics for granted. Only when they are threatened do people recognize the need to protect them…(More)”.

Foresight is a messy methodology but a marvellous mindset


Blog by Berta Mizsei: “…From my first few forays into foresight, it seemed that it employed desk research and expert workshops, but refrained from the use of data and from testing the solidity of assumptions. This can make scenarios weak and anecdotal, something experts justify by stating that scenarios are meant to be a ‘first step to start a discussion’.

The deficiencies of foresight became more evident when I took part in the process – so much of what ends up in imagined narratives depends on whether an expert was chatty during a workshop, or on the background of the expert writing the scenario.

As a young researcher coming from a quantitative background, this felt alien and alarming.

However, as it turns out, my issue was not with foresight per se, but rather with a certain way of doing it, one that is insufficiently grounded in sound research methods. In short, I am disturbed by ‘bad’ foresight. Foresight’s newfound popularity means that demand for foresight experts outstrips supply, so the prevalence of questionable foresight methodology has increased – something that was discussed during a dedicated session at this year’s Ideas Lab (CEPS’ flagship annual event).

One culprit is the Commission. Its foresight relies heavily on ‘backcasting’, a planning method that starts with a desirable future and works backwards to identify ways to achieve that outcome. One example is the 2022 Strategic Foresight Report ‘Twinning the green and digital transitions in the new geopolitical context’ that mapped out ways to get to the ideal future the Commission cabinet had imagined.

Is this useful? Undoubtedly.

However, it is also single-mindedly deterministic about the future of environmental policy, which is both notoriously complex and of critical importance to the current Commission. Similar hubris (or malpractice) is evident across various EU apparatuses – policymakers have a clear vision of what they want to happen and they invest in figuring out how to make that a reality without admitting how turbulent and unpredictable the future is. This is commendable and politically advantageous… but it is not foresight.

It misses one of foresight’s main virtues: forcing us to consider alternative futures…(More)”.

Americans Can’t Consent to Companies’ Use of Their Data


A Report from the Annenberg School for Communication: “Consent has always been a central part of Americans’ interactions with the commercial internet. Federal and state laws, as well as decisions from the Federal Trade Commission (FTC), require either implicit (“opt out”) or explicit (“opt in”) permission from individuals for companies to take and use data about them. Genuine opt out and opt in consent requires that people have knowledge about commercial data-extraction practices as well as a belief they can do something about them. As we approach the 30th anniversary of the commercial internet, the latest Annenberg national survey finds that Americans have neither. High percentages of Americans don’t know, admit they don’t know, and believe they can’t do anything about basic practices and policies around companies’ use of people’s data…
High levels of frustration, concern, and fear compound Americans’ confusion: 80% say they have little control over how marketers can learn about them online; 80% agree that what companies know about them from their online behaviors can harm them. These and related discoveries from our survey paint a picture of an unschooled and admittedly incapable society that rejects the internet industry’s insistence that people will accept tradeoffs for benefits, and despairs of its inability to predictably control its digital life in the face of powerful corporate forces. At a time when individual consent lies at the core of key legal frameworks governing the collection and use of personal information, our findings describe an environment where genuine consent may not be possible…

The aim of this report is to chart the particulars of Americans’ lack of knowledge about the commercial use of their data and their “dark resignation” in connection to it. Our goal is also to raise questions and suggest solutions about public policies that allow companies to gather, analyze, trade, and otherwise benefit from information they extract from large populations of people who are uninformed about how that information will be used and deeply concerned about the consequences of its use. In short, we find that informed consent at scale is a myth, and we urge policymakers to act with that in mind.”…(More)”.
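
To make the opt-out versus opt-in framing at the start of this excerpt concrete, here is a minimal sketch of how the default outcome differs when a person never makes an explicit choice; the class and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of the opt-in vs. opt-out distinction; names are invented.
@dataclass
class ConsentRecord:
    user_id: str
    explicit_choice: Optional[bool] = None  # None means the person never acted

def may_use_data(record: ConsentRecord, regime: str) -> bool:
    if record.explicit_choice is not None:
        return record.explicit_choice
    # When the person never acted, the default depends on the legal regime:
    # an opt-out regime permits use by default, an opt-in regime does not.
    return regime == "opt-out"

silent_user = ConsentRecord(user_id="u1")
print(may_use_data(silent_user, regime="opt-out"))  # True  (implicit permission)
print(may_use_data(silent_user, regime="opt-in"))   # False (explicit permission required)
```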

Ten lessons for data sharing with a data commons


Article by Robert L. Grossman: “…Lesson 1. Build a commons for a specific community with a specific set of research challenges

Although a few data repositories that serve the general scientific community have proved successful, data commons that target a specific user community have generally been the most successful. The first lesson is to build a data commons for a specific research community that is struggling to answer specific research challenges with data. As a consequence, a data commons is a partnership between the data scientists developing and supporting the commons and the disciplinary scientists with the research challenges.

Lesson 2. Successful commons curate and harmonize the data

Successful commons curate and harmonize the data and produce data products of broad interest to the community. It’s time consuming, expensive, and labor intensive to curate and harmonize data, but much of the value of a data commons is centralizing this work so that it can be done once instead of many times by each group that needs the data. These days, it is very easy to think of a data commons as a platform containing data, not spend the time curating or harmonizing it, and then be surprised that the data in the commons is not more widely used and its impact is not as high as expected.
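
As a toy illustration of what curation and harmonization can involve (the field names and unit conversion below are invented, not from the article), consider two contributed datasets that report the same measurement under different names and units; the commons maps both onto one schema once so that each downstream group does not have to.

```python
import pandas as pd

# Toy harmonization sketch: two contributing studies report the same variable
# under different column names and units. The commons does this mapping once.
study_a = pd.DataFrame({"subject": ["a1", "a2"], "weight_lb": [154, 198]})
study_b = pd.DataFrame({"participant_id": ["b1", "b2"], "weight_kg": [70.0, 90.0]})

harmonized = pd.concat(
    [
        study_a.rename(columns={"subject": "subject_id"})
               .assign(weight_kg=lambda d: d["weight_lb"] * 0.453592)
               .loc[:, ["subject_id", "weight_kg"]],
        study_b.rename(columns={"participant_id": "subject_id"}),
    ],
    ignore_index=True,
)
print(harmonized)  # one schema, one unit, ready for reuse
```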

Lesson 3. It’s ultimately about the data and its value to generate new research discoveries

Despite the importance of a study, few scientists will try to replicate previously published studies. Instead, data is usually accessed if it can lead to a new high-impact paper. For this reason, data commons play two different but related roles. First, they preserve data for reproducible science; this accounts for a small fraction of data access but plays a critical role. Second, data commons make data available for new high-value science.

Lesson 4. Reduce barriers to access to increase usage

A useful rule of thumb is that every barrier to data access cuts down access by a factor of 10. Common barriers that reduce use of a commons include: registration vs no-registration; open access vs controlled access; click through agreements vs signing of data usage agreements and approval by data access committees; license restrictions on the use of the data vs no license restrictions…(More)”.
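
Taken literally, the rule of thumb compounds quickly. The audience size and the particular barriers in the sketch below are invented simply to illustrate the arithmetic.

```python
# The "each barrier cuts access by ~10x" heuristic, applied to an invented example.
potential_users = 100_000
barriers = ["registration required", "controlled access", "signed data usage agreement"]

expected_users = potential_users
for barrier in barriers:
    expected_users /= 10  # rule of thumb: each barrier reduces access about tenfold

print(f"{len(barriers)} barriers leave ~{expected_users:.0f} of {potential_users:,} potential users")
# 3 barriers leave ~100 of 100,000 potential users
```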

Health data justice: building new norms for health data governance


Paper by James Shaw & Sharifah Sekalala: “The retention and use of health-related data by government, corporate, and health professional actors risk exacerbating the harms of colonial systems of inequality in which health care and public health are situated, regardless of the intentions about how those data are used. In this context, a data justice perspective presents opportunities to develop new norms of health-related data governance that hold health justice as the primary objective. In this perspective, we define the concept of health data justice, outline urgent issues informed by this approach, and propose five calls to action from a health data justice perspective…(More)”.

Big data for whom? Data-driven estimates to prioritize the recovery needs of vulnerable populations after a disaster


Blog and paper by Sabine Loos and David Lallemant: “For years, international agencies have been effusing about the benefits of big data for sustainable development. Emerging technologies, such as crowdsourcing, satellite imagery, and machine learning, have the power to better inform decision-making, especially decisions that support the 17 Sustainable Development Goals. When a disaster occurs, overwhelming amounts of big data from emerging technology are produced with the intention of supporting disaster responders. We are seeing this now with the recent earthquakes in Turkey and Syria: space agencies are processing satellite imagery to map faults and building damage, and digital humanitarians are crowdsourcing baseline data like roads and buildings.

Eight years ago, the 2015 Nepal earthquake was no exception: emergency managers received maps of shaking or crowdsourced maps of affected people’s needs from diverse sources. A year later, I began research with a team of folks involved during the response to the earthquake, and I was determined to understand how the big data produced after disasters were connected to the long-term effects of the earthquake. Our research team found that a lot of the data used to guide the recovery focused on building damage, which was often viewed as a proxy for population needs. While building damage information is useful, it does not capture the full array of social, environmental, and physical factors that will lead to disparities in long-term recovery. I assumed that information aimed at supporting vulnerable populations would have been available immediately after the earthquake. However, as I spent time in Nepal during the years after the 2015 earthquake, speaking with government officials and nongovernmental organizations involved in the response and recovery, I found they lacked key information about the needs of the most vulnerable households: those who would face the greatest obstacles during the recovery from the earthquake. While governmental and nongovernmental actors prioritized the needs of vulnerable households as best as possible with the information available, I was inspired to pursue research that could provide better information more quickly after an earthquake, to inform recovery efforts.

In our paper published in Communications Earth and Environment [link], we develop a data-driven approach to rapidly estimate which areas are likely to fall behind during recovery due to physical, environmental, and social obstacles. This approach leverages survey data on recovery progress combined with geospatial datasets that represent factors expected to impede recovery and that would be readily available after an event. To identify communities with disproportionate needs long after a disaster, we propose focusing on those who fall behind in recovery over time, or non-recovery. We focus on non-recovery because it places attention on those who do not recover, rather than delineating the characteristics of successful recovery. In addition, the groups in Nepal involved in the recovery that we spoke with understood vulnerability – a concept that is place-based and can change over time – as describing those who would not be able to recover due to the earthquake…(More)”
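
The paper itself should be consulted for the actual model, but the general shape of such an approach might look like the sketch below: fit a model linking survey-reported recovery outcomes to geospatial covariates, then score every area on its predicted risk of non-recovery. The column names, the toy data, and the choice of a logistic model are assumptions made for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch of a non-recovery estimate: the column names, toy numbers,
# and logistic model are illustrative assumptions, not the paper's actual method.

# Surveyed areas: geospatial covariates plus a label derived from recovery surveys.
surveyed = pd.DataFrame({
    "shaking_intensity": [6.2, 7.1, 5.4, 7.8, 6.9],
    "travel_time_to_market_hr": [0.5, 3.0, 1.0, 4.5, 2.0],
    "pct_marginalized_households": [10, 45, 20, 60, 35],
    "not_recovered": [0, 1, 0, 1, 1],  # fell behind in recovery over time
})

features = ["shaking_intensity", "travel_time_to_market_hr", "pct_marginalized_households"]
model = LogisticRegression().fit(surveyed[features], surveyed["not_recovered"])

# Areas for which the geospatial covariates are available shortly after the event
# can then be ranked by predicted risk of non-recovery to guide attention.
all_areas = pd.DataFrame({
    "shaking_intensity": [7.5, 5.8],
    "travel_time_to_market_hr": [5.0, 0.8],
    "pct_marginalized_households": [55, 15],
})
all_areas["non_recovery_risk"] = model.predict_proba(all_areas[features])[:, 1]
print(all_areas)
```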

What is the role of public servants and policymakers in the battle against mis- and disinformation in our democratic systems?


Article by Elsa Pilichowski: “Recent health, economic and geopolitical crises have highlighted the urgency for governments to strengthen their capacity to respond to the spread of false and misleading information, while simultaneously building more resilient societies better prepared to handle crises. The challenges faced demand a whole-of-society approach.

First, governments should help citizens become more digitally literate so that they can identify false information before they spread it, intentionally or not. Increasing societal resilience also means supporting a diverse and independent media sector which can give voice to all viewpoints. Finally, new partnerships between civil society, the media, social media platforms and governments need to be built to help prebunk and debunk mis- and disinformation.

While not the ultimate actor in information provision, governments themselves will have to step up their capacities in the information space by strengthening inter-agency coordination mechanisms, developing innovative strategies and tools, and working with international partners to build knowledge of the origins and pathways of mis- and disinformation. Another specific avenue is to help ensure the role of public communication in reinforcing an information space conducive to democracy. Breaking down internal silos to facilitate collaboration; building partnerships with external stakeholders like fact-checkers; and focusing on efforts to reach all segments of society with accurate information will all be important.

Regulatory responses that help establish effective transparency frameworks around content moderation processes and decisions, build understanding of the role of algorithms in the spread of mis- and disinformation and promote a fairer and more responsible business environment are all key priorities. Such constructive and process-based regulation is all the more critical to safeguard against government interference in the free flow of information and impingement upon one of the foundational values of democracy—the right to free and open speech…(More)”