Researcher Helps Create Big Data ‘Early Alarm’ for Ukraine Abuses


Article by Chris Carroll: From searing images of civilians targeted by shelling to detailed accounts of sick children and their families fleeing nearby fighting to seek medical care, journalists have created a kaleidoscopic view of the suffering that has engulfed Ukraine since Russia invaded—but the news media can’t be everywhere.

Social media, however, nearly can be, and a University of Maryland researcher is part of a U.S.-Ukrainian multi-institutional team that's harvesting data from Twitter and analyzing it with machine-learning algorithms. The result is a real-time system that provides a running account of what people in Ukraine are facing, constructed from their own accounts.

The project, Data for Ukraine, has been running for about three weeks and has shown it can surface important events a few hours ahead of Western and even Ukrainian media sources. It focuses on four areas: humanitarian needs, displaced people, civilian resistance and human rights violations. Beyond showing spikes in credible tweets on the subjects the team is tracking, the system also geolocates tweets, essentially mapping where events take place.
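The article does not say how the system decides that a topic is "spiking." One common, simple approach, shown here purely as an illustrative sketch and not as the team's method, compares each hour's count of tweets on a topic with a trailing baseline and flags hours that sit well above it. The function name `find_spikes`, the 24-hour window and the threshold `z` are all hypothetical.

```python
from statistics import mean, pstdev


def find_spikes(hourly_counts, window=24, z=3.0):
    """Return indices of hours whose count exceeds the trailing mean by z std devs.

    Illustrative sketch only: `window` and `z` are assumed values, not the
    Data for Ukraine project's settings.
    """
    spikes = []
    for i in range(window, len(hourly_counts)):
        history = hourly_counts[i - window:i]  # the previous `window` hours
        baseline = mean(history)
        # Floor the spread at 1.0 so a perfectly flat history does not
        # turn every tiny uptick into a "spike".
        spread = max(pstdev(history), 1.0)
        if hourly_counts[i] > baseline + z * spread:
            spikes.append(i)
    return spikes


# A flat day of 10 tweets/hour, followed by a jump to 50 in hour 25.
counts = [10] * 24 + [11, 50, 12]
print(find_spikes(counts))  # [25]
```

A trailing window keeps the baseline tied to recent activity, so a topic that is always busy does not trigger alerts just for being busy.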

“It’s an early alarm system for human rights abuses,” said Ernesto Calvo, professor of government and politics and director of UMD’s Inter-Disciplinary Lab for Computational Social Science. “For it to work, we need to know two basic things: what is happening or being reported, and who is reporting those things.”

Calvo and his lab focus on the second of those two requirements, and they built a "community detection" system to identify the key nodes in the Twitter network whose data the system should use. Other team members with expertise in Ukrainian society and politics supplied him with a list of about 400 verified users who actively tweet on relevant topics. Calvo, who honed his approach analyzing social media during political and environmental crises in Latin America, and his team then expanded and deepened the collection, drawing on the connections and followers of the initial list, so that millions of tweets per day now feed the system.
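The team's community-detection pipeline is not described in detail. As a rough, hypothetical illustration of the seed-and-expand step alone, the sketch below starts from a seed list and adds accounts that follow several of the seeds, on the assumption that accounts tied to many trusted seeds are more likely to be relevant. This is a simple snowball filter, not a true community-detection algorithm such as modularity-based clustering; the follower graph, `min_seed_links` and `expand_seeds` are invented for the example.

```python
from collections import Counter


def expand_seeds(seeds, followers, min_seed_links=2):
    """One round of snowball expansion from a seed list of accounts.

    `followers` maps each account to the accounts that follow it. A
    non-seed account is added only if it follows at least `min_seed_links`
    distinct seeds (an assumed threshold for illustration).
    """
    seeds = set(seeds)
    links = Counter()
    for seed in seeds:
        # set() so a duplicated edge does not count twice for the same seed
        for account in set(followers.get(seed, ())):
            if account not in seeds:
                links[account] += 1
    added = {acct for acct, n in links.items() if n >= min_seed_links}
    return seeds | added


followers = {
    "seed_a": ["x", "y"],
    "seed_b": ["x", "z"],
    "seed_c": ["x", "y"],
}
print(sorted(expand_seeds(["seed_a", "seed_b", "seed_c"], followers)))
# ['seed_a', 'seed_b', 'seed_c', 'x', 'y']
```

Calling the function again on its own output walks one hop further from the original seeds, which is one way a list of about 400 accounts could grow into a much larger collection.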

Nearly half of the captured tweets are in Ukrainian, 30% are in English and 20% are in Russian. Knowing whom to exclude (accounts created the day before the invasion, for instance, or those with few long-term connections) is key, Calvo said…
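Calvo names two exclusion signals: brand-new accounts and accounts with few long-term connections. A minimal filter along those lines might look like the sketch below. The `Account` fields, the 50-connection threshold, and treating any account created on or after 23 February 2022 (the day before Russia's full-scale invasion on 24 February) as too new are all assumptions for illustration, not the project's actual rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta

INVASION_DATE = date(2022, 2, 24)
# Assumed cutoff: anything created the day before the invasion or later is excluded.
NEW_ACCOUNT_CUTOFF = INVASION_DATE - timedelta(days=1)


@dataclass
class Account:
    handle: str
    created: date
    long_term_connections: int  # hypothetical count of long-standing follow ties


def keep_account(acct: Account, min_connections: int = 50) -> bool:
    """Return True if the account passes both (assumed) exclusion rules."""
    if acct.created >= NEW_ACCOUNT_CUTOFF:
        return False  # too new: possibly created for the conflict
    return acct.long_term_connections >= min_connections


accounts = [
    Account("longtime_reporter", date(2014, 5, 2), 1200),
    Account("fresh_account", date(2022, 2, 23), 900),
    Account("isolated_account", date(2019, 8, 1), 4),
]
print([a.handle for a in accounts if keep_account(a)])  # ['longtime_reporter']
```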