Using Wikipedia for conflict forecasting


Article by Christian Oswald and Daniel Ohrenhofer: “How can we improve our ability to predict conflicts? Scholars have struggled with this question for a long time. However, as a discipline, and especially over the last two decades, political science has made substantial progress. In general, what we need to improve predictions are advances in data and methodology. Data advances involve both improving the quality of existing data and developing new data sources. We propose a new data source for conflict forecasting efforts: Wikipedia.

The number of country page views indicates international salience of, or interest in, a country. Meanwhile, the number of changes to a country page indicate political controversy between opposing political views.

We took part in the Violence Early-Warning System’s friendly competition to predict changes in battle-related deaths. In our work, we evaluate our findings with out-of-sample predictions using held-out, previously unseen data, and true forecasts into the future. We find support for the predictive power of country page views, whereas we do not for page changes…

Globally available data, updated monthly, are ideal for (near) real-time forecasting. However, many commonly used data sources are available only annually. They are updated once a year, often with considerable delay.

Some of these variables, such as democracy or GDP, tend to be relatively static over time. Furthermore, many data sources face the problem of missing values. These occur when it is not possible to find reliable data for a variable for a given country.

Wikipedia is updated in real time, unlike many commonly used data sources, which may update only annually and with considerable delay

More recent data sources such as Twitter, images or text as data, or mobile phone data, often do not provide global coverage. What’s more, collecting and manipulating data from such sources is typically computationally and/or financially costly. Wikipedia provides an alternative data source that, to some extent, overcomes many of these limitations…(More)”.