Tackling quality concerns around (volunteered) big data

University of Twente: “… Improvements in online information communication and mobile location-aware technologies have led to a dramatic increase in the amount of volunteered geographic information (VGI) in recent years. The collection of volunteered data on geographic phenomena has a rich history worldwide. For example, the Christmas Bird Count has studied the impacts of climate change on spatial distribution and population trends of selected bird species in North America since 1900. Nowadays, several citizen observatories collect information about our environment. This information is complementary or, in some cases, essential to tackle a wide range of geographic problems.

Despite the wide applicability and acceptability of VGI in science, many studies argue that the quality of the observations remains a concern. Data collected by volunteers does not often follow scientific principles of sampling design, and levels of expertise vary among volunteers. This makes it hard for scientists to integrate VGI in their research.

Low quality, inconsistent, observations can bias analysis and modelling results because they are not representative for the variable studied, or because they decrease the ratio of signal to noise. Hence, the identification of inconsistent observations clearly benefits VGI-based applications and provide more robust datasets to the scientific community.

In their paper the researchers describe a novel automated workflow to identify inconsistencies in VGI. “Leveraging a digital control mechanism means we can give value to the millions of observations collected by volunteers” and “it allows a new kind of science where citizens can directly contribute to the analysis of global challenges like climate change” say Hamed Mehdipoor and Dr. Raul Zurita-Milla, who work at the Geo-Information Processing department of ITC….

While some inconsistent observations may reflect real, unusual events, the researchers demonstrated that these observations also bias the trends (advancement rates), in this case of the date of lilac flowering onset. This shows that identifying inconsistent observations is a pre-requisite for studying and interpreting the impact of climate change on the timing of life cycle events….(More)”