Blog by Jessica Pechmann: “…In Gaza, increased conflict since October 2023 has caused a prolonged humanitarian crisis. Understanding the conflict’s impact on buildings has been challenging, because pre-existing datasets, both those produced by artificial intelligence and machine learning (AI/ML) models and those in OpenStreetMap (OSM), were not accurate enough to create a full building footprint baseline. The area’s buildings were too dense, and collecting information on the ground safely was impossible. In these hard-to-reach areas, HOT’s remote, crowdsourced mapping methodology was well suited to collecting the detailed information visible in aerial imagery.
In February 2024, after consulting humanitarian and UN actors working in Gaza, HOT decided to create a pre-conflict dataset in OSM covering every building footprint in the area. HOT’s community of OpenStreetMap volunteers did all the data work, coordinating through HOT’s Tasking Manager, and made meticulous edits to add missing data and improve existing data. Because of protection and data quality concerns, only expert volunteer teams were assigned to map and validate the area. As in other conflict-affected, hard-to-reach areas, HOT balanced data needs against responsible data practices suited to the context.
Comparing AI/ML with human-verified OSM building datasets in conflict zones
AI/ML is becoming an increasingly common and fast way to obtain building footprints across large areas. Sources of automated building footprints range from worldwide datasets by Microsoft and Google to smaller-scale, open, community-managed tools such as HOT’s new application, fAIr.
Now that HOT volunteers have fully updated and validated every OSM building visible in pre-conflict imagery, OSM has 18% more individual buildings in the Gaza Strip than Microsoft’s ML buildings dataset (an estimated 330,079 buildings vs. 280,112). In contexts without a coordinated OSM update effort, however, the comparison can look very different. In Sudan, for example, where there has been no large organized editing campaign, OSM has just under 1,500,000 buildings, compared with over 5,820,000 in Microsoft’s ML data. Note that the ML datasets have not been human-verified, so their accuracy is unknown. Google Open Buildings has over 26 million building features in Sudan, but visual inspection shows that many of these are noise: features the model incorrectly identified as buildings in uninhabited desert…(More)”.
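The 18% figure is the relative difference between the two Gaza counts, measured against Microsoft's total. A minimal Python sketch that reproduces it from the numbers quoted above follows. The helper name `percent_more` is hypothetical, and the Sudan counts are only stated approximately in the post ("just under" and "over"), so they are treated here as rough values rather than exact counts.

```python
def percent_more(a: int, b: int) -> float:
    """Return how many percent more features dataset `a` has than dataset `b`."""
    return (a - b) / b * 100


# Building counts quoted in the post (Gaza figures are estimates).
gaza_osm, gaza_msft = 330_079, 280_112

# Sudan figures are approximate: "just under 1,500,000" (OSM), "over 5,820,000" (Microsoft).
sudan_osm, sudan_msft = 1_500_000, 5_820_000

gaza_diff = percent_more(gaza_osm, gaza_msft)  # relative to Microsoft's count
sudan_ratio = sudan_osm / sudan_msft           # OSM count as a share of Microsoft's

print(f"Gaza: OSM has {gaza_diff:.1f}% more buildings than Microsoft's dataset")
print(f"Sudan: OSM has roughly {sudan_ratio:.0%} as many buildings as Microsoft's dataset")
```

Running it gives about 17.8% for Gaza, which rounds to the post's 18%, and shows that in Sudan OSM holds only around a quarter as many buildings as Microsoft's ML data.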