Businesses dig for treasure in open data

Lindsay Clark in ComputerWeekly: “Open data, a movement which promises access to vast swaths of information held by public bodies, has started getting its hands dirty, or rather its feet.
Before a spade goes in the ground, construction and civil engineering projects face a great unknown: what is down there? In the UK, should someone discover anything of archaeological importance, a project can be halted – sometimes for months – while researchers study the site and remove artefacts….
During an open innovation day hosted by the Science and Technologies Facilities Council (STFC), open data services and technology firm Democrata proposed analytics could predict the likelihood of unearthing an archaeological find in any given location. This would help developers understand the likely risks to construction and would assist archaeologists in targeting digs more accurately. The idea was inspired by a presentation from the Archaeological Data Service in the UK at the event in June 2014.
The proposal won support from the STFC which, together with IBM, provided a nine-strong development team and access to the Hartree Centre’s supercomputer – a 131,000 core high-performance facility. For natural language processing of historic documents, the system uses two components of IBM’s Watson – the AI service which famously won the US TV quiz show Jeopardy. The system uses SPSS modelling software, the language R for algorithm development and Hadoop data repositories….
The proof of concept draws together data from the University of York’s archaeological data, the Department of the Environment, English Heritage, Scottish Natural Heritage, Ordnance Survey, Forestry Commission, Office for National Statistics, the Land Registry and others….The system analyses sets of indicators of archaeology, including historic population dispersal trends, specific geology, flora and fauna considerations, as well as proximity to a water source, a trail or road, standing stones and other archaeological sites. Earlier studies created a list of 45 indicators which was whittled down to seven for the proof of concept. The team used logistic regression to assess the relationship between input variables and come up with its prediction….”