Beware of the gaps in Big Data


Edd Gent at E&T: “When the municipal authority in charge of Boston, Massachusetts, was looking for a smarter way to find which roads it needed to repair, it hit on the idea of crowdsourcing the data. The authority released a mobile app called Street Bump in 2011 that employed an elegantly simple idea: use a smartphone’s accelerometer to detect jolts as cars go over potholes and look up the location using the Global Positioning System. But the approach ran into a pothole of its own. The system reported a disproportionate number of potholes in wealthier neighbourhoods. It turned out it was oversampling the younger, more affluent citizens who were digitally clued up enough to download and use the app in the first place. The city reacted quickly, but the incident shows how easy it is to develop a system that can handle large quantities of data but which, through its own design, is still unlikely to have enough data to work as planned.

As we entrust more of our lives to big data analytics, automation problems like this could become increasingly common, with their errors difficult to spot after the fact. Systems that ‘feel like they work’ are where the trouble starts.

Harvard University professor Gary King, who is also founder of social media analytics company Crimson Hexagon, recalls a project that used social media to predict unemployment. The model was built by correlating US unemployment figures with the frequency that people used words like ‘jobs’, ‘unemployment’ and ‘classifieds’. A sudden spike convinced researchers they had predicted a big rise in joblessness, but it turned out Steve Jobs had died and their model was simply picking up posts with his name. “This was an example of really bad analytics and it’s even worse because it’s the kind of thing that feels like it should work and does work a little bit,” says King.

Big data can shed light on areas with historic information deficits, and systems that seem to automatically highlight the best course of action can be seductive for executives and officials. “In the vacuum of no decision any decision is attractive,” says Jim Adler, head of data at Toyota Research Institute in Palo Alto. “Policymakers will say, ‘there’s a decision here let’s take it’, without really looking at what led to it. Was the data trustworthy, clean?”…”