What’s wrong with big data?


James Bridle in the New Humanist: “In a 2008 article in Wired magazine entitled “The End of Theory”, Chris Anderson argued that the vast amounts of data now available to researchers made the traditional scientific process obsolete. No longer would they need to build models of the world and test them against sampled data. Instead, the complexities of huge and totalising datasets would be processed by immense computing clusters to produce truth itself: “With enough data, the numbers speak for themselves.” As an example, Anderson cited Google’s translation algorithms which, with no knowledge of the underlying structures of languages, were capable of inferring the relationship between them using extensive corpora of translated texts. He extended this approach to genomics, neurology and physics, where scientists are increasingly turning to massive computation to make sense of the volumes of information they have gathered about complex systems. In the age of big data, he argued, “Correlation is enough. We can stop looking for models.”

This belief in the power of data, of technology untrammelled by petty human worldviews, is the practical cousin of more metaphysical assertions. A belief in the unquestionability of data leads directly to a belief in the truth of data-derived assertions. And if data contains truth, then it will, without moral intervention, produce better outcomes. Speaking at Google’s private London Zeitgeist conference in 2013, Eric Schmidt, Google Chairman, asserted that “if they had had cellphones in Rwanda in 1994, the genocide would not have happened.” Schmidt’s claim was that technological visibility – the rendering of events and actions legible to everyone – would change the character of those actions. Not only is this statement historically inaccurate (there was plenty of evidence available of what was occurring during the genocide from UN officials, US satellite photographs and other sources), it’s also demonstrably untrue. Analysis of unrest in Kenya in 2007, when over 1,000 people were killed in ethnic conflicts, showed that mobile phones not only spread but accelerated the violence. But you don’t need to look to such extreme examples to see how a belief in technological determinism underlies much of our thinking and reasoning about the world.

“Big data” is not merely a business buzzword, but a way of seeing the world. Driven by technology, markets and politics, it has come to determine much of our thinking, but it is flawed and dangerous. It runs counter to our actual findings when we employ such technologies honestly and with the full understanding of their workings and capabilities. This over-reliance on data, which I call “quantified thinking”, has come to undermine our ability to reason meaningfully about the world, and its effects can be seen across multiple domains.

The assertion is hardly new. Writing in the Dialectic of Enlightenment in 1947, Theodor Adorno and Max Horkheimer decried “the present triumph of the factual mentality” – the predecessor to quantified thinking – and succinctly analysed the big data fallacy, set out by Anderson above. “It does not work by images or concepts, by the fortunate insights, but refers to method, the exploitation of others’ work, and capital … What men want to learn from nature is how to use it in order wholly to dominate it and other men. That is the only aim.” What is different in our own time is that we have built a world-spanning network of communication and computation to test this assertion. While it occasionally engenders entirely new forms of behaviour and interaction, the network most often shows to us with startling clarity the relationships and tendencies which have been latent or occluded until now. In the face of the increased standardisation of knowledge, it becomes harder and harder to argue against quantified thinking, because the advances of technology have been conjoined with the scientific method and social progress. But as I hope to show, technology ultimately reveals its limitations….

“Eroom’s law” – Moore’s law backwards – was recently formulated to describe a problem in pharmacology. Drug discovery has been getting more expensive. Since the 1950s the number of drugs approved for use in human patients per billion US dollars spent on research and development has halved every nine years. This problem has long perplexed researchers. According to the principles of technological growth, the trend should be in the opposite direction. In a 2012 paper in Nature entitled “Diagnosing the decline in pharmaceutical R&D efficiency” the authors propose and investigate several possible causes for this. They begin with social and physical influences, such as increased regulation, increased expectations and the exhaustion of easy targets (the “low hanging fruit” problem). Each of these are – with qualifications – disposed of, leaving open the question of the discovery process itself….(More)