Why you should never trust a data visualisation

in The Guardian: “An excellent blogpost has been receiving a lot of attention over the last week. Pete Warden, an experienced data scientist and author for O’Reilly on all things data, writes:

The wonderful thing about being a data scientist is that I get all of the credibility of genuine science, with none of the irritating peer review or reproducibility worries … I thought I was publishing an entertaining view of some data I’d extracted, but it was treated like a scientific study.

This is an important acknowledgement of a very real problem, but in my view Warden has the wrong target in his crosshairs. Data presented in any medium is a powerful tool and must be used responsibly, but it is when information is expressed visually that the risks are highest.
The central example Warden uses is his visualisation of Facebook friend networks across the United States, which proved extremely popular and was even cited in the New York Times as evidence for growing social division.
As he explains in his post, the methodology behind his underlying network graph is perfectly defensible, but the subsequent clustering process was “produced by me squinting at all the lines, coloring in some areas that seemed more connected in a paint program, and picking silly names for the areas”. The exercise was only ever intended as a bit of fun with a large and interesting dataset, so there really shouldn’t be any problem here.
But there is: humans are visual creatures. Peer-reviewed studies have shown that we can consume information more quickly when it is expressed in diagrams than when it is presented as text.
Even something as simple as colour scheme can have a marked impact on the perceived credibility of information presented visually – often a considerably more marked impact than the actual authority of the data source.
Another great example of this phenomenon was the Washington Post’s ‘map of the world’s most and least racially tolerant countries‘, which went viral back in May of this year. It was widely accepted as an objective, scientific piece of work, despite a number of social scientists identifying flaws in the methodology and the underlying data itself.”