Science is best when the data is an open book

 at the Conversation: “It was 1986, and the American space agency, NASA, was reeling from the loss of seven lives. The space shuttle Challenger had broken apart about one minute after its launch.

A Congressional commission was formed to report on the tragedy. The physicist Richard Feynman was one of its members.

NASA officials had testified to Congress that the chance of a shuttle failure was around 1 in 100,000. Feynman wanted to look beyond the official testimony to the numbers and data that backed it up.

After completing his investigation, Feynman summed up his findings in an appendix to the Commission’s official report, in which he declared that NASA officials had “fooled themselves” into thinking that the shuttle was safe.

After a launch, shuttle parts sometimes came back damaged or behaved in unexpected ways. In many of those cases, NASA came up with convenient explanations that minimised the importance of these red flags. The people at NASA badly wanted the shuttle to be safe, and this coloured their reasoning.

To Feynman, this sort of behaviour was not surprising. In his career as a physicist, Feynman had observed that not just engineers and managers, but also basic scientists have biases that can lead to self-deception.

Feynman believed that scientists should constantly remind themselves of their biases. “The first principle” of being a good researcher, according to Feynman, “is that you must not fool yourself, and you are the easiest person to fool”… In the official report to Congress, Feynman and his colleagues recommended an independent oversight group be established to provide a continuing analysis of risk that was less biased than could be provided by NASA itself. The agency needed input from people who didn’t have a stake in the shuttle being safe.

Individual scientists also need that kind of input. The system of science ought to be set up in such a way that researchers subscribing to different theories can give independent interpretations of the same data set.

This would help protect the scientific community from the tendency for individuals to fool themselves into seeing support for their theory that isn’t there.

To me it’s clear: researchers should routinely examine others’ raw data. But in many fields today there is no opportunity to do so.

Scientists communicate their findings to each other via journal articles. These articles provide summaries of the data, often with a good deal of detail, but in many fields the raw numbers aren’t shared. And the summaries can be artfully arranged to conceal contradictions and maximise the apparent support for the author’s theory.

Occasionally, an article is true to the data behind it, showing the warts and all. But we shouldn’t count on it. As the chemist Matthew Todd has said to me, that would be like expecting a real estate agent’s brochure for a property to show the property’s flaws. You wouldn’t buy a house without seeing it with your own eyes. It can be unwise to buy into a theory without seeing the unfiltered data.

Many scientific societies recognise this. For many years now, some of the journals they oversee have had a policy of requiring authors to provide the raw data when other researchers request it.

Unfortunately, this policy has failed spectacularly, at least in some areas of science. Studies have found that when one researcher requests the data behind an article, that article’s authors respond with the data in fewer than half of cases. This is a major deficiency in the system of science, an embarrassment really.

The well-intentioned policy of requiring that data be provided upon request has turned out to be a formula for unanswered emails, for excuses, and for delays. A data-before-request policy, however, can be effective.

A few journals have implemented this, requiring that data be posted online upon publication of the article…(More)”