Legal confusion threatens to slow data science

Simon Oxenham in Nature: “Knowledge from millions of biological studies encoded into one network — that is Daniel Himmelstein’s alluring description of Hetionet, a free online resource that melds data from 28 public sources on links between drugs, genes and diseases. But for a product built on public information, obtaining legal permissions has been surprisingly tough.

Menche rapidly gave consent — but not everyone was so helpful. One research group never replied to Himmelstein, and three replied without clearing up the legal confusion. Ultimately, Himmelstein published the final version of Hetionet in July — minus one data set whose licence forbids redistribution, but including the three that he still lacks clear permission to republish. The tangle shows that many researchers don’t understand that simply posting a data set publicly doesn’t mean others can legally republish it, says Himmelstein.

The confusion has the power to slow down science, he says, because researchers will be discouraged from combining data sets into more useful resources. It will also become increasingly problematic as scientists publish more information online. “Science is becoming more and more dependent on reusing data,” Himmelstein says….

Himmelstein is not convinced that he is legally in the clear — and feels that such uncertainty may deter other scientists from reproducing academic data. If a researcher launches a commercial product that is based on public data sets, he adds, the stakes of not having clear licensing are likely to rise. “I think these are largely untested waters, and most academics aren’t in the position to risk setting off a legal battle that will help clarify these issues,” he says….(More)”