Big Data Coming In Faster Than Biomedical Researchers Can Process It

Richard Harris at NPR: “Biomedical research is going big-time: Megaprojects that collect vast stores of data are proliferating rapidly. But scientists’ ability to make sense of all that information isn’t keeping up.

This conundrum took center stage at a meeting of patient advocates, called Partnering For Cures, in New York City on Nov. 15.

On the one hand, there’s an embarrassment of riches, as billions of dollars are spent on these megaprojects.

There’s the White House’s Cancer Moonshot (which seeks to make 10 years of progress in cancer research over the next five years), the Precision Medicine Initiative (which is trying to recruit a million Americans to glean hints about health and disease from their data), The BRAIN Initiative (to map the neural circuits and understand the mechanics of thought and memory) and the International Human Cell Atlas Initiative (to identify and describe all human cell types).

“It’s not just that any one data repository is growing exponentially, the number of data repositories is growing exponentially,” said Dr. Atul Butte, who leads the Institute for Computational Health Sciences at the University of California, San Francisco.

One of the most remarkable efforts is the federal government’s push to get doctors and hospitals to put medical records in digital form. That shift to electronic records is costing billions of dollars — including more than $28 billion alone in federal incentives to hospitals, doctors and others to adopt them. The investment is creating a vast data repository that could potentially be mined for clues about health and disease, the way websites and merchants gather data about you to personalize the online ads you see and for other commercial purposes.

But, unlike the data scientists at Google and Facebook, medical researchers have done almost nothing as yet to systematically analyze the information in these records, Butte said. “As a country, I think we’re investing close to zero analyzing any of that data,” he said.

Prospecting for hints about health and disease isn’t going to be easy. The raw data aren’t very robust and reliable. Electronic medical records are often kept in databases that aren’t compatible with one another, at least without a struggle. Some of the potentially revealing details are also kept as free-form notes, which can be hard to extract and interpret. Errors commonly creep into these records….(More)”