The New Thing in Google Flu Trends Is Traditional Data

From the New York Times: Google is giving its Flu Trends service an overhaul, “a brand new engine,” as it announced in a blog post on Friday.

The new thing is actually traditional data from the Centers for Disease Control and Prevention, which is being integrated into the Google flu-tracking model. The goal is greater accuracy, after the Google service had been criticized for consistently overestimating flu outbreaks in recent years.

The main critique came from four quantitative social scientists, whose analysis, “The Parable of Google Flu: Traps in Big Data Analysis,” was published earlier this year in Science magazine. The researchers found that the most accurate flu predictor was a data mash-up that combined Google Flu Trends, which monitors flu-related search terms, with the official C.D.C. reports from doctors on influenza-like illness.

The Google Flu Trends team is heeding that advice. In the blog post, Christian Stefansen, a senior software engineer at Google, wrote, “We’re launching a new Flu Trends model in the United States that — like many of the best performing methods in the literature — takes official CDC flu data into account as the flu season progresses.”

Google’s flu-tracking service has had its ups and downs. Its triumph came in 2009, when it gave an advance signal of the severity of the H1N1 outbreak, roughly two weeks ahead of official statistics. As the Friday post notes, the company’s researchers did say, in a 2009 Nature article explaining how Google Flu Trends worked, that the service was not intended to replace official flu surveillance methods and that it was susceptible to “false alerts,” meaning anything that might prompt a surge in flu-related search queries.

Yet those caveats came a couple of pages into the Nature article. And Google Flu Trends became a symbol of the superiority of the new, big data approach — computer algorithms mining data trails for collective intelligence in real time. To enthusiasts, it seemed so superior to the antiquated method of collecting health data that involved doctors talking to patients, inspecting them and filing reports.

But Google’s flu service greatly overestimated the number of cases in the United States in the 2012-13 flu season — a well-known miss — and, according to the research published this year, has persistently overstated flu cases over the years. In the Science article, the social scientists called it “big data hubris.”