/naʊˈkæstɪŋ/
A method of describing the present or the near future by analyzing datasets that are not traditionally included in such analysis (e.g., web searches, reviews, and social media data).
Nowcasting is a term that originated in meteorology, where it refers to “the detailed description of the current weather along with forecasts obtained by extrapolation for a period of 0 to 6 hours ahead.” Today, nowcasting is also used in other fields, such as macroeconomics and health, to provide more up-to-date statistics.
Traditionally, macroeconomic statistics are collected on a quarterly basis and released with a substantial lag. For example, GDP data for the euro area “is only available at quarterly frequency and is released six weeks after the close of the quarter.” Further, economic datasets from government agencies such as the US Census Bureau “typically appear only after multi-year lags, and the public-facing versions are aggregated to the county or ZIP code level.”
The big data era has shown some promise for improving nowcasting. A paper by Edward L. Glaeser, Hyunjin Kim, and Michael Luca presents “evidence that Yelp data can complement government surveys by measuring economic activity in close to real-time, at a granular level, and at almost any geographic scale.” In the paper, the authors concluded:
“Our analyses of one possible data source, Yelp, suggests that these new data sources can be a useful complement to official government data. Yelp can help predict contemporaneous changes in the local economy. It can also provide a snapshot of economic change at the local level. It is a useful addition to the data tools that local policy-makers can access.
“Yet our analysis also highlights the challenges with the idea of replacing the Census altogether at any point in the near future. Government statistical agencies invest heavily in developing relatively complete coverage, for a wide set of metrics. The variation in coverage inherent in data from online platforms make it difficult to replace the role of providing official statistics that government data sources play.
“Ultimately, data from platforms like Yelp – combined with official government statistics – can provide valuable complementary datasets that will ultimately allow for more timely and granular forecasts and policy analyses, with a wider set of variables and more complete view of the local economy.”
Another example comes from the United States Federal Reserve (the Fed), which used data from the payroll-processing company ADP to estimate payroll employment. This statistic is traditionally provided by the Current Employment Statistics (CES) survey. According to the study, although the CES survey is “one of the most carefully conducted measures of labor market activity and uses an extremely large sample, it is still subject to significant sampling error and nonsampling errors.” The Fed sought to improve the reliability of this measure by adding the data provided by ADP. The study found that combining CES and ADP data “reduces the error inherent in both data sources.”
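The intuition behind that finding, that two noisy measurements of the same quantity can be combined into a less noisy one, can be illustrated with a simple inverse-variance weighted average. This is a minimal sketch and not the method used in the Fed study; it assumes the two estimates are unbiased and have independent errors, and the function name and all numbers are hypothetical.

```python
def combine_estimates(est_a, var_a, est_b, var_b):
    """Inverse-variance weighted average of two estimates.

    Assumes both estimates are unbiased and their errors are independent.
    Returns the combined estimate and its error variance.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    combined = (weight_a * est_a + weight_b * est_b) / (weight_a + weight_b)
    combined_var = 1.0 / (weight_a + weight_b)
    return combined, combined_var


# Hypothetical monthly job-growth estimates (in thousands) and error variances.
survey_estimate, survey_var = 200.0, 60.0 ** 2    # illustrative survey figure
payroll_estimate, payroll_var = 170.0, 80.0 ** 2  # illustrative payroll-data figure

combined, combined_var = combine_estimates(
    survey_estimate, survey_var, payroll_estimate, payroll_var
)
print(f"combined estimate: {combined:.1f}")
print(f"combined std. error: {combined_var ** 0.5:.1f}")
```

Under these assumptions the combined variance is always smaller than either input variance, and the less noisy source receives the larger weight.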
However, nowcasting with big data has limitations. Several researchers evaluated the accuracy of Google Flu Trends (GFT) in the 2012-2013 and 2013-2014 flu seasons. GFT used flu-related Google searches to make its predictions. The study found that GFT significantly overestimated flu activity compared with the flu trends reported by the Centers for Disease Control and Prevention (CDC).
Writing in Nautilus, Jesse Dunietz described how to address the limitations of big data and make nowcasting efforts more accurate:
“But when big data isn’t seen as a panacea, it can be transformative. Several groups, like Columbia University researcher Jeffrey Shaman’s, for example, have outperformed the flu predictions of both the CDC and GFT by using the former to compensate for the skew of the latter. ‘Shaman’s team tested their model against actual flu activity that had already occurred during the season,’ according to the CDC. By taking the immediate past into consideration, Shaman and his team fine-tuned their mathematical model to better predict the future. All it takes is for teams to critically assess their assumptions about their data.”
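The general idea in the passage above, using recent official figures to correct the skew of a faster but biased signal, can be sketched as a simple linear calibration. This is only an illustration and not the model Shaman's team used; it assumes the bias is roughly linear and stable over the calibration window, and the function names and data are hypothetical.

```python
def fit_linear_calibration(proxy, official):
    """Least-squares fit of official = intercept + slope * proxy."""
    n = len(proxy)
    if n != len(official) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(proxy) / n
    mean_y = sum(official) / n
    sxx = sum((x - mean_x) ** 2 for x in proxy)
    if sxx == 0:
        raise ValueError("proxy series has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(proxy, official))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def nowcast(latest_proxy, intercept, slope):
    """Apply the fitted calibration to the newest proxy reading."""
    return intercept + slope * latest_proxy


# Hypothetical past weeks: the search-based proxy overstates official activity.
past_proxy = [2.0, 4.0, 6.0, 8.0]     # e.g. search-based index
past_official = [1.0, 2.0, 3.0, 4.0]  # e.g. official rate, published later

intercept, slope = fit_linear_calibration(past_proxy, past_official)

# This week's proxy is available now; the official figure is not yet published.
estimate = nowcast(10.0, intercept, slope)
print(f"calibrated nowcast: {estimate:.2f}")
```

Refitting the calibration each time a new official figure is published is one way of "taking the immediate past into consideration," though a real epidemiological model would be considerably more involved.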