The Use of Data Science in a National Statistical Office

Paper by Sevgui Erman, Eric Rancourt, Yanick Beaucage, and Andre Loranger: “Objective statistical information is vital to an open and democratic society. It provides a solid foundation so that informed decisions can be made by our elected representatives, businesses, unions, and non-profit organizations, as well as individual citizens. There is a great shift towards a more virtual and digital economy and society. The traditional official statistical systems are centered on surveys, and must be adapted to this new digital reality. National statistical offices have been increasingly embracing non-survey data sources along with data science methods to better serve society.

This paper provides a blueprint for the application of data science in a government organization. It describes how data science enables innovation and the delivery of new high-value, high-quality, relevant, and trusted products that reflect the ever-evolving needs of our society and economy. We discuss practical operational considerations and impactful data science applications that supported the work of Statistics Canada’s analysts and front-line health agencies during the pandemic. We also discuss the innovative use of scanner data in lieu of survey data for large business respondents in the retail industry. We will describe computer vision methodologies, including machine learning models used to detect the start of building construction from satellite imagery, greenhouse area and greenhouse production, as well as crop type detection. Data science and machine learning methods have tremendous potential, and their ethical use is of primary importance. We conclude the paper with a forward-facing view of responsible data science use in statistical production.”