
Ten Thoughts on Government Data

Blog by Santi Ruiz: “…Below are 10 lessons I’ve learned about handling government data:

  1. Administrative data has major gaps. It’s not just that we don’t collect things we should; it’s also that information a system like SEVIS should collect just isn’t in that system. While some data gaps result from human error, others are the product of data collection systems that are leaky, or that just don’t exist. We simply cannot know things one might assume we do, like which visa-holders are currently in the country, or the employer of every working international student, because the departure dates and employer addresses of working international students are only present a fraction of the time in SEVIS. The federal government doesn’t know these things either. Poorly maintained records and non-mandatory reporting both result in inconsistent record-keeping. These gaps occur at every level as we decline to write down valuable information, neglect to write down everything we’re supposed to, and fail to hold on to everything we once wrote down.
  2. When something seems off, it often is. Government datasets often have a small number of users, sometimes just a handful of civil servants in this or that agency. This means that inaccuracies can persist unnoticed for a surprisingly long time. If you encounter what seems like a major error in government data, it’s less likely to be a failure of your understanding than you might expect. In 2024, the US undercounted the number of international students by 200,000. The error went unnoticed for months until one diligent user contacted the agency responsible. The frequency and methodology of data collection also change periodically, which leads to results that are technically correct, but also unintuitive and potentially misleading. Most quantitative disciplines rightly train students not to assume that the data is wrong until they’ve scrutinized their own work and their understanding of the data first. But if you’re working with certain kinds of government data, you should probably leap more quickly to suspect underlying data issues.
  3. If it’s a question on a form, you can find data on it. Government administrative data is commonly just collated responses to the same questionnaire. Reading the forms that feed into a dataset can tell you what it might contain, and where to find it. Since information isn’t always collected where you might expect, learning an agency’s paperwork can save you time, too. While investigating how many H-1B visas go to former international students, and how much they earn, my colleague Jeremy happened to realize that US Citizenship and Immigration Services collects information on someone’s wages and current immigration status when they file an I-129 Petition for a Nonimmigrant Worker. He learned this by talking to someone who knows USCIS paperwork like the back of their hand: an experienced immigration lawyer. Had he not realized this, his analysis wouldn’t have been nearly as rich.
  4. We’re not actually counting. Lots of government data is based on representative samples, and uses statistical methods to reach conclusions about the population at large. But that data is not produced by literally counting the population at large. This introduces various assumptions that can easily invalidate your findings if you forget to account for them. The “irreversible demographic fact” claimed by politicians last year, that two million more native-born Americans were employed than in the year prior, was the result of using data in ways the statistical agencies explicitly tell users not to. Jed Kolko describes how this statistic was actually a zero-sum accounting artifact, resulting in part from the fact that the population totals are pre-determined by the census, while nativity is not. Since the Current Population Survey measures variable immigrant and non-immigrant populations but is always scaled to match Census totals, any reduction in the reported foreign-born population will necessarily appear as an increase in the native-born population, even if it’s driven by changes in response rates rather than real departures…(More)”
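
The zero-sum effect described in point 4 can be illustrated with a toy calculation. This is a minimal sketch, not the Census Bureau's actual weighting procedure: it assumes a single uniform weight applied to all respondents, and the function name `scaled_estimates` and every number in it are hypothetical, chosen only to show the direction of the effect.

```python
def scaled_estimates(native_resp, foreign_resp, population_total):
    """Scale respondent counts so the two groups sum to a fixed total.

    Simplifying assumption: one uniform weight for every respondent.
    Real survey weighting is far more detailed than this.
    """
    respondents = native_resp + foreign_resp
    weight = population_total / respondents
    return native_resp * weight, foreign_resp * weight


# Hypothetical population control total, fixed in advance.
TOTAL = 1_000_000

# Year 1: 850 native-born and 150 foreign-born respondents.
before = scaled_estimates(850, 150, TOTAL)

# Year 2: the same people live in the country, but 30 fewer
# foreign-born households answer the survey.
after = scaled_estimates(850, 120, TOTAL)

native_change = after[0] - before[0]
foreign_change = after[1] - before[1]

print(f"Native-born estimate:  {before[0]:,.0f} -> {after[0]:,.0f}")
print(f"Foreign-born estimate: {before[1]:,.0f} -> {after[1]:,.0f}")
print(f"Apparent native-born gain: {native_change:,.0f}")
```

Because the total is pinned, the apparent native-born gain is exactly the size of the apparent foreign-born loss, even though nobody in this toy example moved, was born, or left.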

