Flowing Data: “Accessing government data from the source is frustrating. If you’ve done it, or at least tried to, you know the pain that is oddly formatted files, search that doesn’t work, and annotation that tells you nothing about the data in front of you.
The most frustrating part of the process is knowing how useful the data could be if only it were shared more simply. Unfortunately, that is rarely the case, and we spend more time formatting and inspecting the data than we do actually putting it to use. Shouldn’t it be the other way around?
It’s this painstaking process that draws so much ire. It’s hard not to complain.
Maybe the people in charge of these sites just don’t know what’s going on. Or maybe they’re so overwhelmed by suck that they don’t know where to start. Or they’re unknowingly infected by the that-is-how-we’ve-always-done-it bug.
Whatever it may be, I need to think out loud about how to improve these sites. Empty complaints don’t help.
I use the Centers for Disease Control and Prevention (CDC) as the test subject, but most of the things covered should easily generalize to other government sites (and non-government ones too). I chose the CDC not because it’s the worst but because it publishes a lot of data that is of immediate and direct use to the general public.
I approach this from the point of view of someone who uses government data for more than pulling a single data point from a spreadsheet. I’m also going to put on my Captain Obvious hat, because what seems obvious to some is apparently a black box to others.
Provide a useable data format
Sometimes it feels like government data is available in every format except the one that data users want. The worst case was when I downloaded a 2GB file and, upon unzipping it, discovered it was an EXE file.
Data in PDF format is a kick in the face for people looking for CSV files. There might be ways to get the data out of PDFs, but it’s still a pain when you have more than a handful of files….
A usable data format is the most important thing, and if there’s just one thing you change, make it this.
(Raw data is fine too)
It’s rare to find raw government data, so it’s like striking gold when it actually happens. I realize there are issues with data privacy, quality, missing data, etc. For these data sources, I appreciate the estimates with standard errors. However, the less aggregated (the more raw) the data you can provide, the better.
CSV for that too, please.
Never mind the fancy sharing tools
Not all government data is wedged into PDF files, and some of it is accessible via export tools that let you subset and lay out your data exactly how you want it. The problem is that in an effort to please everyone, you end up with a tool like the one shown on the left….
Tell people where to get the data
Get the things above done, and your government data site is exponentially better than it was before, but let’s keep going.
The navigation process to get to a dataset is incredibly convoluted, which makes it hard to find data and difficult to return to it….
Show visual previews
I’m all for visualization integrated with the data search tools. It always sucks when I spend time formatting data only to find that it wasn’t worth my time. Census Reporter is a fine example of how this might work.
That said, building visual tools on top of the improvements above is a big undertaking, especially if you’re going to do it right. So I’m perfectly fine if you skip this step and focus your resources on making data easier to use and download. Leave the visualizing and analysis to us.
Decide what’s important, archive the rest
So much cruft. So many old documents. Broken links. Create an archive and highlight what people come to your site for.
Wrapping up
There’s plenty more to update, especially once you start to work with the details, but this should be a good place to start. Of course, it’s a lot easier to point out what you can do to improve government data sharing than it is to actually do it. There are so many people, policies, and, oh yes, politics involved that change can be hard.”