Who is “Public” Data Really For?


Jer Thorp at Literary Hub: “Public” is a word that has, in the last decade, become bound tightly to data. Loosely defined, any data that is available in the public domain falls into this category, but the term is most often used to describe data that might serve some kind of civic purpose: census data or environmental data or health data, along with transparency-focused data like government budgets and reports. Often sidled up to “public” is the word “open.” Although the Venn diagram between the two words has ample overlap (public data is often open, and vice versa), the word “open” typically refers to if and how the data is accessible, rather than toward what ends it might be put to use.

Both words—“public” and “open”—invite a question: For whom? Despite the efforts of Mae and Gareth, and Tom Grundner and many others, the internet as it exists is hardly a public space. Many people still find themselves excluded from full participation. Access to anything posted on a city web page or on a .gov domain is restricted by barriers of cost and technical ability. Getting this data can be particularly hard for communities that are already marginalized, and both barriers—financial and technical—can be nearly impassable in places with limited resources and literacies.

Data.gov, the United States’ “open data portal,” lists nearly 250,000 data sets, an apparent bounty of free information. Spend some time on data.gov and other portals, though, and you’ll find out that public data as it exists is messy and often confusing. Many hosted “data sets” are links to URLs that are no longer active. Trying to access data about Native American communities from the American Community Survey on data.gov brought me first to a census site with an unlabeled list of file folders. Downloading a zip file and unpacking it resulted in 64,086 cryptically named text files each containing zero kilobytes of data. As someone who has spent much of the last decade working with these kinds of data, I can tell you that this is not an uncommon experience. All too often, working with public data feels like assembling particularly complicated Ikea furniture with no tools, no instructions, and an unknown number of missing pieces.

Today’s public data serves a particular type of person and a specific type of purpose. Mostly, it supports technically adept entrepreneurs. Civic data initiatives haven’t been shy about this; on data.gov’s impact page you’ll find a kind of hall-of-fame list of companies that are “public data success stories”: Kayak, Trulia, Foursquare, LinkedIn, Realtor.com, Zillow, Zocdoc, AccuWeather, Carfax. All of these corporations have, in some fashion, built profit models around public data, often charging for access to the very information that the state touts as “accessible, discoverable, and usable.”…