Essay by Mimi Onuoha: “The conceptual, practical, and ethical issues surrounding “big data” and data in general begin at the very moment of data collection. Particularly when the data concern people, not enough attention is paid to the realities entangled within that significant moment and spreading out from it.
I try to do some disentangling here, through five theses around data collection — points that are worth remembering, communicating, thinking about, dwelling on, and keeping in mind, if you have anything to do with data on a daily basis (read: all of us) and want to do data responsibly.
1. Data sets are the results of their means of collection.
It’s easy to forget that the people collecting a data set, and how they choose to do it, directly determine the data set….
2. As we collect more data, we prioritize things that fit patterns of collection.
Or as Rob Kitchin and Martin Dodge say in Code/Space, “The effect of abstracting the world is that the world starts to structure itself in the image of the capta and the code.” Data emerges from a world that is increasingly software-mediated, and software thrives on abstraction. It flattens out individual variations in favor of types and models….
3. Data sets outlive the rationale for their collection.
Spotify can come up with a list of reasons why having access to users’ photos, locations, microphones, and contact lists can improve the music streaming experience. But the reasons why they decide these forms of data might be useful can be less important than the fact that they have the data itself. This is because the needs or desires influencing the decisions to collect some type of data often eventually disappear, while the data produced as a result of those decisions have the potential to live for much longer. The data are capable of shifting and changing according to specific cultural contexts and of playing different roles than those they might have initially been intended for….
4. Corollary: Especially combined, data sets reveal far more than intended.
We sometimes fail to realize that data sets, both on their own and combined with others, can be used to do far more than what they were originally intended for. You can make inferences from one data set that result in conclusions in completely different realms. Facebook, by having huge amounts of data on people and their networks, could make reasonable hypotheses regarding people’s sexual orientations….
5. Data collection is a transaction that is the result of an invisible relationship.
This is a frame — connected to my first point — useful for understanding how to think about data collection on the whole:
Every data set involving people implies subjects and objects, those who collect and those who make up the collected. It is imperative to remember that on both sides we have human beings….”