The Data Revolution


Review of Rob Kitchin’s The Data Revolution: Big Data, Open Data, Data Infrastructures & their Consequences by David Moats in Theory, Culture & Society: “…As an industry, academia is not immune to cycles of hype and fashion. Terms like ‘postmodernism’, ‘globalisation’, and ‘new media’ have each had their turn filling the top line of funding proposals. Although they are each grounded in tangible shifts, these terms become stretched and fudged to the point of becoming almost meaningless. Yet, they elicit strong, polarised reactions. For at least the past few years, ‘big data’ seems to be the buzzword, which elicits funding, as well as the ire of many in the social sciences and humanities.

Rob Kitchin’s book The Data Revolution is one of the first systematic attempts to strip back the hype surrounding our current data deluge and take stock of what is really going on. This is crucial because this hype is underpinned by very real societal change, threats to personal privacy and shifts in store for research methods. The book acts as a helpful wayfinding device in an unfamiliar terrain, which is still being reshaped, and is admirably written in a language relevant to social scientists, comprehensible to policy makers and accessible even to the less tech savvy among us.

The Data Revolution seems to present itself as the definitive account of this phenomenon, but in filling this role it ends up adopting a somewhat diplomatic posture. Kitchin takes all the correct and reasonable stances on the matter and advocates all the right courses of action, but he is not able, in the context of this book, to pursue these propositions fully. This review will attempt to tease out some of these latent potentials and how they might be pushed in future work, in particular the implications of the ‘performative’ character of both big data narratives and data infrastructures for social science research.

Kitchin’s book starts with the observation that ‘data’ is a misnomer – etymologically, data should refer to phenomena in the world which can be abstracted, measured etc., as opposed to the representations and measurements themselves, which should by all rights be called ‘capta’. This is ironic because the worst offenders in what Kitchin calls “data boosterism” seem to conflate data with ‘reality’, unmooring data from its conditions of production and making the relationship between the two seem given or natural.

As Kitchin notes, following Bowker (2005), ‘raw data’ is an oxymoron: data are not so much mined as produced and are necessarily framed technically, ethically, temporally, spatially and philosophically. This is the central thesis of the book, that data and data infrastructures are not neutral and technical but also social and political phenomena. For those at the critical end of research with data, this is a starting assumption, but one which not enough practitioners heed. Most of the book is thus an attempt to flesh out these rapidly expanding data infrastructures and their politics….

Kitchin is at his best when revealing the gap between the narratives and the reality of data analysis, such as the fallacy of empiricism – the assertion that, given the granularity and completeness of big data sets and the availability of machine learning algorithms which identify patterns within data (with or without the supervision of human coders), data can “speak for themselves”. Kitchin reminds us that no data set is complete and even these out-of-the-box algorithms are underpinned by theories and assumptions in their creation, and require context-specific knowledge to unpack their findings. Kitchin also rightly raises concerns about the limits of big data, that access and interoperability of data is not given and that these gaps and silences are also patterned (Twitter is biased as a sample towards middle-class, white, tech-savvy people). Yet, this language of veracity and reliability seems to suggest that big data is being conceptualised in relation to traditional surveys, or that our population is still the nation state, when big data could helpfully force us to reimagine our analytic objects and truth conditions and more pressingly, our ethics (Rieder, 2013).

However, performativity may again complicate things. As Kitchin observes, supermarket loyalty cards do not just create data about shopping, they encourage particular sorts of shopping; when research subjects change their behaviour to cater to the metrics and surveillance apparatuses built into platforms like Facebook (Bucher, 2012), then these are no longer just data points representing the social, but partially constitutive of new forms of sociality (this is also true of other types of data as discussed by Savage (2010), but in perhaps less obvious ways). This might have implications for how we interpret data, the distribution between quantitative and qualitative approaches (Latour et al., 2012) or even more radical experiments (Wilkie et al., 2014). Kitchin is relatively cautious about proposing these sorts of possibilities, which is not the remit of the book, though it clearly leaves the door open…(More)”

Blood donors in Sweden get a text message whenever their blood saves someone’s life


Jon Stone at the Independent: “With blood donation rates in decline all over the developed world, Sweden’s blood service is enlisting new technology to help push back against shortages.

One new initiative, where donors are sent automatic text messages telling them when their blood has actually been used, has caught the public eye.

People who donate initially receive a ‘thank you’ text when they give blood, but they get another message when their blood makes it into somebody else’s veins.
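Mechanically, such a service is a simple event-driven notification: when a blood unit's status changes to "transfused", a message goes out to the donor on file. Below is a minimal sketch in Python, assuming a hypothetical BloodUnit record and a placeholder send_sms gateway (the Swedish service's actual system is not public).

```python
# Illustrative sketch of an event-driven donor notification.
# BloodUnit and send_sms are hypothetical stand-ins, not the real system.
from dataclasses import dataclass

@dataclass
class BloodUnit:
    unit_id: str
    donor_phone: str
    status: str  # e.g. "stored" or "transfused"

def send_sms(phone: str, message: str) -> None:
    # Placeholder for a real SMS gateway call; printed here for illustration.
    print(f"SMS to {phone}: {message}")

def on_status_change(unit: BloodUnit, new_status: str) -> None:
    """Notify the donor when their unit is actually used for a patient."""
    unit.status = new_status
    if new_status == "transfused":
        send_sms(unit.donor_phone,
                 "Your donated blood has just been used to help a patient. Thank you!")

# A unit logged at donation time is later marked as transfused.
unit = BloodUnit("SE-2015-0042", "+46700000000", "stored")
on_status_change(unit, "transfused")
```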

“We are constantly trying to develop ways to express [donors’] importance,” Karolina Blom Wiberg, a communications manager at the Stockholm blood service, told The Independent.

“We want to give them feedback on their effort, and we find this is a good way to do that.”

The service says the messages give donors more positive feedback about how they’ve helped their fellow citizens – which encourages them to donate again.

But the new policy has also been a hit on social media and has got people talking about blood donation amongst their friends….(More)”

Introducing the News Lab


Steve Grove at Google: “It’s hard to think of a more important source of information in the world than quality journalism. At its best, news communicates truth to power, keeps societies free and open, and leads to more informed decision-making by people and leaders. In the past decade, better technology and an open Internet have led to a revolution in how news is created, distributed, and consumed. And given Google’s mission to ensure quality information is accessible and useful everywhere, we want to help ensure that innovation in news leads to a more informed, more democratic world.

That’s why we’ve created the News Lab, a new effort at Google to empower innovation at the intersection of technology and media. Our mission is to collaborate with journalists and entrepreneurs to help build the future of media. And we’re tackling this in three ways: through ensuring our tools are made available to journalists around the world (and that newsrooms know how to use them); by getting helpful Google data sets in the hands of journalists everywhere; and through programs designed to build on some of the biggest opportunities that exist in the media industry today…

Data for more insightful storytelling

There’s a revolution in data journalism happening in newsrooms today, as more data sets and more tools for analysis are allowing journalists to create insights that were never before possible. To help journalists use our data to offer a unique window to the world, last week we announced an update to our Google Trends platform. The new Google Trends provides journalists with deeper, broader, and real-time data, and incorporates feedback we collected from newsrooms and data journalists around the world. We’re also helping newsrooms around the world tell stories using data, with a daily feed of curated Google Trends based on the headlines of the day, and through partnerships with newsrooms on specific data experiments.
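For journalists who want to pull this kind of trend data programmatically, here is a minimal sketch using pytrends, an unofficial third-party Python client for Google Trends (not a Google product, and not part of the News Lab announcement); the search terms and timeframe are placeholders.

```python
# Minimal sketch using pytrends, an unofficial Google Trends client
# (pip install pytrends). Terms and timeframe are illustrative only.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(["open data", "big data"], timeframe="today 12-m")

# Weekly relative interest (0-100) for each term, as a pandas DataFrame.
interest = pytrends.interest_over_time()
print(interest.tail())
```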

Another area we’ve focused our programs on is citizen reporting. Now that mobile technology allows anyone to be a reporter, we want to do our part to ensure that user-generated news content is a positive and game-changing force in media. We’re doing that with three projects: First Draft, the WITNESS Media Lab, and the YouTube Newswire—each of which aims to make YouTube and other open platforms more useful places for first-hand news content from citizen reporters around the world….(More)”

Researcher uncovers inherent biases of big data collected from social media sites


Phys.org: “With every click, Facebook, Twitter and other social media users leave behind digital traces of themselves, information that can be used by businesses, government agencies and other groups that rely on “big data.”

But while the information derived from social network sites can shed light on social behavioral traits, some analyses based on this type of data collection are prone to bias from the get-go, according to new research by Northwestern University professor Eszter Hargittai, who heads the Web Use Project.

Since people don’t randomly join Facebook, Twitter or LinkedIn—they deliberately choose to engage—the data are potentially biased in terms of demographics, socioeconomic background or Internet skills, according to the research. This has implications for businesses, municipalities and other groups who use such data, because it excludes certain segments of the population and could lead to unwarranted or faulty conclusions, Hargittai said.

The study, “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” was published last month in the journal The Annals of the American Academy of Political and Social Science and is part of a larger, ongoing study.

The buzzword “big data” refers to automatically generated information about people’s behavior. It’s called “big” because it can easily include millions of observations if not more. In contrast to surveys, which require explicit responses to questions, big data is created when people do things using a service or system.

“The problem is that the only people whose behaviors and opinions are represented are those who decided to join the site in the first place,” said Hargittai, the April McClain-Delaney and John Delaney Professor in the School of Communication. “If people are analyzing big data to answer certain questions, they may be leaving out entire groups of people and their voices.”

For example, a city could use Twitter to collect local opinion regarding how to make the community more “age-friendly” or whether more bike lanes are needed. In those cases, “it’s really important to know that people aren’t on Twitter randomly, and you would only get a certain type of person’s response to the question,” said Hargittai.
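The statistical point is easy to demonstrate. Below is a small simulation in which all proportions are invented purely for illustration (they are not estimates from Hargittai's study): when platform membership correlates with age, and age correlates with opinion, the opinion of platform users diverges from that of the whole population.

```python
# Illustrative simulation of platform selection bias; every number here is
# an assumption for demonstration, not a figure from the study.
import random

random.seed(42)

population = []
for _ in range(100_000):
    age = random.choice(["18-34", "35-54", "55+"])
    # Assumption: younger residents favour bike lanes more often.
    supports = random.random() < {"18-34": 0.7, "35-54": 0.5, "55+": 0.3}[age]
    # Assumption: younger residents are far more likely to be on Twitter.
    on_twitter = random.random() < {"18-34": 0.5, "35-54": 0.2, "55+": 0.05}[age]
    population.append((supports, on_twitter))

true_support = sum(s for s, _ in population) / len(population)
twitter_views = [s for s, t in population if t]
twitter_support = sum(twitter_views) / len(twitter_views)

print(f"True support in population: {true_support:.1%}")
print(f"Support among Twitter users: {twitter_support:.1%}")  # noticeably higher
```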

“You could be missing half the population, if not more. The same holds true for companies who only use Twitter and Facebook and are looking for feedback about their products,” she said. “It really has implications for every kind of group.”…

More information: “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites” The Annals of the American Academy of Political and Social Science May 2015 659: 63-76, DOI: 10.1177/0002716215570866

Transforming Government Information


Sharyn Clarkson at the (Interim) Digital Transformation Office (Australia): “Our challenge: How do we get the right information and services to people when and where they need it?

The public relies on Government for a broad range of information – advice for individuals and businesses, what services are available and how to access them, and how various rules and laws impact our lives.

The government’s digital environment has grown organically over the last couple of decades. At the moment, information is largely created and managed within agencies and published across more than 1200 disparate gov.au websites, plus a range of social media accounts, apps and other digital formats.

This creates some difficulties for people looking for government information. By publishing within agency silos we are presenting people with an agency-centric view of government information. This is a problem because people largely don’t understand or care about how government organises itself and the structure of government does not map to the needs of people. Having a baby or travelling overseas? Up to a dozen government agencies may have information relevant to you. And as people’s needs span more than one agency, they end up with a disjointed and confusing user experience as they have to navigate across disparate government sites. And even if you begin at your favourite search engine how do you know which of the many government search results is the right place to start?

There are two government entry points already in place to help users – Australia.gov.au and business.gov.au – but they largely act as an umbrella across the 1200+ sites and currently only provide a very thin layer of whole of government information and mainly refer people off to other websites.

The establishment of the DTO has provided the first opportunity for people to come together and better understand how our underlying structural landscape is impacting people’s experience with government. It’s also given us an opportunity to take a step back and ask some of the big questions about how we manage information and what problems can only really be solved through whole of government transformation.

How do we make information and services easier to find? How do we make sure we provide information that people can trust and rely upon at times of need? How should the gov.au landscape be organised to make it easier for us to meet users’ needs and expectations? How many websites should we have – assuming 1200 is too many? What makes up a better user experience – does it mean all sites should look and feel the same? How can we provide government information at the places people naturally go looking for assistance – even if these are not government sites?

As we asked these questions we started to come across some central ideas:

  • What if we could decouple the authoring and management of information from the publishing process, so the subject experts in government still manage their content but we have the flexibility to present it in more user-centric ways? (See the sketch after this list.)
  • What if we unleashed government information, making it possible for state and local governments, non-profit groups and businesses to deliver content and services alongside their own information to give better value to users?
  • Should we move the bureaucratic content (information about agencies and how they are managed, such as annual reports, budget statements and operating rules) out of the way of core content and services for people? Can we simplify our environment and base it around topics and life events instead of agencies? What if we had people in government responsible for curating these topics and life events across agencies and creating simpler pathways for users?…(More)”
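A minimal sketch of the decoupling idea above: content is keyed to a life event, the authoring agency becomes metadata, and a user-centric page is assembled across agencies. Field names are hypothetical, not the DTO's actual schema.

```python
# Hypothetical sketch of content decoupled from agency websites. The subject
# expert's agency is recorded as metadata, while presentation is organised
# around a life event rather than organisational structure.
content_item = {
    "id": "having-a-baby/register-birth",
    "life_event": "having a baby",
    "title": "Register the birth of your child",
    "body": "You must register a birth within 60 days...",
    "owner_agency": "Births, Deaths and Marriages",  # authoring authority
    "audience": "individuals",
}

def page_for_life_event(items, life_event):
    """Assemble one user-centric page from many agencies' content."""
    return [i for i in items if i["life_event"] == life_event]

print(page_for_life_event([content_item], "having a baby"))
```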

Please, Corporations, Experiment on Us


Michelle N. Meyer and Christopher Chabris in the New York Times: “Can it ever be ethical for companies or governments to experiment on their employees, customers or citizens without their consent?

The conventional answer — of course not! — animated public outrage last year after Facebook published a study in which it manipulated how much emotional content more than half a million of its users saw. Similar indignation followed the revelation by the dating site OkCupid that, as an experiment, it briefly told some pairs of users that they were good matches when its algorithm had predicted otherwise.

But this outrage is misguided. Indeed, we believe that it is based on a kind of moral illusion.

Companies — and other powerful actors, including lawmakers, educators and doctors — “experiment” on us without our consent every time they implement a new policy, practice or product without knowing its consequences. When Facebook started, it created a radical new way for people to share emotionally laden information, with unknown effects on their moods. And when OkCupid started, it advised users to go on dates based on an algorithm without knowing whether it worked.

Why does one “experiment” (i.e., introducing a new product) fail to raise ethical concerns, whereas a true scientific experiment (i.e., introducing a variation of the product to determine the comparative safety or efficacy of the original) sets off ethical alarms?

In a forthcoming article in the Colorado Technology Law Journal, one of us (Professor Meyer) calls this the “A/B illusion” — the human tendency to focus on the risk, uncertainty and power asymmetries of running a test that compares A to B, while ignoring those factors when A is simply imposed by itself.

Consider a hypothetical example. A chief executive is concerned that her employees are taking insufficient advantage of the company’s policy of matching contributions to retirement savings accounts. She suspects that telling her workers how many others their age are making the maximum contribution would nudge them to save more, so she includes this information in personalized letters to them.

If contributions go up, maybe the new policy worked. But perhaps contributions would have gone up anyhow (say, because of an improving economy). If contributions go down, it might be because the policy failed. Or perhaps a declining economy is to blame, and contributions would have gone down even more without the letter.

You can’t answer these questions without doing a true scientific experiment — in technology jargon, an “A/B test.” The company could randomly assign its employees to receive either the old enrollment packet or the new one that includes the peer contribution information, and then statistically compare the two groups of employees to see which saved more.
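Here is a minimal sketch of that A/B test on simulated data, using random assignment and a two-sample t-test; the assumed effect of the peer-information letter is purely illustrative.

```python
# Minimal sketch of the A/B test described above, on simulated data.
# The assumed effect of the new letter (a $150 lift) is illustrative only.
import random
from statistics import mean
from scipy import stats

random.seed(0)

def contribution(got_new_letter: bool) -> float:
    base = random.gauss(3000, 800)        # baseline annual contribution ($)
    lift = 150 if got_new_letter else 0   # assumed effect of the new letter
    return max(0.0, base + lift)

# Randomly assign 2,000 employees to the old packet (A) or the new one (B).
group_a, group_b = [], []
for _ in range(2000):
    if random.random() < 0.5:
        group_a.append(contribution(False))
    else:
        group_b.append(contribution(True))

# Statistically compare the two groups to see which saved more.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"A mean: {mean(group_a):,.0f}  B mean: {mean(group_b):,.0f}  p = {p_value:.3f}")
```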

Let’s be clear: This is experimenting on people without their consent, and the absence of consent is essential to the validity of the entire endeavor. If the C.E.O. were to tell the workers that they had been randomly assigned to receive one of two different letters, and why, that information would be likely to distort their choices.

Our chief executive isn’t so hypothetical. Economists do help corporations run such experiments, but many managers chafe at debriefing their employees afterward, fearing that they will be outraged that they were experimented on without their consent. A company’s unwillingness to debrief, in turn, can be a deal-breaker for the ethics boards that authorize research. So those C.E.O.s do what powerful people usually do: Pick the policy that their intuition tells them will work best, and apply it to everyone….(More)”

Forging Trust Communities: How Technology Changes Politics


Book by Irene S. Wu: “Bloggers in India used social media and wikis to broadcast news and bring humanitarian aid to tsunami victims in South Asia. Terrorist groups like ISIS pour out messages and recruit new members on websites. The Internet is the new public square, bringing to politics a platform on which to create community at both the grassroots and bureaucratic level. Drawing on historical and contemporary case studies from more than ten countries, Irene S. Wu’s Forging Trust Communities argues that the Internet, and the technologies that predate it, catalyze political change by creating new opportunities for cooperation. The Internet does not simply enable faster and easier communication, but makes it possible for people around the world to interact closely, reciprocate favors, and build trust. The information and ideas exchanged by members of these cooperative communities become key sources of political power akin to military might and economic strength.

Wu illustrates the rich world history of citizens and leaders exercising political power through communications technology. People in nineteenth-century China, for example, used the telegraph and newspapers to mobilize against the emperor. In 1970, Taiwanese cable television gave voice to a political opposition demanding democracy. Both Qatar (in the 1990s) and Great Britain (in the 1930s) relied on public broadcasters to enhance their influence abroad. Additional case studies from Brazil, Egypt, the United States, Russia, India, the Philippines, and Tunisia reveal how various technologies function to create new political energy, enabling activists to challenge institutions while allowing governments to increase their power at home and abroad.

Forging Trust Communities demonstrates that the way people receive and share information through network communities reveals as much about their political identity as their socioeconomic class, ethnicity, or religion. Scholars and students in political science, public administration, international studies, sociology, and the history of science and technology will find this to be an insightful and indispensable work…(More)”

Harnessing the Crowd to Solve Healthcare


PSFK Labs: “While being sick is never a good situation to be in, the majority of people can still take solace in the fact that modern medicine will be able to diagnose their problem and get them on the path to a quick recovery. For a small percentage of patients, however, simply finding out what ails them can be a challenge. Despite countless visits to specialists and mounting costs, these individuals can struggle for years to find out any reliable information about their illness.

This is only exacerbated by the fact that in a heavily regulated industry like healthcare, words like “personalization,” “transparency” and “collaboration” are near impossibilities, leaving these patients locked into a system that can’t care for them. Enter CrowdMed, an online platform that uses the combined knowledge of its community to overcome these obstacles, getting people the answers and treatment they need.

…we spoke with Jared Heyman, the company’s founder, to understand how the crowd can deliver unprecedented efficiencies to a system sorely in need of them…. “CrowdMed harnesses the wisdom of crowds to solve the world’s most difficult medical cases online. Let’s say that you’ve been bouncing doctor to doctor, but don’t yet have a definitive diagnosis or treatment plan. You can submit your case on our site by answering an in‑depth patient questionnaire, uploading relevant medical records, diagnostic test results or even medical images. We expose your case to our community of currently over 15,000 medical detectives. These are people mostly with medical backgrounds who enjoy solving these challenges.

We have about a 70 percent success rate, bringing patients closer to a correct diagnosis or cure, and we do so in a very small fraction of the time and cost of what it would take through the traditional medical system….
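One generic way such crowd input could be aggregated is to weight each detective's suggestion by their historical accuracy. The sketch below is a wisdom-of-crowds illustration, not CrowdMed's actual algorithm, and the figures in it are invented.

```python
# Illustrative weighted-vote aggregation of crowd diagnoses. This is a
# generic wisdom-of-crowds sketch, not CrowdMed's actual method.
from collections import defaultdict

suggestions = [
    # (detective's historical accuracy, suggested diagnosis) -- invented data
    (0.80, "celiac disease"),
    (0.60, "celiac disease"),
    (0.70, "lupus"),
    (0.55, "celiac disease"),
]

scores = defaultdict(float)
for accuracy, diagnosis in suggestions:
    scores[diagnosis] += accuracy  # each vote counts in proportion to accuracy

ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('celiac disease', 1.95), ('lupus', 0.7)]
```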

Every entrepreneur builds upon the tools and technologies that preceded them. I think that CrowdMed needed the Internet. It needed Facebook. It needed Wikipedia. It needed Quora, and other companies or products that have proven that you can trust in the wisdom of the crowd. I think we’re built upon the shoulders of these other companies.

We looked at all these other companies that have proven the value of social networks through crowdsourcing, and that’s inspired us to do what we do. It’s been instructive for us in the best way to do it, and it’s also prepared society, psychologically and culturally, for what we’re doing. All these things were important….(More)”

‘Beating the news’ with EMBERS: Forecasting Civil Unrest using Open Source Indicators


Paper by Naren Ramakrishnan et al.: “We describe the design, implementation, and evaluation of EMBERS, an automated, 24×7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over baserate methods and its capability to forecast significant societal happenings….(More)”
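In the spirit of the fusion and suppression engine the abstract describes, here is a deliberately simplified sketch: per-source forecasts are combined by weighted averaging, and low-confidence alerts are suppressed. The weights, probabilities and threshold are assumptions for illustration; the actual EMBERS engine is considerably more sophisticated.

```python
# Simplified sketch of fusing per-source unrest forecasts with a suppression
# threshold, loosely inspired by (and much simpler than) EMBERS. All values
# below are assumptions for illustration.
model_forecasts = {          # per-source probability of a civil unrest event
    "twitter_model": 0.72,
    "news_model": 0.55,
    "blog_model": 0.40,
    "economic_model": 0.35,
}
model_weights = {            # assumed relative trust in each source model
    "twitter_model": 0.4,
    "news_model": 0.3,
    "blog_model": 0.2,
    "economic_model": 0.1,
}
SUPPRESSION_THRESHOLD = 0.5  # drop low-confidence alerts to limit false positives

def fuse(forecasts, weights):
    """Weighted average of per-source event probabilities."""
    return sum(forecasts[m] * weights[m] for m in forecasts)

score = fuse(model_forecasts, model_weights)
if score >= SUPPRESSION_THRESHOLD:
    print(f"Issue alert (fused confidence {score:.2f})")
else:
    print(f"Suppress alert (fused confidence {score:.2f})")
```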