Big Data’s Dangerous New Era of Discrimination


Michael Schrage in HBR blog: “Congratulations. You bought into Big Data and it’s paying off Big Time. You slice, dice, parse and process every screen-stroke, clickstream, Like, tweet and touch point that matters to your enterprise. You now know exactly who your best — and worst — customers, clients, employees and partners are.  Knowledge is power.  But what kind of power does all that knowledge buy?
Big Data creates Big Dilemmas. Greater knowledge of customers creates new potential and power to discriminate. Big Data — and its associated analytics — dramatically increase both the dimensionality and degrees of freedom for detailed discrimination. So where, in your corporate culture and strategy, does value-added personalization and segmentation end and harmful discrimination begin?
Let’s say, for example, that your segmentation data tells you the following:
Your most profitable customers by far are single women between the ages of 34 and 55, closely followed by “happily married” women with at least one child. Divorced women are slightly more profitable than “never marrieds.” Gay males — single and in relationships — are also disproportionately profitable. The “sweet spot” is urban and 28 to 50. These segments collectively account for roughly two-thirds of your profitability. (Unexpected factoid: Your most profitable customers are overwhelmingly Amazon Prime subscribers. What might that mean?)
Going more granular, as Big Data does, offers even sharper ethno-geographic insight into customer behavior and influence:

  • Single Asian, Hispanic, and African-American women with urban post codes are most likely to complain about product and service quality to the company. Asian and Hispanic complainers happy with resolution/refund tend to be in the top quintile of profitability. African-American women do not.
  • Suburban Caucasian mothers are most likely to use social media to share their complaints, followed closely by Asian and Hispanic mothers. But if resolved early, they’ll promote the firm’s responsiveness online.
  • Gay urban males receiving special discounts and promotions are the most effective at driving traffic to your sites.

My point here is that these data are explicit, compelling and undeniable. But how should sophisticated marketers and merchandisers use them?
Campaigns, promotions and loyalty programs targeting women and gay males seem obvious. But should Asian, Hispanic and white females enjoy preferential treatment over African-American women when resolving complaints? After all, they tend to be both more profitable and measurably more willing to effectively use social media. Does it make more marketing sense to encourage African-American female customers to become more social media savvy? Or are resources better invested in getting more from one’s best customers? Similarly, how much effort and ingenuity should go into making more gay male customers better social media evangelists? What kinds of offers and promotions could go viral on their networks?…
Of course, the difference between price discrimination and discrimination positively correlated with gender, ethnicity, geography, class, personality and/or technological fluency is vanishingly small. Indeed, the entire epistemological underpinning of Big Data for business is that it cost-effectively makes informed segmentation and personalization possible…
But the main source of concern won’t be privacy, per se — it will be whether and how companies and organizations like your own use Big Data analytics to justify their segmentation/personalization/discrimination strategies. The more effective Big Data analytics are in profitably segmenting and serving customers, the more likely those algorithms will be audited by regulators or litigators.
Tomorrow’s Big Data challenge isn’t technical; it’s whether managements have algorithms and analytics that are both fairly transparent and transparently fair. Big Data champions and practitioners had better be discriminating about how discriminating they want to be.”
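
The segment-level claims in the excerpt above are, at bottom, a grouping-and-aggregation exercise. As a purely hypothetical sketch (the segment labels and figures below are invented for illustration, not Schrage’s numbers), profitability shares of the kind he quotes could be tabulated along these lines:

```python
# Hypothetical sketch: tabulating profitability by customer segment.
# All segment labels and figures are invented for illustration only.
import pandas as pd

customers = pd.DataFrame({
    "segment": ["single_women_34_55", "married_women_with_child", "divorced_women",
                "never_married_women", "gay_men_urban", "everyone_else"],
    "customers": [12_000, 9_000, 4_000, 3_500, 2_500, 30_000],
    "profit_usd": [4.1e6, 3.2e6, 0.9e6, 0.7e6, 1.1e6, 2.0e6],
})

customers["profit_per_customer"] = customers["profit_usd"] / customers["customers"]
customers["share_of_profit"] = customers["profit_usd"] / customers["profit_usd"].sum()

# Rank segments by average profitability and show each one's share of total profit.
print(customers.sort_values("profit_per_customer", ascending=False)
               [["segment", "profit_per_customer", "share_of_profit"]])
```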

Report “Big and open data in Europe: A growth engine or a missed opportunity?”


Press Release: “Big data and open data are not just trendy issues; they are the concern of government institutions at the highest level. On January 29th, 2014 a Conference concerning Big & Open Data in Europe 2020 was held in the European Parliament.
Questions asked and discussed included: Is Big & Open Data a truly transformative phenomenon or just hot air? Does it matter for Europe? How big is the economic potential of Big and Open Data for Europe until 2020? How might each of the 28 Member States benefit from it?…
The conference complemented a research project by demosEUROPA – Centre for European Strategy on Big and Open Data in Europe that aims at fostering and facilitating policy debate on the socioeconomic impact of data. The key outcome of the project, a pan-European macroeconomic study titled “Big and open data in Europe: A growth engine or a missed opportunity?” carried out by the Warsaw Institute for Economic Studies (WISE), was presented.
We have the pleasure of being among the first to present some of the findings of the report and to offer it for download.
The report analyses how these technologies have the potential to influence various aspects of European society, their substantial, long-term impact on our wealth and quality of life, and the new developmental challenges they create for the EU as a whole, as well as for its member states and their regions.
You will learn from the report:
  • the economic gains that result from business applications of big data
  • how to structure big data to move from Big Trouble to Big Value
  • the costs and benefits to data holders of opening their data
  • three challenges that Europeans face with respect to big and open data
  • key areas, growth opportunities and challenges for big and open data in particular European regions.
The study also elaborates on the key principle of the open data philosophy: open by default.
Europe by 2020. What will happen?
The report contains a prognosis, for the EU’s 28 member states, of the impact of big and open data by 2020: the additional output it will generate and how it will affect trade, health, manufacturing, information and communication, finance and insurance, and public administration in different regions. It foresees that the EU economy will grow by 1.9% by 2020 thanks to big and open data and describes the increase in overall GDP by country and sector.
One of the many interesting findings of the report is that the positive impact of the data revolution will be felt more acutely in Northern Europe, while most of the New Member States and Southern European economies will benefit significantly less, with two notable exceptions being the Czech Republic and Poland. If you would like to have first-hand up-to-date information about the impact of big and open data on the future of Europe – download the report.”

Selected Readings on Big Data


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of big data was originally published in 2014.

Big Data refers to the wide-scale collection, aggregation, storage, analysis and use of data. Government is increasingly in control of a massive amount of raw data that, when analyzed and put to use, can lead to new insights on everything from public opinion to environmental concerns. The burgeoning literature on Big Data argues that it generates value by: creating transparency; enabling experimentation to discover needs, expose variability, and improve performance; segmenting populations to customize actions; replacing/supporting human decision making with automated algorithms; and innovating new business models, products and services. The insights drawn from data analysis can also be visualized in a manner that passes along relevant information, even to those without the tech savvy to understand the data on its own terms (see The GovLab Selected Readings on Data Visualization).

Annotated Selected Reading List (in alphabetical order)

Australian Government Information Management Office. The Australian Public Service Big Data Strategy: Improved Understanding through Enhanced Data-analytics Capability Strategy Report. August 2013. http://bit.ly/17hs2xY.

  • This Big Data Strategy, produced for Australian Government senior executives responsible for delivering services and developing policy, aims to ingrain in government officials the idea that the key to increasing the value of big data held by government is the effective use of analytics. Essentially, “the value of big data lies in [our] ability to extract insights and make better decisions.”
  • This positions big data as a national asset that can be used to “streamline service delivery, create opportunities for innovation, identify new service and policy approaches as well as supporting the effective delivery of existing programs across a broad range of government operations.”

Bollier, David. The Promise and Peril of Big Data. The Aspen Institute, Communications and Society Program, 2010. http://bit.ly/1a3hBIA.

  • This report captures insights from the 2009 Roundtable exploring uses of Big Data within a number of important consumer behavior and policy implication contexts.
  • The report concludes that, “Big Data presents many exciting opportunities to improve modern society. There are incalculable opportunities to make scientific research more productive, and to accelerate discovery and innovation. People can use new tools to help improve their health and well-being, and medical care can be made more efficient and effective. Government, too, has a great stake in using large databases to improve the delivery of government services and to monitor for threats to national security.”
  • However, “Big Data also presents many formidable challenges to government and citizens precisely because data technologies are becoming so pervasive, intrusive and difficult to understand. How shall society protect itself against those who would misuse or abuse large databases? What new regulatory systems, private-law innovations or social practices will be capable of controlling anti-social behaviors–and how should we even define what is socially and legally acceptable when the practices enabled by Big Data are so novel and often arcane?”

Boyd, Danah and Kate Crawford. “Six Provocations for Big Data.” A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society. September 2011. http://bit.ly/1jJstmz.

  • In this paper, Boyd and Crawford raise challenges to unchecked assumptions and biases regarding big data. The paper makes a number of assertions about the “computational culture” of big data and pushes back against those who consider big data to be a panacea.
  • The authors’ provocations for big data are:
    • Automating Research Changes the Definition of Knowledge
    • Claims to Objectivity and Accuracy are Misleading
    • Big Data Is Not Always Better Data
    • Not All Data Is Equivalent
    • Just Because It Is Accessible Doesn’t Make It Ethical
    • Limited Access to Big Data Creates New Digital Divides

The Economist Intelligence Unit. Big Data and the Democratisation of Decisions. October 2012. http://bit.ly/17MpH8L.

  • This report from the Economist Intelligence Unit focuses on the positive impact of big data adoption in the private sector, but its insights can also be applied to the use of big data in governance.
  • The report argues that innovation can be spurred by democratizing access to data, allowing a diversity of stakeholders to “tap data, draw lessons and make business decisions,” which in turn helps companies and institutions respond to new trends and intelligence at varying levels of decision-making power.

Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. Big Data: The Next Frontier for Innovation, Competition, and Productivity.  McKinsey & Company. May 2011. http://bit.ly/18Q5CSl.

  • This report argues that big data “will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus,” and that “leaders in every sector will have to grapple with the implications of big data.”
  • The report offers five broad ways in which using big data can create value:
    • First, big data can unlock significant value by making information transparent and usable at much higher frequency.
    • Second, as organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance.
    • Third, big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
    • Fourth, sophisticated analytics can substantially improve decision-making.
    • Finally, big data can be used to improve the development of the next generation of products and services.

The Partnership for Public Service and the IBM Center for The Business of Government. “From Data to Decisions II: Building an Analytics Culture.” October 17, 2012. https://bit.ly/2EbBTMg.

  • This report discusses strategies for better leveraging data analysis to aid decision-making. The authors argue that, “Organizations that are successful at launching or expanding analytics program…systematically examine their processes and activities to ensure that everything they do clearly connects to what they set out to achieve, and they use that examination to pinpoint weaknesses or areas for improvement.”
  • While the report features many strategies for government decision-makers, the central recommendation is that, “leaders incorporate analytics as a way of doing business, making data-driven decisions transparent and a fundamental approach to day-to-day management. When an analytics culture is built openly, and the lessons are applied routinely and shared widely, an agency can embed valuable management practices in its DNA, to the mutual benefit of the agency and the public it serves.”

TechAmerica Foundation’s Federal Big Data Commission. “Demystifying Big Data: A Practical Guide to Transforming the Business of Government.” 2013. http://bit.ly/1aalUrs.

  • This report presents the key big data imperatives that government agencies must address, along with the challenges and opportunities posed by the growing volume of data and the value Big Data can provide. The discussion touches on the value of big data to business and organizational missions, presents case-study examples of big data applications, and covers their technical underpinnings and public policy applications.
  • The authors argue that new digital information, “effectively captured, managed and analyzed, has the power to change every industry including cyber security, healthcare, transportation, education, and the sciences.” To ensure that this opportunity is realized, the report proposes a detailed big data strategy framework with the following steps: define, assess, plan, execute and review.

World Economic Forum. “Big Data, Big Impact: New Possibilities for International Development.” 2012. http://bit.ly/17hrTKW.

  • This report examines the potential for channeling the “flood of data created every day by the interactions of billions of people using computers, GPS devices, cell phones, and medical devices” into “actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of low-income populations.”
  • The report argues that, “To realise the mutual benefits of creating an environment for sharing mobile-generated data, all ecosystem actors must commit to active and open participation. Governments can take the lead in setting policy and legal frameworks that protect individuals and require contractors to make their data public. Development organisations can continue supporting governments and demonstrating both the public good and the business value that data philanthropy can deliver. And the private sector can move faster to create mechanisms for the sharing of data that can benefit the public.”

Predictive Modeling With Big Data: Is Bigger Really Better?


New Paper by Junqué de Fortuny, Enric, Martens, David, and Provost, Foster in Big Data: “With the increasingly widespread collection and processing of “big data,” there is natural interest in using these data assets to improve decision making. One of the best understood ways to use data to improve decision making is via predictive analytics. An important, open question is: to what extent do larger data actually lead to better predictive models? In this article we empirically demonstrate that when predictive models are built from sparse, fine-grained data—such as data on low-level human behavior—we continue to see marginal increases in predictive performance even to very large scale. The empirical results are based on data drawn from nine different predictive modeling applications, from book reviews to banking transactions. This study provides a clear illustration that larger data indeed can be more valuable assets for predictive analytics. This implies that institutions with larger data assets—plus the skill to take advantage of them—potentially can obtain substantial competitive advantage over institutions without such access or skill. Moreover, the results suggest that it is worthwhile for companies with access to such fine-grained data, in the context of a key predictive task, to gather both more data instances and more possible data features. As an additional contribution, we introduce an implementation of the multivariate Bernoulli Naïve Bayes algorithm that can scale to massive, sparse data.”
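
The abstract’s closing point, scaling multivariate Bernoulli Naive Bayes to massive, sparse behavioral data, can be illustrated with off-the-shelf tools. The sketch below is not the authors’ implementation; it simply fits scikit-learn’s BernoulliNB to a synthetic sparse binary matrix of the fine-grained kind the paper describes:

```python
# Minimal sketch: multivariate Bernoulli Naive Bayes on sparse, fine-grained
# binary features (e.g., which of many low-level behaviors each person showed).
# Synthetic data; an illustration with scikit-learn, not the paper's own
# scaled implementation, so the accuracy printed here is chance-level.
import numpy as np
import scipy.sparse as sp
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)

n_people, n_behaviors = 10_000, 5_000
X = sp.random(n_people, n_behaviors, density=0.001, format="csr", random_state=0)
X.data[:] = 1.0                         # binarize: 1 = behavior observed
y = rng.integers(0, 2, size=n_people)   # synthetic binary outcome to predict

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = BernoulliNB(alpha=1.0)          # Laplace smoothing of per-feature Bernoullis
model.fit(X_train, y_train)             # accepts scipy sparse input directly
print("held-out accuracy:", model.score(X_test, y_test))
```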

Big Data and the Future of Privacy


John Podesta at the White House blog: “Last Friday, the President spoke to the American people, and the international community, about how to keep us safe from terrorism in a changing world while upholding America’s commitment to liberty and privacy that our values and Constitution require. Our national security challenges are real, but that is surely not the only space where changes in technology are altering the landscape and challenging conceptions of privacy.
That’s why in his speech, the President asked me to lead a comprehensive review of the way that “big data” will affect the way we live and work; the relationship between government and citizens; and how public and private sectors can spur innovation and maximize the opportunities and free flow of this information while minimizing the risks to privacy. I will be joined in this effort by Secretary of Commerce Penny Pritzker, Secretary of Energy Ernie Moniz, the President’s Science Advisor John Holdren, the President’s Economic Advisor Gene Sperling and other senior government officials.
I would like to explain a little bit more about the review, its scope, and what you can expect over the next 90 days.
We are undergoing a revolution in the way that information about our purchases, our conversations, our social networks, our movements, and even our physical identities are collected, stored, analyzed and used. The immense volume, diversity and potential value of data will have profound implications for privacy, the economy, and public policy. The working group will consider all those issues, and specifically how the present and future state of these technologies might motivate changes in our policies across a range of sectors.
When we complete our work, we expect to deliver to the President a report that anticipates future technological trends and frames the key questions that the collection, availability, and use of “big data” raise – both for our government, and the nation as a whole. It will help identify technological changes to watch, whether those technological changes are addressed by the U.S.’s current policy framework and highlight where further government action, funding, research and consideration may be required.
This is going to be a collaborative effort. The President’s Council of Advisors on Science and Technology (PCAST) will conduct a study to explore in-depth the technological dimensions of the intersection of big data and privacy, which will feed into this broader effort. Our working group will consult with industry, civil liberties groups, technologists, privacy experts, international partners, and other national and local government officials on the significance of and future for these technologies. Finally, we will be working with a number of think tanks, academic institutions, and other organizations around the country as they convene stakeholders to discuss these very issues and questions. Likewise, many abroad are analyzing and responding to the challenge and seizing the opportunity of big data. These discussions will help to inform our study.
While we don’t expect to answer all these questions, or produce a comprehensive new policy in 90 days, we expect this work to serve as the foundation for a robust and forward-looking plan of action. Check back on this blog for updates on how you can get involved in the debate and for status updates on our progress.”

Use big data and crowdsourcing to detect nuclear proliferation, says DSB


FierceGovernmentIT: “A changing set of counter-nuclear proliferation problems requires a paradigm shift in monitoring that should include big data analytics and crowdsourcing, says a report from the Defense Science Board.
Much has changed since the Cold War when it comes to ensuring that nuclear weapons are subject to international controls, meaning that monitoring in support of treaties covering declared capabilities should be only one part of overall U.S. monitoring efforts, says the board in a January report (.pdf).
There are challenges related to covert operations, such as testing calibrated to fall below detection thresholds, and non-traditional technologies that present ambiguous threat signatures. Knowledge about how to make nuclear weapons is widespread and in the hands of actors who will give the United States or its allies limited or no access….
The report recommends using a slew of technologies including radiation sensors, but also exploitation of digital sources of information.
“Data gathered from the cyber domain establishes a rich and exploitable source for determining activities of individuals, groups and organizations needed to participate in either the procurement or development of a nuclear device,” it says.
Big data analytics could be used to take advantage of the proliferation of potential data sources including commercial satellite imaging, social media and other online sources.
The report notes that the proliferation of readily available commercial satellite imagery has created concerns about the introduction of more noise than genuine signal. “On balance, however, it is the judgment from the task force that more information from remote sensing systems, both commercial and dedicated national assets, is better than less information,” it says.
In fact, the ready availability of commercial imagery should be an impetus for improving the government’s ability to find weak signals “even within the most cluttered and noisy environments.”
Crowdsourcing also holds potential, although the report again notes that nuclear proliferation analysis by non-governmental entities “will constrain the ability of the United States to keep its options open in dealing with potential violations.” The distinction between gathering information and making political judgments “will erode.”
An effort by Georgetown University students (reported in the Washington Post in 2011) to use open source data analyzing the network of tunnels used in China to hide its missile and nuclear arsenal provides a proof-of-concept on how crowdsourcing can be used to augment limited analytical capacity, the report says – despite debate on the students’ work, which concluded that China’s arsenal could be many times larger than conventionally accepted…
For more:
  • download the DSB report, "Assessment of Nuclear Monitoring and Verification Technologies" (.pdf)
  • read the WaPo article on the Georgetown University crowdsourcing effort”

Mapping the Data Shadows of Hurricane Sandy: Uncovering the Sociospatial Dimensions of ‘Big Data’


New Paper by Shelton, T., Poorthuis, A., Graham, M., and Zook, M. : “Digital social data are now practically ubiquitous, with increasingly large and interconnected databases leading researchers, politicians, and the private sector to focus on how such ‘big data’ can allow potentially unprecedented insights into our world. This paper investigates Twitter activity in the wake of Hurricane Sandy in order to demonstrate the complex relationship between the material world and its digital representations. Through documenting the various spatial patterns of Sandy-related tweeting both within the New York metropolitan region and across the United States, we make a series of broader conceptual and methodological interventions into the nascent geographic literature on big data. Rather than focus on how these massive databases are causing necessary and irreversible shifts in the ways that knowledge is produced, we instead find it more productive to ask how small subsets of big data, especially georeferenced social media information scraped from the internet, can reveal the geographies of a range of social processes and practices. Utilizing both qualitative and quantitative methods, we can uncover broad spatial patterns within this data, as well as understand how this data reflects the lived experiences of the people creating it. We also seek to fill a conceptual lacuna in studies of user-generated geographic information, which have often avoided any explicit theorizing of sociospatial relations, by employing Jessop et al’s TPSN framework. Through these interventions, we demonstrate that any analysis of user-generated geographic information must take into account the existence of more complex spatialities than the relatively simple spatial ontology implied by latitude and longitude coordinates.”
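
As a rough illustration of working with a small, georeferenced subset of big data of the kind the authors describe, the sketch below bins hypothetical geotagged tweets into coarse grid cells within a bounding box around the New York region; the records, bounding box, and cell size are invented assumptions, not the paper’s actual data or parameters:

```python
# Illustrative sketch: counting geotagged tweets per coarse grid cell inside a
# bounding box around the New York metropolitan region. All records, bounds and
# the cell size are invented; the paper's own data and methods are richer.
import math
from collections import Counter

tweets = [  # (latitude, longitude, text) -- hypothetical records
    (40.71, -74.01, "power out downtown #sandy"),
    (40.78, -73.97, "stocking up on batteries #sandy"),
    (40.70, -73.99, "subway flooded #sandy"),
    (42.36, -71.06, "windy in boston"),
]

LAT_MIN, LAT_MAX = 40.3, 41.2    # rough NYC-region bounding box (illustrative)
LON_MIN, LON_MAX = -74.6, -73.3
CELL = 0.1                       # grid cell size in degrees (~11 km north-south)

def grid_cell(lat, lon):
    """Map a coordinate to integer grid-cell indices at CELL-degree resolution."""
    return (math.floor(lat / CELL), math.floor(lon / CELL))

counts = Counter(
    grid_cell(lat, lon)
    for lat, lon, _ in tweets
    if LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
)
for cell, n in counts.most_common():
    print(cell, n)
```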

Safety Datapalooza Shows Power of Data.gov Communities


Lisa Nelson at DigitalGov: “The White House Office of Public Engagement held the first Safety Datapalooza illustrating the power of Data.gov communities. Federal Chief Technology Officer Todd Park and Deputy Secretary of Transportation John Porcari hosted the event, which touted the data available on Safety.Data.gov and the community of innovators using it to make effective tools for consumers.
The event showcased many of the tools that have been produced as a result of opening this safety data, including:

  • PulsePoint, from the San Ramon Fire Protection District, a lifesaving mobile app that allows CPR-trained volunteers to be notified if someone nearby is in need of emergency assistance;
  • Commute and crime maps, from Trulia, allow home buyers to choose their new residence based on two important everyday factors; and
  • Hurricane App, from the American Red Cross, to monitor storm conditions, prepare your family and home, find help, and let others know you’re safe even if the power is out.

Safety data is far from alone in generating innovative ideas and gathering a community of developers and entrepreneurs; Data.gov currently has 16 topically diverse communities on land and sea, with the Cities and Oceans communities being two such examples. Data.gov’s communities are a virtual meeting spot for interested parties across government, academia and industry to come together and put the data to use. Data.gov enables a whole set of tools to make these communities come to life: apps, blogs, challenges, forums, ranking, rating and wikis.
For a summary of the Safety Datapalooza visit Transportation’s “Fast Lane” blog.”

New Book: Open Data Now


New book by Joel Gurin (The GovLab): “Open Data is the world’s greatest free resource–unprecedented access to thousands of databases–and it is one of the most revolutionary developments since the Information Age began. Combining two major trends–the exponential growth of digital data and the emerging culture of disclosure and transparency–Open Data gives you and your business full access to information that has never been available to the average person until now. Unlike most Big Data, Open Data is transparent, accessible, and reusable in ways that give it the power to transform business, government, and society.
Open Data Now is an essential guide to understanding all kinds of open databases–business, government, science, technology, retail, social media, and more–and using those resources to your best advantage. You’ll learn how to tap crowds for fast innovation, conduct research through open collaboration, and manage and market your business in a transparent marketplace.
Open Data is open for business–and the opportunities are as big and boundless as the Internet itself. This powerful, practical book shows you how to harness the power of Open Data in a variety of applications:

  • HOT STARTUPS: turn government data into profitable ventures
  • SAVVY MARKETING: understand how reputational data drives your brand
  • DATA-DRIVEN INVESTING: apply new tools for business analysis
  • CONSUMER INFORMATION: connect with your customers using smart disclosure
  • GREEN BUSINESS: use data to bet on sustainable companies
  • FAST R&D: turn the online world into your research lab
  • NEW OPPORTUNITIES: explore open fields for new businesses

Whether you’re a marketing professional who wants to stay on top of what’s trending, a budding entrepreneur with a billion-dollar idea and limited resources, or a struggling business owner trying to stay competitive in a changing global market–or if you just want to understand the cutting edge of information technology–Open Data Now offers a wealth of big ideas, strategies, and techniques that wouldn’t have been possible before Open Data leveled the playing field.
The revolution is here and it’s now. It’s Open Data Now.”

Why the Nate Silvers of the World Don’t Know Everything


Felix Salmon in Wired: “This shift in US intelligence mirrors a definite pattern of the past 30 years, one that we can see across fields and institutions. It’s the rise of the quants—that is, the ascent to power of people whose native tongue is numbers and algorithms and systems rather than personal relationships or human intuition. Michael Lewis’ Moneyball vividly recounts how the quants took over baseball, as statistical analysis trumped traditional scouting and propelled the underfunded Oakland A’s to a division-winning 2002 season. More recently we’ve seen the rise of the quants in politics. Commentators who “trusted their gut” about Mitt Romney’s chances had their gut kicked by Nate Silver, the stats whiz who called the election days beforehand as a lock for Obama, down to the very last electoral vote in the very last state.
The reason the quants win is that they’re almost always right—at least at first. They find numerical patterns or invent ingenious algorithms that increase profits or solve problems in ways that no amount of subjective experience can match. But what happens after the quants win is not always the data-driven paradise that they and their boosters expected. The more a field is run by a system, the more that system creates incentives for everyone (employees, customers, competitors) to change their behavior in perverse ways—providing more of whatever the system is designed to measure and produce, whether that actually creates any value or not. It’s a problem that can’t be solved until the quants learn a little bit from the old-fashioned ways of thinking they’ve displaced.
No matter the discipline or industry, the rise of the quants tends to happen in four stages. Stage one is what you might call pre-disruption, and it’s generally best visible in hindsight. Think about quaint dating agencies in the days before the arrival of Match.com and all the other algorithm-powered online replacements. Or think about retail in the era before floor-space management analytics helped quantify exactly which goods ought to go where. For a live example, consider Hollywood, which, for all the money it spends on market research, is still run by a small group of lavishly compensated studio executives, all of whom are well aware that the first rule of Hollywood, as memorably summed up by screenwriter William Goldman, is “Nobody knows anything.” On its face, Hollywood is ripe for quantification—there’s a huge amount of data to be mined, considering that every movie and TV show can be classified along hundreds of different axes, from stars to genre to running time, and they can all be correlated to box office receipts and other measures of profitability.
Next comes stage two, disruption. In most industries, the rise of the quants is a recent phenomenon, but in the world of finance it began back in the 1980s. The unmistakable sign of this change was hard to miss: the point at which you started getting targeted and personalized offers for credit cards and other financial services based not on the relationship you had with your local bank manager but on what the bank’s algorithms deduced about your finances and creditworthiness. Pretty soon, when you went into a branch to inquire about a loan, all they could do was punch numbers into a computer and then give you the computer’s answer.
For a present-day example of disruption, think about politics. In the 2012 election, Obama’s old-fashioned campaign operatives didn’t disappear. But they gave money and freedom to a core group of technologists in Chicago—including Harper Reed, former CTO of the Chicago-based online retailer Threadless—and allowed them to make huge decisions about fund-raising and voter targeting. Whereas earlier campaigns had tried to target segments of the population defined by geography or demographic profile, Obama’s team made the campaign granular right down to the individual level. So if a mom in Cedar Rapids was on the fence about who to vote for, or whether to vote at all, then instead of buying yet another TV ad, the Obama campaign would message one of her Facebook friends and try the much more effective personal approach…
After disruption, though, there comes at least some version of stage three: overshoot. The most common problem is that all these new systems—metrics, algorithms, automated decisionmaking processes—result in humans gaming the system in rational but often unpredictable ways. Sociologist Donald T. Campbell noted this dynamic back in the ’70s, when he articulated what’s come to be known as Campbell’s law: “The more any quantitative social indicator is used for social decision-making,” he wrote, “the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”…
Policing is a good example, as explained by Harvard sociologist Peter Moskos in his book Cop in the Hood: My Year Policing Baltimore’s Eastern District. Most cops have a pretty good idea of what they should be doing, if their goal is public safety: reducing crime, locking up kingpins, confiscating drugs. It involves foot patrols, deep investigations, and building good relations with the community. But under statistically driven regimes, individual officers have almost no incentive to actually do that stuff. Instead, they’re all too often judged on results—specifically, arrests. (Not even convictions, just arrests: If a suspect throws away his drugs while fleeing police, the police will chase and arrest him just to get the arrest, even when they know there’s no chance of a conviction.)…
It’s increasingly clear that for smart organizations, living by numbers alone simply won’t work. That’s why they arrive at stage four: synthesis—the practice of marrying quantitative insights with old-fashioned subjective experience. Nate Silver himself has written thoughtfully about examples of this in his book, The Signal and the Noise. He cites baseball, which in the post-Moneyball era adopted a “fusion approach” that leans on both statistics and scouting. Silver credits it with delivering the Boston Red Sox’s first World Series title in 86 years. Or consider weather forecasting: The National Weather Service employs meteorologists who, understanding the dynamics of weather systems, can improve forecasts by as much as 25 percent compared with computers alone. A similar synthesis holds in economic forecasting: Adding human judgment to statistical methods makes results roughly 15 percent more accurate. And it’s even true in chess: While the best computers can now easily beat the best humans, they can in turn be beaten by humans aided by computers….
That’s what a good synthesis of big data and human intuition tends to look like. As long as the humans are in control, and understand what it is they’re controlling, we’re fine. It’s when they become slaves to the numbers that trouble breaks out. So let’s celebrate the value of disruption by data—but let’s not forget that data isn’t everything.”