Why the Nate Silvers of the World Don’t Know Everything

Felix Salmon in Wired: “This shift in US intelligence mirrors a definite pattern of the past 30 years, one that we can see across fields and institutions. It’s the rise of the quants—that is, the ascent to power of people whose native tongue is numbers and algorithms and systems rather than personal relationships or human intuition. Michael Lewis’ Moneyball vividly recounts how the quants took over baseball, as statistical analysis trumped traditional scouting and propelled the underfunded Oakland A’s to a division-winning 2002 season. More recently we’ve seen the rise of the quants in politics. Commentators who “trusted their gut” about Mitt Romney’s chances had their gut kicked by Nate Silver, the stats whiz who called the election days beforehand as a lock for Obama, down to the very last electoral vote in the very last state.
The reason the quants win is that they’re almost always right—at least at first. They find numerical patterns or invent ingenious algorithms that increase profits or solve problems in ways that no amount of subjective experience can match. But what happens after the quants win is not always the data-driven paradise that they and their boosters expected. The more a field is run by a system, the more that system creates incentives for everyone (employees, customers, competitors) to change their behavior in perverse ways—providing more of whatever the system is designed to measure and produce, whether that actually creates any value or not. It’s a problem that can’t be solved until the quants learn a little bit from the old-fashioned ways of thinking they’ve displaced.
No matter the discipline or industry, the rise of the quants tends to happen in four stages. Stage one is what you might call pre-disruption, and it’s generally best visible in hindsight. Think about quaint dating agencies in the days before the arrival of Match.com and all the other algorithm-powered online replacements. Or think about retail in the era before floor-space management analytics helped quantify exactly which goods ought to go where. For a live example, consider Hollywood, which, for all the money it spends on market research, is still run by a small group of lavishly compensated studio executives, all of whom are well aware that the first rule of Hollywood, as memorably summed up by screenwriter William Goldman, is “Nobody knows anything.” On its face, Hollywood is ripe for quantification—there’s a huge amount of data to be mined, considering that every movie and TV show can be classified along hundreds of different axes, from stars to genre to running time, and they can all be correlated to box office receipts and other measures of profitability.
Next comes stage two, disruption. In most industries, the rise of the quants is a recent phenomenon, but in the world of finance it began back in the 1980s. The unmistakable sign of this change was hard to miss: the point at which you started getting targeted and personalized offers for credit cards and other financial services based not on the relationship you had with your local bank manager but on what the bank’s algorithms deduced about your finances and creditworthiness. Pretty soon, when you went into a branch to inquire about a loan, all they could do was punch numbers into a computer and then give you the computer’s answer.
For a present-day example of disruption, think about politics. In the 2012 election, Obama’s old-fashioned campaign operatives didn’t disappear. But they gave money and freedom to a core group of technologists in Chicago—including Harper Reed, former CTO of the Chicago-based online retailer Threadless—and allowed them to make huge decisions about fund-raising and voter targeting. Whereas earlier campaigns had tried to target segments of the population defined by geography or demographic profile, Obama’s team made the campaign granular right down to the individual level. So if a mom in Cedar Rapids was on the fence about who to vote for, or whether to vote at all, then instead of buying yet another TV ad, the Obama campaign would message one of her Facebook friends and try the much more effective personal approach…
After disruption, though, there comes at least some version of stage three: overshoot. The most common problem is that all these new systems—metrics, algorithms, automated decisionmaking processes—result in humans gaming the system in rational but often unpredictable ways. Sociologist Donald T. Campbell noted this dynamic back in the ’70s, when he articulated what’s come to be known as Campbell’s law: “The more any quantitative social indicator is used for social decision-making,” he wrote, “the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”…
Policing is a good example, as explained by Harvard sociologist Peter Moskos in his book Cop in the Hood: My Year Policing Baltimore’s Eastern District. Most cops have a pretty good idea of what they should be doing, if their goal is public safety: reducing crime, locking up kingpins, confiscating drugs. It involves foot patrols, deep investigations, and building good relations with the community. But under statistically driven regimes, individual officers have almost no incentive to actually do that stuff. Instead, they’re all too often judged on results—specifically, arrests. (Not even convictions, just arrests: If a suspect throws away his drugs while fleeing police, the police will chase and arrest him just to get the arrest, even when they know there’s no chance of a conviction.)…
It’s increasingly clear that for smart organizations, living by numbers alone simply won’t work. That’s why they arrive at stage four: synthesis—the practice of marrying quantitative insights with old-fashioned subjective experience. Nate Silver himself has written thoughtfully about examples of this in his book, The Signal and the Noise. He cites baseball, which in the post-Moneyball era adopted a “fusion approach” that leans on both statistics and scouting. Silver credits it with delivering the Boston Red Sox’s first World Series title in 86 years. Or consider weather forecasting: The National Weather Service employs meteorologists who, understanding the dynamics of weather systems, can improve forecasts by as much as 25 percent compared with computers alone. A similar synthesis holds in economic forecasting: Adding human judgment to statistical methods makes results roughly 15 percent more accurate. And it’s even true in chess: While the best computers can now easily beat the best humans, they can in turn be beaten by humans aided by computers….
That’s what a good synthesis of big data and human intuition tends to look like. As long as the humans are in control, and understand what it is they’re controlling, we’re fine. It’s when they become slaves to the numbers that trouble breaks out. So let’s celebrate the value of disruption by data—but let’s not forget that data isn’t everything.”

From Faith-Based to Evidence-Based: The Open Data 500 and Understanding How Open Data Helps the American Economy

Beth Noveck in Forbes: “Public funds have, after all, paid for their collection, and the law says that federal government data are not protected by copyright. By the end of 2009, the US and the UK had the only two open data one-stop websites where agencies could post and citizens could find open data. Now there are over 300 such portals for government data around the world with over 1 million available datasets. This kind of Open Data — including weather, safety and public health information as well as information about government spending — can serve the country by increasing government efficiency, shedding light on regulated industries, and driving innovation and job creation.

It’s becoming clear that open data has the potential to improve people’s lives. With huge advances in data science, we can take this data and turn it into tools that help people choose a safer hospital, pick a better place to live, improve the performance of their farm or business by having better climate models, and know more about the companies with whom they are doing business. Done right, people can even contribute data back, giving everyone a better understanding, for example of nuclear contamination in post-Fukushima Japan or incidences of price gouging in America’s inner cities.

The promise of open data is limitless. (See the GovLab index for stats on open data.) But it’s important to back up our faith with real evidence of what works. Last September the GovLab began the Open Data 500 project, funded by the John S. and James L. Knight Foundation, to study the economic value of government Open Data extensively and rigorously. A recent McKinsey study pegged the global value of Open Data (including free data from sources other than government) at $3 trillion a year or more. We’re digging in and talking to those companies that use Open Data as a key part of their business model. We want to understand whether and how open data is contributing to the creation of new jobs, the development of scientific and other innovations, and the growth of the economy. We also want to know what government can do better to help industries that want high quality, reliable, up-to-date information that government can supply. Of those 1 million datasets, for example, 96% are not updated on a regular basis.

The GovLab just published an initial working list of 500 American companies that we believe to be using open government data extensively.  We’ve also posted in-depth profiles of 50 of them — a sample of the kind of information that will be available when the first annual Open Data 500 study is published in early 2014. We are also starting a similar study for the UK and Europe.

Even at this early stage, we are learning that Open Data is a valuable resource. As my colleague Joel Gurin, author of Open Data Now: the Secret to Hot Start-Ups, Smart Investing, Savvy Marketing and Fast Innovation, who directs the project, put it, “Open Data is a versatile and powerful economic driver in the U.S. for new and existing businesses around the country, in a variety of ways, and across many sectors. The diversity of these companies in the kinds of data they use, the way they use it, their locations, and their business models is one of the most striking things about our findings so far.” Companies are paradoxically building value-added businesses on top of public data that anyone can access for free….”

Entrepreneurs Shape Free Data Into Money

Angus Loten in the Wall Street Journal: “More cities are putting information on everything from street-cleaning schedules to police-response times and restaurant inspection reports in the public domain, in the hope that people will find a way to make money off the data.
Supporters of such programs often see them as a local economic stimulus plan, allowing software developers and entrepreneurs in cities ranging from San Francisco to South Bend, Ind., to New York, to build new businesses based on the information they get from government websites.
When Los Angeles Mayor Eric Garcetti issued an executive directive last month to launch the city’s open-data program, he cited entrepreneurs and businesses as important beneficiaries. Open-data promotes innovation and “gives companies, individuals, and nonprofit organizations the opportunity to leverage one of government’s greatest assets: public information,” according to the Dec. 18 directive.
A poster child for the movement might be 34-year-old Matt Ehrlichman of Seattle, who last year built an online business in part using Seattle work permits, professional licenses and other home-construction information gathered up by the city’s Department of Planning and Development.
While his website is free, his business, called Porch.com, has more than 80 employees and charges a $35 monthly fee to industry professionals who want to boost the visibility of their projects on the site.
The site gathers raw public data—such as addresses for homes under renovation, what they are doing, who is doing the work and how much they are charging—and combines it with photos and other information from industry professionals and homeowners. It then creates a searchable database for users to compare ideas and costs for projects near their own neighborhood.
…Ian Kalin, director of open-data services at Socrata, a Seattle-based software firm that makes the back-end applications for many of these government open-data sites, says he’s worked with hundreds of companies that were formed around open data.
Among them is Climate Corp., a San Francisco-based firm that collects weather and yield-forecasting data to help farmers decide when and where to plant crops. Launched in 2006, the firm was acquired in October by Monsanto Co., the seed-company giant, for $930 million.
Overall, the rate of new business formation declined nationally between 2006 and 2010. But according to the latest data from the Ewing Marion Kauffman Foundation, an entrepreneurship advocacy group in Kansas City, Mo., the rate of new business formation in Seattle rose 9.41% in 2011, compared with the national average of 3.9%.
Other cities where new business formation was ahead of the national average include Chicago, Austin, Texas, Baltimore, and South Bend, Ind.—all cities that also have open-data programs. Still, how effective the ventures are in creating jobs is difficult to gauge.
One wrinkle: privacy concerns about the potential for information—such as property tax and foreclosure data—to be misused.
Some privacy advocates fear that government data that include names, addresses and other sensitive information could be used by fraudsters to target victims.”

The Emergence Of The Connected City

Glen Martin at Forbes: “If the modern city is a symbol for randomness — even chaos — the city of the near future is shaping up along opposite metaphorical lines. The urban environment is evolving rapidly, and a model is emerging that is more efficient, more functional, more — connected, in a word.
This will affect how we work, commute, and spend our leisure time. It may well influence how we relate to one another, and how we think about the world. Certainly, our lives will be augmented: better public transportation systems, quicker responses from police and fire services, more efficient energy consumption. But there could also be dystopian impacts: dwindling privacy and imperiled personal data. We could even lose some of the ferment that makes large cities such compelling places to live; chaos is stressful, but it can also be stimulating.
It will come as no surprise that converging digital technologies are driving cities toward connectedness. When conjoined, ISM band transmitters, sensors, and smart phone apps form networks that can make cities pretty darn smart — and maybe more hygienic. This latter possibility, at least, is proposed by Samrat Saha of the DCI Marketing Group in Milwaukee. Saha suggests “crowdsourcing” municipal trash pick-up via BLE modules, proximity sensors and custom mobile device apps.
“My idea is a bit tongue in cheek, but I think it shows how we can gain real efficiencies in urban settings by gathering information and relaying it via the Cloud,” Saha says. “First, you deploy sensors in garbage cans. Each can provides a rough estimate of its fill level and communicates that to a BLE 112 Module.”
As pedestrians who have downloaded custom “garbage can” apps on their BLE-capable iPhone or Android devices pass by, continues Saha, the information is collected from the module and relayed to a Cloud-hosted service for action — garbage pick-up for brimming cans, in other words. The process will also allow planners to optimize trash can placement, redeploying receptacles from areas where need is minimal to more garbage-rich environs….
Garbage can connectivity has larger implications than just, well, garbage. Brett Goldstein, the former Chief Data and Information Officer for the City of Chicago and a current lecturer at the University of Chicago, says city officials found clear patterns between damaged or missing garbage cans and rat problems.
“We found areas that showed an abnormal increase in missing or broken receptacles started getting rat outbreaks around seven days later,” Goldstein said. “That’s very valuable information. If you have sensors on enough garbage cans, you could get a temporal leading edge, allowing a response before there’s a problem. In urban planning, you want to emphasize prevention, not reaction.”
Such Cloud-based app-centric systems aren’t suited only for trash receptacles, of course. Companies such as Johnson Controls are now marketing apps for smart buildings — the base component for smart cities. (Johnson’s Metasys management system, for example, feeds data to its app-based Panoptix Platform to maximize energy efficiency in buildings.) In short, instrumented cities already are emerging. Smart nodes — including augmented buildings, utilities and public service systems — are establishing connections with one another, like axon-linked neurons.
But Goldstein, who was best known in Chicago for putting tremendous quantities of the city’s data online for public access, emphasizes instrumented cities are still in their infancy, and that their successful development will depend on how well we “parent” them.
“I hesitate to refer to ‘Big Data,’ because I think it’s a terribly overused term,” Goldstein said. “But the fact remains that we can now capture huge amounts of urban data. So, to me, the biggest challenge is transitioning the fields — merging public policy with computer science into functional networks.”…”

What Jelly Means

Steven Johnson: “A few months ago, I found this strange white mold growing in my garden in California. I’m a novice gardener, and to make matters worse, a novice Californian, so I had no idea what these small white cells might portend for my flowers.
This is one of those odd blank spots — I used to call them Googleholes in the early days of the service — where the usual Delphic source of all knowledge comes up relatively useless. The Google algorithm doesn’t know what those white spots are, the way it knows more computational questions, like “what is the top-ranked page for ‘white mold’?” or “what is the capital of Illinois?” What I want, in this situation, is the distinction we usually draw between information and wisdom. I don’t just want to know what the white spots are; I want to know if I should be worried about them, or if they’re just a normal thing during late summer in Northern California gardens.
Now, I’m sure I know a dozen people who would be able to answer this question, but the problem is I don’t really know which people they are. But someone in my extended social network has likely experienced these white spots on their plants, or better yet, gotten rid of them.  (Or, for all I know, ate them — I’m trying not to be judgmental.) There are tools out there that would help me run the social search required to find that person. I can just bulk email my entire address book with images of the mold and ask for help. I could go on Quora, or a gardening site.
But the thing is, it’s a type of question that I find myself wanting to ask a lot, and there’s something inefficient about trying to figure the exact right tool to use to ask it each time, particularly when we have seen the value of consolidating so many of our queries into a single, predictable search field at Google.
This is why I am so excited about the new app, Jelly, which launched today. …
Jelly, if you haven’t heard, is the brainchild of Biz Stone, one of Twitter’s co-founders.  The service launches today with apps on iOS and Android. (Biz himself has a blog post and video, which you should check out.) I’ve known Biz since the early days of Twitter, and I’m excited to be an adviser and small investor in a company that shares so many of the values around networks and collective intelligence that I’ve been writing about since Emergence.
The thing that’s most surprising about Jelly is how fun it is to answer questions. There’s something strangely satisfying in flipping through the cards, reading questions, scanning the pictures, and looking for a place to be helpful. It’s the same broad gesture of reading, say, a Twitter feed, and pleasantly addictive in the same way, but the intent is so different. Scanning a Twitter feed while waiting for the train has the feel of “Here we are now, entertain us.” Scanning Jelly is more like: “I’m here. How can I help?””

Social media in crisis events: Open networks and collaboration supporting disaster response and recovery

Paper for the IEEE International Conference on Technologies for Homeland Security (HST): “Large-scale crises challenge the ability of public safety and security organisations to respond efficiently and effectively. Meanwhile, citizens’ adoption of mobile technology and rich social media services is dramatically changing the way crisis responses develop. Empowered by new communication media (smartphones, text messaging, internet-based applications and social media), citizens are the in situ first sensors. However, this entire social media arena is uncharted territory to most public safety and security organisations. In this paper, we analyse crisis events to draw narratives on social media relevance and describe how public safety and security organisations are increasingly aware of social media’s added value proposition in times of crisis. A set of critical success indicators to address the process of adopting social media is identified, so that social media information is rapidly transformed into actionable intelligence, thus enhancing the effectiveness of public safety and security organisations — saving time, money and lives.”

Open Government Strategy Continues with US Currency Production API

Eric Carter in ProgrammableWeb: “Last year, the Executive branch of the US government made huge strides in opening up government controlled data to the developer community. Projects such as the Open Data Policy and the Machine Readable Executive Order have led the US government to develop an API strategy. Today, ProgrammableWeb takes a look at another open government API: the Annual Production Figures of United States Currency API.

The US Treasury’s Bureau of Engraving and Printing (BEP) provides the dataset available through the Production Figures API. The data available consists of the number of $1, $5, $10, $20, $50, and $100 notes printed each year from 1980 to 2012. With this straightforward, seemingly basic set of data available, the question becomes: “Why is this data useful?” To answer this, one should consider the purpose of the Executive Order:

“Openness in government strengthens our democracy, promotes the delivery of efficient and effective services to the public, and contributes to economic growth. As one vital benefit of open government, making information resources easy to find, accessible, and usable can fuel entrepreneurship, innovation, and scientific discovery that improves Americans’ lives and contributes significantly to job creation.”

The API uses HTTP and can return responses in XML, JSON, or CSV data formats. As stated, the API retrieves the number of bills of a designated currency for the desired year. For more information and code samples, visit the API docs.”
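To make the format options concrete, here is a minimal Python sketch of how a client might normalize a JSON or a CSV response into the same records. The article does not give the endpoint URL or the response schema, so the sketch makes no network call, and the field names (year, denomination, notes_printed) and the sample payloads are assumptions for illustration only, not the real API's output or real BEP figures. XML is left out to keep it short.

```python
import csv
import io
import json

# Hypothetical field names; the real API's schema is not given in the article.
FIELDS = ("year", "denomination", "notes_printed")


def parse_production(payload: str, fmt: str) -> list[dict]:
    """Parse a JSON or CSV payload into a list of records with integer values."""
    if fmt == "json":
        rows = json.loads(payload)
    elif fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(payload)))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [{name: int(row[name]) for name in FIELDS} for row in rows]


def notes_for(records: list[dict], year: int, denomination: int) -> int:
    """Return the note count for one denomination in one year."""
    for rec in records:
        if rec["year"] == year and rec["denomination"] == denomination:
            return rec["notes_printed"]
    raise KeyError((year, denomination))


# Placeholder values for illustration only; these are not real BEP figures.
SAMPLE_JSON = '[{"year": 2012, "denomination": 1, "notes_printed": 1000}]'
SAMPLE_CSV = "year,denomination,notes_printed\n2012,1,1000\n"
```

Because both formats are parsed into the same record shape, a lookup such as notes_for(parse_production(SAMPLE_CSV, "csv"), 2012, 1) works the same way whichever format the client requested.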

Introduction to Linked Open Data (LOD)

Paper by Ivan Herman, presented at the International Conference on Dublin Core and Metadata Applications 2013: “The goal of the tutorial is to introduce the audience into the basics of the technologies used for Linked Data. This includes RDF, RDFS, main elements of SPARQL, SKOS, and OWL. Some general guidelines on publishing data as Linked Data will also be provided, as well as real-life usage examples of the various technologies.”

Full Text: PDF (Description)  |  PDF (Presentation)
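
For readers new to the technologies the tutorial lists, the following is a rough Python sketch of the core idea behind RDF and SPARQL: data stored as subject-predicate-object triples and queried by pattern matching. It is not taken from the tutorial and uses no RDF library. The example.org concepts are hypothetical; the two SKOS property names (prefLabel, broader) are real terms from the SKOS vocabulary.

```python
# A triple is (subject, predicate, object); None in a pattern acts as a variable.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/"  # hypothetical namespace for illustration

TRIPLES = {
    (EX + "cats", SKOS + "prefLabel", "Cats"),
    (EX + "cats", SKOS + "broader", EX + "mammals"),
    (EX + "mammals", SKOS + "prefLabel", "Mammals"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern, sorted for stable output."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def broader_labels(triples, concept):
    """Follow skos:broader one step and return the labels of the parent concepts."""
    parents = [o for _, _, o in match(triples, s=concept, p=SKOS + "broader")]
    return [
        o
        for parent in parents
        for _, _, o in match(triples, s=parent, p=SKOS + "prefLabel")
    ]
```

Calling broader_labels(TRIPLES, EX + "cats") joins two triple patterns, which is roughly what a SPARQL query with two patterns sharing a variable would express. A real deployment would use an RDF store and a SPARQL engine rather than a Python set.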

A permanent hacker space in the Brazilian Congress

Blog entry by Dan Swislow at OpeningParliament: “On December 17, the presidency of the Brazilian Chamber of Deputies passed a resolution that creates a permanent Laboratório Ráquer or “Hacker Lab” inside the Chamber—a global first.
Read the full text of the resolution in Portuguese.
The resolution mandates the creation of a physical space at the Chamber that is “open for access and use by any citizen, especially programmers and software developers, members of parliament and other public workers, where they can utilize public data in a collaborative fashion for actions that enhance citizenship.”
The idea was born out of a week-long hackathon (or “hacker marathon”) event hosted by the Chamber of Deputies in November, with the goal of using technology to enhance the transparency of legislative work and increase citizen understanding of the legislative process. More than 40 software developers and designers worked to create 22 applications for computers and mobile devices. The applications were voted on and the top three awarded prizes.
The winner was Meu Congresso, a website that allows citizens to track the activities of their elected representatives and monitor their expenses. Runners-up included Monitora, Brasil!, an Android application that allows users to track proposed bills, attendance and the Twitter feeds of members; and Deliberatório, an online card game that simulates the deliberation of bills in the Chamber of Deputies.
The hackathon engaged the software developers directly with members and staff of the Chamber of Deputies, including the Chamber’s President, Henrique Eduardo Alves. Hackathon organizer Pedro Markun of Transparencia Hacker made a formal proposal to the President of the Chamber for a permanent outpost, where, as Markun said in an email, “we could hack from inside the leviathan’s belly.”
The Chamber’s Director-General has established nine staff positions for the Hacker Lab under the leadership of Cristiano Ferri Faria, who spoke with me about the new project.
Faria explained that the hackathon event was a watershed moment for many public officials: “For 90-95% of parliamentarians and probably 80% of civil servants, they didn’t know how amazing a simple app, for instance, can make it much easier to analyze speeches.” Faria pointed to one of the hackathon contest entries, Retórica Parlamentar, which provides an interactive visualization of plenary remarks by members of the Chamber. “When members saw that, they got impressed and wondered, ‘There’s something new going on and we need to understand it and support it.’”

A World Of Wikipedia And Bitcoin: Is That The Promise Of Open Collaboration?

Science 2.0: “Open Collaboration, defined in a new paper as “any system of innovation or production that relies on goal-oriented yet loosely coordinated participants who interact to create a product (or service) of economic value, which they make available to contributors and non-contributors alike” brought the world Wikipedia, Bitcoin and, yes, even Science 2.0.
But what does that mean, really? That’s the first problem with vague terms in an open environment. It is anything people want it to be and sometimes what people want it to be is money, but hidden behind a guise of public weal.
TED’s lesser cousin TEDx is a result of open collaboration but there is no doubt it has successfully leveraged the marketing of TED to sell seats in auditoriums, just as it was designed to do. Generally, Open Collaboration now is less like its early days, where a group of like-minded people got together to create an Open Source tool, and more like corporations. Only they avoid the label, they are not quite non-profits and not quite corporations.
And because they are neither they can operate free of the cultural stigma. Despite efforts to claim that Wikipedia is a hotbed of misogyny and blocks out minorities, the online encyclopedia has endured just fine. Their defense is a simple one; they have no idea what gender or race or religion anyone is and anyone can contribute – it is a true open collaboration. Open Collaboration is goal-oriented, they lack the infrastructure to obey demands that they become about social justice, so the environments can be less touchy-feely than corporations and avoid the social authoritarianism of academia.
Many open collaborations perform well even in ‘harsh’ environments, where some minorities are underrepresented and diversity is lacking or when products by different groups rival one another. It’s a real puzzle for sociologists. The authors conclude that open collaboration is likely to expand into new domains, displacing traditional organizations, because it is so mission-oriented. Business executives and civic leaders should take heed – the future could look a lot more like the 1940s.”
See also: Sheen S. Levine, Michael J. Prietula, ‘Open Collaboration for Innovation: Principles and Performance’, Organization Science December 30, 2014 DOI:10.1287/orsc.2013.0872