Garbage In, Garbage Out… Or, How to Lie with Bad Data


Medium: For everyone who slept through Stats 101, Charles Wheelan’s Naked Statistics is a lifesaver. From batting averages and political polls to Schlitz ads and medical research, Wheelan “illustrates exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life” (New York Times). What follows is adapted from the book, out now in paperback.
Behind every important study there are good data that made the analysis possible. And behind every bad study . . . well, read on. People often speak about “lying with statistics.” I would argue that some of the most egregious statistical mistakes involve lying with data; the statistical analysis is fine, but the data on which the calculations are performed are bogus or inappropriate. Here are some common examples of “garbage in, garbage out.”

Selection Bias

….Selection bias can be introduced in many other ways. A survey of consumers in an airport is going to be biased by the fact that people who fly are likely to be wealthier than the general public; a survey at a rest stop on Interstate 90 may have the opposite problem. Both surveys are likely to be biased by the fact that people who are willing to answer a survey in a public place are different from people who would prefer not to be bothered. If you ask 100 people in a public place to complete a short survey, and 60 are willing to answer your questions, those 60 are likely to be different in significant ways from the 40 who walked by without making eye contact.

Publication Bias

Positive findings are more likely to be published than negative findings, which can skew the results that we see. Suppose you have just conducted a rigorous, longitudinal study in which you find conclusively that playing video games does not prevent colon cancer. You’ve followed a representative sample of 100,000 Americans for twenty years; those participants who spend hours playing video games have roughly the same incidence of colon cancer as the participants who do not play video games at all. We’ll assume your methodology is impeccable. Which prestigious medical journal is going to publish your results?

None, for two reasons. First, there is no strong scientific reason to believe that playing video games has any impact on colon cancer, so it is not obvious why you were doing this study. Second, and more relevant here, the fact that something does not prevent cancer is not a particularly interesting finding. After all, most things don’t prevent cancer. Negative findings are not especially sexy, in medicine or elsewhere.
The net effect is to distort the research that we see, or do not see. Suppose that one of your graduate school classmates has conducted a different longitudinal study. She finds that people who spend a lot of time playing video games do have a lower incidence of colon cancer. Now that is interesting! That is exactly the kind of finding that would catch the attention of a medical journal, the popular press, bloggers, and video game makers (who would slap labels on their products extolling the health benefits of their products). It wouldn’t be long before Tiger Moms all over the country were “protecting” their children from cancer by snatching books out of their hands and forcing them to play video games instead.
Of course, one important recurring idea in statistics is that unusual things happen every once in a while, just as a matter of chance. If you conduct 100 studies, one of them is likely to turn up results that are pure nonsense—like a statistical association between playing video games and a lower incidence of colon cancer. Here is the problem: The 99 studies that find no link between video games and colon cancer will not get published, because they are not very interesting. The one study that does find a statistical link will make it into print and get loads of follow-on attention. The source of the bias stems not from the studies themselves but from the skewed information that actually reaches the public. Someone reading the scientific literature on video games and cancer would find only a single study, and that single study will suggest that playing video games can prevent cancer. In fact, 99 studies out of 100 would have found no such link.
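To make that filter concrete, here is a minimal simulation sketch. It assumes a conventional 5 percent significance threshold, a two-sided two-proportion z-test, and made-up sample sizes and incidence rates; none of those figures come from the book, and the function names are hypothetical. At a 5 percent threshold roughly one null study in twenty would be expected to clear the bar by chance, so the exact number of flukes depends on the threshold chosen.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def run_null_studies(n_studies, n_per_group=500, incidence=0.05, seed=0):
    """Simulate studies in which video games truly have no effect on cancer."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_studies):
        # Both groups share the same true incidence (assumed value).
        gamers = sum(rng.random() < incidence for _ in range(n_per_group))
        others = sum(rng.random() < incidence for _ in range(n_per_group))
        p_values.append(
            two_proportion_p_value(gamers, n_per_group, others, n_per_group)
        )
    return p_values


def published(p_values, alpha=0.05):
    """Assumed publication filter: only 'significant' results reach print."""
    return [p for p in p_values if p < alpha]


p_values = run_null_studies(100)
in_print = published(p_values)
print(f"{len(in_print)} of {len(p_values)} null studies would look publishable")
```

A reader who sees only the studies returned by `published` sees nothing but apparent links, even though every simulated study was drawn from a world with no effect at all.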

Recall Bias

Memory is a fascinating thing—though not always a great source of good data. We have a natural human impulse to understand the present as a logical consequence of things that happened in the past—cause and effect. The problem is that our memories turn out to be “systematically fragile” when we are trying to explain some particularly good or bad outcome in the present. Consider a study looking at the relationship between diet and cancer. In 1993, a Harvard researcher compiled a data set comprising a group of women with breast cancer and an age-matched group of women who had not been diagnosed with cancer. Women in both groups were asked about their dietary habits earlier in life. The study produced clear results: The women with breast cancer were significantly more likely to have had diets that were high in fat when they were younger.
Ah, but this wasn’t actually a study of how diet affects the likelihood of getting cancer. This was a study of how getting cancer affects a woman’s memory of her diet earlier in life. All of the women in the study had completed a dietary survey years earlier, before any of them had been diagnosed with cancer. The striking finding was that women with breast cancer recalled a diet that was much higher in fat than what they actually consumed; the women with no cancer did not.

The New York Times Magazine described the insidious nature of this recall bias:

The diagnosis of breast cancer had not just changed a woman’s present and the future; it had altered her past. Women with breast cancer had (unconsciously) decided that a higher-fat diet was a likely predisposition for their disease and (unconsciously) recalled a high-fat diet. It was a pattern poignantly familiar to anyone who knows the history of this stigmatized illness: these women, like thousands of women before them, had searched their own memories for a cause and then summoned that cause into memory.

Recall bias is one reason that longitudinal studies are often preferred to cross-sectional studies. In a longitudinal study the data are collected contemporaneously. At age five, a participant can be asked about his attitudes toward school. Then, thirteen years later, we can revisit that same participant and determine whether he has dropped out of high school. In a cross-sectional study, in which all the data are collected at one point in time, we must ask an eighteen-year-old high school dropout how he or she felt about school at age five, which is inherently less reliable.

Survivorship Bias

Suppose a high school principal reports that test scores for a particular cohort of students have risen steadily for four years. The sophomore scores for this class were better than their freshman scores. The scores from junior year were better still, and the senior year scores were best of all. We’ll stipulate that there is no cheating going on, and not even any creative use of descriptive statistics. Every year this cohort of students has done better than it did the preceding year, by every possible measure: mean, median, percentage of students at grade level, and so on. Would you (a) nominate this school leader for “principal of the year” or (b) demand more data?

I say “b.” I smell survivorship bias, which occurs when some or many of the observations are falling out of the sample, changing the composition of the observations that are left and therefore affecting the results of any analysis. Let’s suppose that our principal is truly awful. The students in his school are learning nothing; each year half of them drop out. Well, that could do very nice things for the school’s test scores—without any individual student testing better. If we make the reasonable assumption that the worst students (with the lowest test scores) are the most likely to drop out, then the average test scores of those students left behind will go up steadily as more and more students drop out. (If you have a room of people with varying heights, forcing the short people to leave will raise the average height in the room, but it doesn’t make anyone taller.)
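The dropout scenario can be sketched in a few lines. The sketch assumes, as a simplification, that exactly the lowest-scoring half of the cohort leaves each year and that no remaining student's score ever changes; the starting cohort size and score distribution are invented for illustration.

```python
import random
import statistics


def simulate_cohort(n_students=800, years=4, dropout_fraction=0.5, seed=0):
    """Return the cohort's mean score each year when nobody learns anything."""
    rng = random.Random(seed)
    # Each student's score is fixed for all four years (assumed distribution).
    scores = [rng.gauss(70, 10) for _ in range(n_students)]
    yearly_means = []
    for _ in range(years):
        yearly_means.append(statistics.mean(scores))
        # Simplifying assumption: the lowest scorers are the ones who drop out.
        scores.sort()
        n_drop = int(len(scores) * dropout_fraction)
        scores = scores[n_drop:]
    return yearly_means


for year, mean in enumerate(simulate_cohort(), start=1):
    print(f"Year {year}: mean score {mean:.1f}")
```

The yearly mean climbs every year even though no individual score moves, which is the pattern the principal would be reporting.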

Healthy User Bias

People who take vitamins regularly are likely to be healthy—because they are the kind of people who take vitamins regularly! Whether the vitamins have any impact is a separate issue. Consider the following thought experiment. Suppose public health officials promulgate a theory that all new parents should put their children to bed only in purple pajamas, because that helps stimulate brain development. Twenty years later, longitudinal research confirms that having worn purple pajamas as a child does have an overwhelmingly large positive association with success in life. We find, for example, that 98 percent of entering Harvard freshmen wore purple pajamas as children (and many still do) compared with only 3 percent of inmates in the Massachusetts state prison system.

Of course, the purple pajamas do not matter; but having the kind of parents who put their children in purple pajamas does matter. Even when we try to control for factors like parental education, we are still going to be left with unobservable differences between those parents who obsess about putting their children in purple pajamas and those who don’t. As New York Times health writer Gary Taubes explains, “At its simplest, the problem is that people who faithfully engage in activities that are good for them—taking a drug as prescribed, for instance, or eating what they believe is a healthy diet—are fundamentally different from those who don’t.” This effect can potentially confound any study trying to evaluate the real effect of activities perceived to be healthful, such as exercising regularly or eating kale. We think we are comparing the health effects of two diets: kale versus no kale. In fact, if the treatment and control groups are not randomly assigned, we are comparing two diets that are being eaten by two different kinds of people. We have a treatment group that is different from the control group in two respects, rather than just one.
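A small simulation sketch shows how the two comparisons diverge. It assumes a single unobserved trait (labeled "conscientiousness" here purely for illustration) that drives both health and the decision to eat kale, while kale itself has zero effect; every number and name in it is hypothetical.

```python
import random
import statistics


def kale_gap(n=20000, randomized=False, seed=0):
    """Mean health of kale eaters minus non-eaters; kale's true effect is zero."""
    rng = random.Random(seed)
    kale, no_kale = [], []
    for _ in range(n):
        conscientious = rng.gauss(0, 1)  # unobserved trait (assumed)
        # Health depends on the trait and on noise, never on kale.
        health = 50 + 5 * conscientious + rng.gauss(0, 5)
        if randomized:
            eats_kale = rng.random() < 0.5  # coin-flip assignment
        else:
            # Self-selection: conscientious people choose kale more often.
            eats_kale = conscientious + rng.gauss(0, 1) > 0
        (kale if eats_kale else no_kale).append(health)
    return statistics.mean(kale) - statistics.mean(no_kale)


print(f"Self-selected groups: gap = {kale_gap(randomized=False):.2f}")
print(f"Randomized groups:    gap = {kale_gap(randomized=True):.2f}")
```

With self-selection the kale eaters look markedly healthier; with random assignment the gap shrinks toward zero, because the groups then differ in only one respect.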

If statistics is detective work, then the data are the clues. My wife spent a year teaching high school students in rural New Hampshire. One of her students was arrested for breaking into a hardware store and stealing some tools. The police were able to crack the case because (1) it had just snowed and there were tracks in the snow leading from the hardware store to the student’s home; and (2) the stolen tools were found inside. Good clues help.
Like good data. But first you have to get good data, and that is a lot harder than it seems.

From funding agencies to scientific agency


New paper on “Collective allocation of science funding as an alternative to peer review”: “Publicly funded research involves the distribution of a considerable amount of money. Funding agencies such as the US National Science Foundation (NSF), the US National Institutes of Health (NIH) and the European Research Council (ERC) give billions of dollars or euros of taxpayers’ money to individual researchers, research teams, universities, and research institutes each year. Taxpayers accordingly expect that governments and funding agencies will spend their money prudently and efficiently.

Investing money to the greatest effect is not a challenge unique to research funding agencies and there are many strategies and schemes to choose from. Nevertheless, most funders rely on a tried and tested method in line with the tradition of the scientific community: the peer review of individual proposals to identify the most promising projects for funding. This method has been considered the gold standard for assessing the scientific value of research projects essentially since the end of the Second World War.

However, there is mounting critique of the use of peer review to direct research funding. High on the list of complaints is the cost, both in terms of time and money. In 2012, for example, NSF convened more than 17,000 scientists to review 53,556 proposals [1]. Reviewers generally spend considerable time and effort to assess and rate proposals of which only a minority can eventually get funded. Of course, such a high rejection rate is also frustrating for the applicants. Scientists spend an increasing amount of time writing and submitting grant proposals. Overall, the scientific community invests an extraordinary amount of time, energy, and effort into the writing and reviewing of research proposals, most of which end up not getting funded at all. This time would be better invested in conducting the research in the first place.

Peer review may also be subject to biases, inconsistencies, and oversights. The need for review panels to reach consensus may lead to sub‐optimal decisions owing to the inherently stochastic nature of the peer review process. Moreover, in a period where the money available to fund research is shrinking, reviewers may tend to “play it safe” and select proposals that have a high chance of producing results, rather than more challenging and ambitious projects. Additionally, the structuring of funding around calls‐for‐proposals to address specific topics might inhibit serendipitous discovery, as scientists work on problems for which funding happens to be available rather than trying to solve more challenging problems.

The scientific community holds peer review in high regard, but it may not actually be the best possible system for identifying and supporting promising science. Many proposals have been made to reform funding systems, ranging from incremental changes to peer review—including careful selection of reviewers [2] and post‐hoc normalization of reviews [3]—to more radical proposals such as opening up review to the entire online population [4] or removing human reviewers altogether by allocating funds through an objective performance measure [5].

We would like to add another alternative inspired by the mathematical models used to search the internet for relevant information: a highly decentralized funding model in which the wisdom of the entire scientific community is leveraged to determine a fair distribution of funding. It would still require human insight and decision‐making, but it would drastically reduce the overhead costs and may alleviate many of the issues and inefficiencies of the proposal submission and peer review system, such as bias, “playing it safe”, or reluctance to support curiosity‐driven research.

Our proposed system would require funding agencies to give all scientists within their remit an unconditional, equal amount of money each year. However, each scientist would then be required to pass on a fixed percentage of their previous year’s funding to other scientists whom they think would make best use of the money (Fig 1). Every year, then, scientists would receive a fixed basic grant from their funding agency combined with an elective amount of funding donated by their peers. As a result of each scientist having to distribute a given percentage of their previous year’s budget to other scientists, money would flow through the scientific community. Scientists who are generally anticipated to make the best use of funding will accumulate more.”
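The flow of money described in the excerpt can be sketched as a simple yearly update. This is an illustrative reading of the passage, not the paper's own model: the base grant of 100, the 50 percent pass-on fraction, the four scientists, and their preference weights are all assumptions, and the function names are hypothetical.

```python
def step(funding, base, fraction, preferences):
    """One year: everyone gets `base`, plus whatever peers pass on to them."""
    next_funding = [base] * len(funding)
    for donor, amount in enumerate(funding):
        pot = fraction * amount  # share of last year's funding to pass on
        for recipient, weight in preferences[donor].items():
            next_funding[recipient] += pot * weight
    return next_funding


def simulate(n_years, base, fraction, preferences):
    """Return the funding of every scientist for year 0 through n_years."""
    funding = [base] * len(preferences)
    history = [funding]
    for _ in range(n_years):
        funding = step(funding, base, fraction, preferences)
        history.append(funding)
    return history


# Hypothetical preferences: each scientist splits donations among peers.
# Weights sum to 1 and nobody may donate to themselves.
preferences = [
    {3: 1.0},
    {2: 0.5, 3: 0.5},
    {3: 1.0},
    {0: 0.5, 1: 0.5},
]

history = simulate(n_years=10, base=100.0, fraction=0.5, preferences=preferences)
for scientist, amount in enumerate(history[-1]):
    print(f"Scientist {scientist}: {amount:.1f}")
```

In this toy setup scientist 3, whom most peers favor, ends up with the largest budget, while the agency's own outlay stays fixed at one base grant per scientist per year.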

The Failure and the Promise of Public Participation


Dr. Mark Funkhouser in Governing: “In a recent study entitled Making Public Participation Legal, Matt Leighninger cites a Knight Foundation report that found that attending a public meeting was more likely to reduce a person’s sense of efficacy and attachment to the community than to increase it. That sad fact is no surprise to the government officials who have to run — and endure — public meetings.
Every public official who has served for any length of time has horror stories about these forums. The usual suspects show up — the self-appointed activists (who sometimes seem to be just a little nuts) and the lobbyists. Regular folks have made the calculation that only in extreme circumstances, when they are really scared or angry, is attending a public hearing worth their time. And who can blame them when it seems clear that the game is rigged, the decisions already have been made, and they’ll probably have to sit through hours of blather before they get their three minutes at the microphone?
So much transparency and yet so little trust. Despite the fact that governments are pumping out more and more information to citizens, trust in government has edged lower and lower, pushed in part no doubt by the lingering economic hardships and government cutbacks resulting from the recession. Most public officials I talk to now take it as an article of faith that the public generally disrespects them and the governments they work for.
Clearly the relationship between citizens and their governments needs to be reframed. Fortunately, over the last couple of decades lots of techniques have been developed by advocates of deliberative democracy and citizen participation that provide both more meaningful engagement and better community outcomes. There are decision-making forums, “visioning” forums and facilitated group meetings, most of which feature some combination of large-group, small-group and online interactions.
But here’s the rub: Our legal framework doesn’t support these new methods of public participation. This fact is made clear in Making Public Participation Legal, which was compiled by a working group that included people from the National Civic League, the American Bar Association, the International City/County Management Association and a number of leading practitioners of public participation.
The requirements for public meetings in local governments are generally built into state statutes such as sunshine or open-meetings laws or other laws governing administrative procedures. These laws may require public hearings in certain circumstances and mandate that advance notice, along with an agenda, be posted for any meeting of an “official body” — from the state legislature to a subcommittee of the city council or an advisory board of some kind. And a “meeting” is one in which a quorum attends. So if three of a city council’s nine members sit on the finance committee and two of the committee members happen to show up at a public meeting, they may risk having violated the open-meetings law…”

Why the Nate Silvers of the World Don’t Know Everything


Felix Salmon in Wired: “This shift in US intelligence mirrors a definite pattern of the past 30 years, one that we can see across fields and institutions. It’s the rise of the quants—that is, the ascent to power of people whose native tongue is numbers and algorithms and systems rather than personal relationships or human intuition. Michael Lewis’ Moneyball vividly recounts how the quants took over baseball, as statistical analysis trumped traditional scouting and propelled the underfunded Oakland A’s to a division-winning 2002 season. More recently we’ve seen the rise of the quants in politics. Commentators who “trusted their gut” about Mitt Romney’s chances had their gut kicked by Nate Silver, the stats whiz who called the election days beforehand as a lock for Obama, down to the very last electoral vote in the very last state.
The reason the quants win is that they’re almost always right—at least at first. They find numerical patterns or invent ingenious algorithms that increase profits or solve problems in ways that no amount of subjective experience can match. But what happens after the quants win is not always the data-driven paradise that they and their boosters expected. The more a field is run by a system, the more that system creates incentives for everyone (employees, customers, competitors) to change their behavior in perverse ways—providing more of whatever the system is designed to measure and produce, whether that actually creates any value or not. It’s a problem that can’t be solved until the quants learn a little bit from the old-fashioned ways of thinking they’ve displaced.
No matter the discipline or industry, the rise of the quants tends to happen in four stages. Stage one is what you might call pre-disruption, and it’s generally best visible in hindsight. Think about quaint dating agencies in the days before the arrival of Match.com and all the other algorithm-powered online replacements. Or think about retail in the era before floor-space management analytics helped quantify exactly which goods ought to go where. For a live example, consider Hollywood, which, for all the money it spends on market research, is still run by a small group of lavishly compensated studio executives, all of whom are well aware that the first rule of Hollywood, as memorably summed up by screenwriter William Goldman, is “Nobody knows anything.” On its face, Hollywood is ripe for quantification—there’s a huge amount of data to be mined, considering that every movie and TV show can be classified along hundreds of different axes, from stars to genre to running time, and they can all be correlated to box office receipts and other measures of profitability.
Next comes stage two, disruption. In most industries, the rise of the quants is a recent phenomenon, but in the world of finance it began back in the 1980s. The sign of this change was hard to miss: the point at which you started getting targeted and personalized offers for credit cards and other financial services based not on the relationship you had with your local bank manager but on what the bank’s algorithms deduced about your finances and creditworthiness. Pretty soon, when you went into a branch to inquire about a loan, all they could do was punch numbers into a computer and then give you the computer’s answer.
For a present-day example of disruption, think about politics. In the 2012 election, Obama’s old-fashioned campaign operatives didn’t disappear. But they gave money and freedom to a core group of technologists in Chicago—including Harper Reed, former CTO of the Chicago-based online retailer Threadless—and allowed them to make huge decisions about fund-raising and voter targeting. Whereas earlier campaigns had tried to target segments of the population defined by geography or demographic profile, Obama’s team made the campaign granular right down to the individual level. So if a mom in Cedar Rapids was on the fence about who to vote for, or whether to vote at all, then instead of buying yet another TV ad, the Obama campaign would message one of her Facebook friends and try the much more effective personal approach…
After disruption, though, there comes at least some version of stage three: overshoot. The most common problem is that all these new systems—metrics, algorithms, automated decisionmaking processes—result in humans gaming the system in rational but often unpredictable ways. Sociologist Donald T. Campbell noted this dynamic back in the ’70s, when he articulated what’s come to be known as Campbell’s law: “The more any quantitative social indicator is used for social decision-making,” he wrote, “the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”…
Policing is a good example, as explained by Harvard sociologist Peter Moskos in his book Cop in the Hood: My Year Policing Baltimore’s Eastern District. Most cops have a pretty good idea of what they should be doing, if their goal is public safety: reducing crime, locking up kingpins, confiscating drugs. It involves foot patrols, deep investigations, and building good relations with the community. But under statistically driven regimes, individual officers have almost no incentive to actually do that stuff. Instead, they’re all too often judged on results—specifically, arrests. (Not even convictions, just arrests: If a suspect throws away his drugs while fleeing police, the police will chase and arrest him just to get the arrest, even when they know there’s no chance of a conviction.)…
It’s increasingly clear that for smart organizations, living by numbers alone simply won’t work. That’s why they arrive at stage four: synthesis—the practice of marrying quantitative insights with old-fashioned subjective experience. Nate Silver himself has written thoughtfully about examples of this in his book, The Signal and the Noise. He cites baseball, which in the post-Moneyball era adopted a “fusion approach” that leans on both statistics and scouting. Silver credits it with delivering the Boston Red Sox’s first World Series title in 86 years. Or consider weather forecasting: The National Weather Service employs meteorologists who, understanding the dynamics of weather systems, can improve forecasts by as much as 25 percent compared with computers alone. A similar synthesis holds in economic forecasting: Adding human judgment to statistical methods makes results roughly 15 percent more accurate. And it’s even true in chess: While the best computers can now easily beat the best humans, they can in turn be beaten by humans aided by computers….
That’s what a good synthesis of big data and human intuition tends to look like. As long as the humans are in control, and understand what it is they’re controlling, we’re fine. It’s when they become slaves to the numbers that trouble breaks out. So let’s celebrate the value of disruption by data—but let’s not forget that data isn’t everything.

Entrepreneurs Shape Free Data Into Money


Angus Loten in the Wall Street Journal: “More cities are putting information on everything from street-cleaning schedules to police-response times and restaurant inspection reports in the public domain, in the hope that people will find a way to make money off the data.
Supporters of such programs often see them as a local economic stimulus plan, allowing software developers and entrepreneurs in cities ranging from San Francisco to South Bend, Ind., to New York, to build new businesses based on the information they get from government websites.
When Los Angeles Mayor Eric Garcetti issued an executive directive last month to launch the city’s open-data program, he cited entrepreneurs and businesses as important beneficiaries. Open-data promotes innovation and “gives companies, individuals, and nonprofit organizations the opportunity to leverage one of government’s greatest assets: public information,” according to the Dec. 18 directive.
A poster child for the movement might be 34-year-old Matt Ehrlichman of Seattle, who last year built an online business in part using Seattle work permits, professional licenses and other home-construction information gathered up by the city’s Department of Planning and Development.
While his website is free, his business, called Porch.com, has more than 80 employees and charges a $35 monthly fee to industry professionals who want to boost the visibility of their projects on the site.
The site gathers raw public data—such as addresses for homes under renovation, what they are doing, who is doing the work and how much they are charging—and combines it with photos and other information from industry professionals and homeowners. It then creates a searchable database for users to compare ideas and costs for projects near their own neighborhood.
…Ian Kalin, director of open-data services at Socrata, a Seattle-based software firm that makes the back-end applications for many of these government open-data sites, says he’s worked with hundreds of companies that were formed around open data.
Among them is Climate Corp., a San Francisco-based firm that collects weather and yield-forecasting data to help farmers decide when and where to plant crops. Launched in 2006, the firm was acquired in October by Monsanto Co., the seed-company giant, for $930 million.
Overall, the rate of new business formation declined nationally between 2006 and 2010. But according to the latest data from the Ewing Marion Kauffman Foundation, an entrepreneurship advocacy group in Kansas City, Mo., the rate of new business formation in Seattle rose 9.41% in 2011, compared with the national average of 3.9%.
Other cities where new business formation was ahead of the national average include Chicago; Austin, Texas; Baltimore; and South Bend, Ind.—all cities that also have open-data programs. Still, how effective the ventures are in creating jobs is difficult to gauge.
One wrinkle: privacy concerns about the potential for information—such as property tax and foreclosure data—to be misused.
Some privacy advocates fear that government data that include names, addresses and other sensitive information could be used by fraudsters to target victims.”

The Emergence Of The Connected City


Glen Martin at Forbes: “If the modern city is a symbol for randomness — even chaos — the city of the near future is shaping up along opposite metaphorical lines. The urban environment is evolving rapidly, and a model is emerging that is more efficient, more functional, more — connected, in a word.
This will affect how we work, commute, and spend our leisure time. It may well influence how we relate to one another, and how we think about the world. Certainly, our lives will be augmented: better public transportation systems, quicker responses from police and fire services, more efficient energy consumption. But there could also be dystopian impacts: dwindling privacy and imperiled personal data. We could even lose some of the ferment that makes large cities such compelling places to live; chaos is stressful, but it can also be stimulating.
It will come as no surprise that converging digital technologies are driving cities toward connectedness. When conjoined, ISM band transmitters, sensors, and smart phone apps form networks that can make cities pretty darn smart — and maybe more hygienic. This latter possibility, at least, is proposed by Samrat Saha of the DCI Marketing Group in Milwaukee. Saha suggests “crowdsourcing” municipal trash pick-up via BLE modules, proximity sensors and custom mobile device apps.
“My idea is a bit tongue in cheek, but I think it shows how we can gain real efficiencies in urban settings by gathering information and relaying it via the Cloud,” Saha says. “First, you deploy sensors in garbage cans. Each can provides a rough estimate of its fill level and communicates that to a BLE 112 Module.”
As pedestrians who have downloaded custom “garbage can” apps on their BLE-capable iPhone or Android devices pass by, continues Saha, the information is collected from the module and relayed to a Cloud-hosted service for action — garbage pick-up for brimming cans, in other words. The process will also allow planners to optimize trash can placement, redeploying receptacles from areas where need is minimal to more garbage-rich environs….
Garbage can connectivity has larger implications than just, well, garbage. Brett Goldstein, the former Chief Data and Information Officer for the City of Chicago and a current lecturer at the University of Chicago, says city officials found clear patterns between damaged or missing garbage cans and rat problems.
“We found areas that showed an abnormal increase in missing or broken receptacles started getting rat outbreaks around seven days later,” Goldstein said. “That’s very valuable information. If you have sensors on enough garbage cans, you could get a temporal leading edge, allowing a response before there’s a problem. In urban planning, you want to emphasize prevention, not reaction.”
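The "temporal leading edge" idea amounts to checking how strongly one series predicts another a fixed number of days later. The sketch below uses entirely synthetic data, built so that rat reports echo broken-can reports seven days earlier; it is not Chicago's data or method, and the function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def best_lag(leading, trailing, max_lag):
    """Find the lag (in days) at which `leading` best matches later `trailing`."""
    scores = {}
    for lag in range(max_lag + 1):
        earlier = leading[: len(leading) - lag]
        later = trailing[lag:]
        scores[lag] = pearson(earlier, later)
    return max(scores, key=scores.get), scores


# Synthetic daily counts: rat reports follow broken cans by seven days.
rng = random.Random(0)
broken_cans = [rng.randint(0, 20) for _ in range(120)]
rat_reports = [
    (broken_cans[day - 7] if day >= 7 else 10) + rng.gauss(0, 2)
    for day in range(120)
]

lag, scores = best_lag(broken_cans, rat_reports, max_lag=14)
print(f"Strongest relationship at a lag of {lag} days")
```

On real sensor data the relationship would be far noisier, and a strong lagged correlation alone would not establish that broken cans cause the outbreaks.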
Such Cloud-based app-centric systems aren’t suited only for trash receptacles, of course. Companies such as Johnson Controls are now marketing apps for smart buildings — the base component for smart cities. (Johnson’s Metasys management system, for example, feeds data to its app-based Panoptix Platform to maximize energy efficiency in buildings.) In short, instrumented cities already are emerging. Smart nodes — including augmented buildings, utilities and public service systems — are establishing connections with one another, like axon-linked neurons.
But Goldstein, who was best known in Chicago for putting tremendous quantities of the city’s data online for public access, emphasizes instrumented cities are still in their infancy, and that their successful development will depend on how well we “parent” them.
“I hesitate to refer to ‘Big Data,’ because I think it’s a terribly overused term,” Goldstein said. “But the fact remains that we can now capture huge amounts of urban data. So, to me, the biggest challenge is transitioning the fields — merging public policy with computer science into functional networks.”…”

Crowdsourcing Social Problems


Article in Reason: “reCAPTCHA and Duolingo both represent a distinctly 21st-century form of distributed problem solving. These Internet-enabled approaches tend to be faster, far less expensive, and far more resilient than the heavyweight industrial-age methods of solving big social problems that we’ve grown accustomed to over the past century. They typically involve highly diverse resources (volunteer time, crowdfunding, the capabilities of multinational corporations, entrepreneurial capital, philanthropic funding) aligned around common objectives such as reducing congestion, providing safe drinking water, or promoting healthy living. Crowdsourcing offers not just a better way of doing things, but a radical challenge to the bureaucratic status quo.
Here are several ways public, private, and nonprofit organizations can use lightweight, distributed approaches to solve societal problems faster and cheaper than the existing sclerotic models.
Chunk the Problem
The genius of reCAPTCHA and Duolingo is that they divide labor into small increments, performed for free, often by people who are unaware of the project they’re helping to complete. This strategy has wide public-policy applications, even in dealing with potholes….
Meanwhile, Finland’s DigitalKoot project enlisted volunteers to digitize their own libraries by playing a computer game that challenged them to transcribe scans of antique manuscripts.
Governments can set up a microtasking platform, not just for citizen engagement but as a way to harness the knowledge and skills of public employees across multiple departments and agencies. If microtasking can work to connect people outside the “four walls” of an organization, think of its potential as a platform to connect people and conduct work inside an organization, even an organization as bureaucratic as government.

Decentralize Service to the Self
A young woman slices her finger on a knife. As she compresses the bleeding with gauze, she needs to know if her wound warrants stitches. So she calls up Blue Cross’ 24-hour nurse hotline, where patients call to learn if they should see a doctor. The nurse asks her to describe the depth of the cut. He explains she should compress it with gauze and skip the ER. In aggregate, savings like this amount to millions of dollars of avoided emergency room visits.
Since 2003, Blue Cross has been shifting the work of basic triage and risk mitigation to customers. Britain’s National Health Service (NHS) implemented a similar program, NHS Direct, in 1998. NHS estimates that the innovation has saved it £44 million a year….
Gamify Drudgery
Finland’s national library houses an enormous archive of antique texts, which officials hoped to scan and digitize into ordinary, searchable text documents. Rather than simply hire people for the tedium of correcting garbled OCR scans, the library invited the public to play a game. An online program called DigitalKoot lets people transcribe scanned words, and by typing accurately, usher a series of cartoon moles safely across a bridge….
Build a Two-Sided Market
Road infrastructure costs government five cents per driver per mile, according to the Victoria Transport Policy Institute. “That’s a dollar the government paid for the paving of that road and the maintaining of that infrastructure…just for you, not the other 3,000 people that travelled that same segment of highway in that same hour that you did,” says Sean O’Sullivan, founder of Carma, a ridesharing application.
Ridesharing companies such as Carma, Lyft, and Zimride are attempting to recruit private cars for the public transit network, by letting riders pay a small fee to carpool. A passenger waits at a designated stop, and the app alerts drivers, who can scan a profile of their potential rider. It’s a prime example of a potent new business model…
Remove the Middleman
John McNair dropped out of high school at age 16. By his thirties, he had become an entrepreneur, producing and selling handmade guitars, but carpentry alone wouldn’t grow his business. So the founder of Red Dog Guitars enrolled in a $20 class on Skillshare.com, taught by the illustrator Jon Contino, to learn to brand his work with hand-lettered product labels. Soon, a fellow businessman was asking McNair for labels to market guitar pickups.
Traditionally, the U.S. government might invest in retraining someone like John. Instead, peer-to-peer technology has allowed a community of designers to help John develop his skills. Peer-to-peer strategies enable citizens to meet each other’s needs, cheaply. Peer-to-peer solutions can help fix problems, deliver services, and supplement traditional approaches.
Peer-to-peer can lessen our dependence on big finance. Kickstarter lets companies skip the energy of convincing a banker that their product is viable. They just need to convince customers…”

A permanent hacker space in the Brazilian Congress


Blog entry by Dan Swislow at OpeningParliament: “On December 17, the presidency of the Brazilian Chamber of Deputies passed a resolution that creates a permanent Laboratório Ráquer or “Hacker Lab” inside the Chamber—a global first.
Read the full text of the resolution in Portuguese.
The resolution mandates the creation of a physical space at the Chamber that is “open for access and use by any citizen, especially programmers and software developers, members of parliament and other public workers, where they can utilize public data in a collaborative fashion for actions that enhance citizenship.”
The idea was born out of a week-long hackathon (or “hacker marathon”) event hosted by the Chamber of Deputies in November, with the goal of using technology to enhance the transparency of legislative work and increase citizen understanding of the legislative process. More than 40 software developers and designers worked to create 22 applications for computers and mobile devices. The applications were voted on and the top three awarded prizes.
The winner was Meu Congresso, a website that allows citizens to track the activities of their elected representatives and monitor their expenses. Runners-up included Monitora, Brasil!, an Android application that allows users to track proposed bills, attendance and the Twitter feeds of members; and Deliberatório, an online card game that simulates the deliberation of bills in the Chamber of Deputies.
The hackathon engaged the software developers directly with members and staff of the Chamber of Deputies, including the Chamber’s President, Henrique Eduardo Alves. Hackathon organizer Pedro Markun of Transparencia Hacker made a formal proposal to the President of the Chamber for a permanent outpost, where, as Markun said in an email, “we could hack from inside the leviathan’s belly.”
The Chamber’s Director-General has established nine staff positions for the Hacker Lab under the leadership of Cristiano Ferri Faria, who spoke with me about the new project.
Faria explained that the hackathon event was a watershed moment for many public officials: “For 90-95% of parliamentarians and probably 80% of civil servants, they didn’t know how amazing a simple app, for instance, can make it much easier to analyze speeches.” Faria pointed to one of the hackathon contest entries, Retórica Parlamentar, which provides an interactive visualization of plenary remarks by members of the Chamber. “When members saw that, they got impressed and wondered, ‘There’s something new going on and we need to understand it and support it.’”

When Tech Culture And Urbanism Collide


John Tolva: “…We can build upon the success of the work being done at the intersection of technology and urban design, right now.

For one, the whole realm of social enterprise — for-profit startups that seek to solve real social problems — has a huge overlap with urban issues. Impact Engine in Chicago, for instance, is an accelerator squarely focused on meaningful change and profitable businesses. One of their companies, Civic Artworks, has set as its goal rebalancing the community planning process.

The Code for America Accelerator and Tumml, both located in San Francisco, morph the concept of social innovation into civic/urban innovation. The companies nurtured by CfA and Tumml are filled with technologists and urbanists working together to create profitable businesses. Like WorkHands, a kind of LinkedIn for blue collar trades. Would something like this work outside a city? Maybe. Are its effects outsized and scale-ready in a city? Absolutely. That’s the opportunity in urban innovation.

Scale is what powers the sharing economy and it thrives because of the density and proximity of cities. In fact, shared resources at critical density is one of the only good definitions for what a city is. It’s natural that entrepreneurs have overlaid technology on this basic fact of urban life to amplify its effects. Would TaskRabbit, Hailo or LiquidSpace exist in suburbia? Probably, but their effects would be minuscule and investors would get restless. The city in this regard is the platform upon which sharing economy companies prosper. More importantly, companies like this change the way the city is used. It’s not urban planning, but it is urban (re)design and it makes a difference.

A twist that many in the tech sector who complain about cities often miss is that change in a city is not the same thing as change in city government. Obviously they are deeply intertwined; change is mighty hard when it is done at cross-purposes with government leadership. But it happens all the time. Non-government actors — foundations, non-profits, architecture and urban planning firms, real estate developers, construction companies — contribute massively to the shape and health of our cities.

Often this contribution is powered through policies of open data publication by municipal governments. Open data is the raw material of a city, the vital signs of what has happened there, what is happening right now, and the deep pool of patterns for what might happen next.

Tech entrepreneurs would do well to look at the organizations and companies capitalizing on this data as the real change agents, not government itself. Even the data in many cases is generated outside government. Citizens often do the most interesting data-gathering, with tools like LocalData. The most exciting thing happening at the intersection of technology and cities today — what really makes them “smart” — is what is happening at the periphery of city government. It’s easy to belly-ache about government and certainly there are administrations that do not make data public (or shut it down), but tech companies who are truly interested in city change should know that there are plenty of examples of how to start up and do it.

And yet, the somewhat staid world of architecture and urban-scale design presents the most opportunity to a tech community interested in real urban change. While technology obviously plays a role in urban planning — 3D visual design tools like Revit and mapping services like ArcGIS are foundational for all modern firms — data analytics as a serious input to design matters has only been used in specialized (mostly energy efficiency) scenarios. Where are the predictive analytics, the holistic models, the software-as-a-service providers for the brave new world of urban informatics and The Internet of Things? Technologists, it’s our move.

Something’s amiss when some city governments — rarely the vanguard in technological innovation — have more sophisticated tools for data-driven decision-making than the private sector firms who design the city. But some understand the opportunity. Vannevar Technology is working on it, as is Synthicity. There’s plenty of room for the most positive aspects of tech culture to remake the profession of urban planning itself. (Look to NYU’s Center for Urban Science and Progress and the University of Chicago’s Urban Center for Computation and Data for leadership in this space.)…”

Rethinking Why People Participate


Tiago Peixoto: “Having a refined understanding of what leads people to participate is one of the main concerns of those working with citizen engagement. But particularly when it comes to participatory democracy, that understanding is only partial and, most often, the cliché “more research is needed” is definitely applicable. This is so for a number of reasons, four of which are worth noting here.

  1. The “participatory” label is applied to greatly varied initiatives, raising obvious methodological challenges for comparative research and cumulative learning. For instance, while both participatory budgeting and online petitions can be roughly categorized as “participatory” processes, they are entirely different in terms of fundamental aspects such as their goals, institutional design and expected impact on decision-making.
  2. The fact that many participatory initiatives are conceived as “pilots” or one-off events gives researchers little time to understand the phenomenon, come up with sound research questions, and test different hypotheses over time. The “pilotitis” syndrome in the tech4accountability space is a good example of this.
  3. When designing and implementing participatory processes, in the face of budget constraints the first victims are documentation, evaluation and research. Apart from a few exceptions, this leads to a scarcity of data and basic information that undermines even the most heroic “archaeological” efforts of retrospective research and evaluation (a far from ideal approach).
  4. The semantic extravaganza that currently plagues the field of citizen engagement, technology and open government makes cumulative learning all the more difficult.

Precisely for the opposite reasons, our knowledge of electoral participation is in better shape. First, despite the differences between elections, comparative work is relatively easy, which is attested by the high number of cross-country studies in the field. Second, the fact that elections (for the most part) are repeated regularly and following a similar design enables the refinement of hypotheses and research questions over time, and specific time-related analysis (see an example here [PDF]). Third, when compared to the funds allocated to research in participatory initiatives, the relative amount of resources channeled into electoral studies and voting behavior is significantly higher. Here I am not referring to academic work only but also to the substantial resources invested by the private sector and parties towards a better understanding of elections and voting behavior. This includes a growing body of knowledge generated by get-out-the-vote (GOTV) research, with fascinating experimental evidence from interventions that seek to increase participation in elections (e.g. door-to-door campaigns, telemarketing, e-mail). Add to that the wealth of electoral data that is available worldwide (in machine-readable formats) and you have some pretty good knowledge to tap into. Finally, both conceptually and terminologically, the field of electoral studies is much more consistent than the field of citizen engagement which, in the long run, tends to drastically impact how knowledge of a subject evolves.
These reasons should be sufficient to capture the interest of those who work with citizen engagement. While the extent to which the knowledge from the field of electoral participation can be transferred to non-electoral participation remains an open question, it should at least provide citizen engagement researchers with cues and insights that are very much worth considering…”