How to stop being so easily manipulated by misleading statistics


Q&A by Akshat Rathi in Quartz: “There are three kinds of lies: Lies, damned lies, and statistics.” Few people know the struggle of correcting such lies better than David Spiegelhalter. Since 2007, he has been the Winton professor for the public understanding of risk (though he prefers “statistics” to “risk”) at the University of Cambridge.

In a sunlit hotel room in Washington DC, Quartz caught up with Spiegelhalter recently to talk about his unique job. The conversation sprawled from the wisdom of eating bacon (would you swallow any other known carcinogen?), to the serious crime of manipulating charts, to the right way to talk about rare but scary diseases.

When he isn’t fixing people’s misunderstandings of numbers, he works to communicate numbers better so that misunderstandings can be avoided from the beginning. The interview is edited and condensed for clarity….
What’s a recent example of misrepresentation of statistics that drove you bonkers?
I got very grumpy at an official graph of British teenage pregnancy rates that apparently showed they had declined to nearly zero. Until I realized that the bottom part of the axis had been cut off, which made it impossible to visualize the (very impressive) 50% reduction since 2000.

You once said graphical representation of data does not always communicate what we think it communicates. What do you mean by that?
Graphs can be as manipulative as words, using tricks such as cutting axes, rescaling things, changing data from positive to negative, and so on. Sometimes putting zero on the y-axis is wrong. So to be sure that you are communicating the right things, you need to evaluate the message that people are taking away. There are no absolute rules. It all depends on what you want to communicate….
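To illustrate the axis-cutting trick Spiegelhalter describes, here is a short sketch with hypothetical numbers (not the actual UK pregnancy data): a genuine 50% reduction looks like a collapse to nearly zero once the bottom of the y-axis is lopped off.

```python
# Hypothetical rates per 1,000 (invented for illustration):
# the rate halves, a genuine 50% reduction.
rate_2000 = 40.0
rate_2014 = 20.0

def bar_height(value, axis_min):
    """Visual height of a bar when the y-axis starts at axis_min."""
    return value - axis_min

# Honest chart: y-axis starts at zero, so bar heights track the data.
honest_ratio = bar_height(rate_2000, 0) / bar_height(rate_2014, 0)

# Truncated chart: y-axis starts just below the newer value.
truncated_ratio = bar_height(rate_2000, 19) / bar_height(rate_2014, 19)

print(honest_ratio)     # 2.0  -> the earlier bar looks twice as tall
print(truncated_ratio)  # 21.0 -> the earlier bar looks 21x taller: "nearly zero"
```

The data are identical in both charts; only the axis minimum changes, which is why evaluating the message readers actually take away matters more than following a fixed rule about zero baselines.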

Poorly communicated risk can have a severe effect. For instance, a news story about the risk to which pregnant women expose their unborn children when they drink alcohol caused stress to one of our news editors, who had drunk wine in moderation throughout her pregnancy.

 I think it’s irresponsible to say there is a risk when they actually don’t know if there is one. There is scientific uncertainty about that.
In such situations of unknown risk, there is a phrase that is often used: “Absence of evidence is not evidence of absence.” I hate that phrase. I get so angry when people use that phrase. It’s always used in a manipulative way. I say to them that it’s not evidence of absence, but if you’ve looked hard enough you’ll see that most of the time the evidence shows a very small effect, if any.

So on the risks of drinking alcohol while pregnant, the UK’s health authority said that as a precautionary step it’s better not to drink. That’s fair enough. This honesty is important. To say that we don’t definitely know if drinking is harmful, but to be safe we say you shouldn’t. That’s treating people as adults and allowing them to use their own judgement.

Science is a bigger and bigger part of our lives. What is the limitation in science journalism right now and how can we improve it?...(More)

Your Data Footprint Is Affecting Your Life In Ways You Can’t Even Imagine


Jessica Leber at Fast Co-Exist: “Cities have long seen the potential in big data to improve government and the lives of citizens, and this is now being put into action in ways where governments touch citizens’ lives in very sensitive areas. New York City’s Department of Homelessness Services is mining apartment eviction filings to see if it can understand who is at risk of becoming homeless and intervene early. And police departments all over the country have adopted predictive policing software that guides where officers should deploy, and at what time, leading to reduced crime in some cities.

In one study in Los Angeles, police officers deployed to certain neighborhoods by predictive policing software prevented 4.3 crimes per week, compared to 2 crimes per week when assigned to patrol a specific area by human crime analysts. Surely, a reduction in crime is a good thing. But community activists in places such as Bellingham, Washington, have grave doubts. They worry that outsiders can’t examine how the algorithms work, since the software is usually proprietary, and so citizens have no way of knowing what data the government is using to target them. They also worry that predictive policing is just exacerbating existing patterns of racial profiling. If the underlying crime data being used is the result of years of over-policing minority communities for minor offenses, then the predictions based on this biased data could create a feedback loop and lead to yet more over-policing.

At a smaller and more limited scale is the even more sensitive area of child protection services. Though the data isn’t really as “big” as in other examples, a few agencies are carefully exploring using statistical models to make decisions in several areas, such as which children in the system are most in danger of violence, which children are most in need of a trauma screening, and which are at risk of entering the criminal justice system. 

In Hillsborough County, Florida, where a series of child homicides occurred, a private provider selected to manage the county’s child welfare system in 2012 came in and analyzed the data. Cases with the highest probability of serious injury or death had a few factors in common, they found: a child under the age of three, a “paramour” in the home, a substance abuse or domestic violence history, and a parent previously in the foster care system. They identified nine practices to use in these cases and hired a software provider to create a dashboard that allowed real-time feedback. Their success has led to the program being implemented statewide….

“I think the opportunity is a rich one. At the same time, the ethical considerations need to be guiding us,” says Jesse Russell, chief program officer at the National Council on Crime and Delinquency, who has followed the use of predictive analytics in child protective services. Officials, he says, are treading carefully before using data to make decisions about individuals, especially when the consequences of being wrong—such as taking a child out of his or her home unnecessarily—are huge. And while caseworker decision-making can be flawed or biased, so can the programs that humans design. When you rely too much on data—if the data is flawed or incomplete, as could be the case in predictive policing—you risk further validating bad decisions or existing biases….

On the other hand, big data does have the potential to vastly expand our understanding of who we are and why we do what we do. A decade ago, serious scientists would have laughed out of the room anyone who proposed a study of “the human condition,” a topic so broad and lacking in measurability. But perhaps the most important manifestation of big data in people’s lives could come from the ability for scientists to study huge, unwieldy questions they couldn’t before.

A massive scientific undertaking to study the human condition is set to launch in January of 2017. The Kavli Human Project, funded by the Kavli Foundation, plans to recruit 10,000 New Yorkers from all walks of life to be measured for 10 years. And by measured, they mean everything: all financial transactions, tax returns, GPS coordinates, genomes, chemical exposure, IQ, Bluetooth sensors around the house, whom subjects text and call—and that’s just the beginning. In all, the large team of academics expects to collect about a billion data points per person per year at an unprecedented low cost for each data point compared to other large research surveys.
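A billion data points per person per year is a striking figure; a quick back-of-envelope check (rough arithmetic, not from the project's own materials) shows just how continuous that measurement would be:

```python
# Rough scale check on "a billion data points per person per year".
points_per_year = 1_000_000_000

per_day = points_per_year / 365
per_second = points_per_year / (365 * 24 * 3600)

print(round(per_day))     # roughly 2.7 million points per person per day
print(round(per_second))  # roughly 32 points per person per second, around the clock
```

At that rate, passive sensing (GPS, Bluetooth, transactions) must dominate the count; no survey-style instrument could come close.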

The hope is that with so much continuous data, researchers can for the first time start to disentangle the complex, seemingly unanswerable questions that have plagued our society, from what is causing the obesity epidemic to how to disrupt the poverty-to-prison cycle….(More)

How to Crowdsource the Syrian Cease-Fire


Colum Lynch at Foreign Policy: “Can the wizards of Silicon Valley develop a set of killer apps to monitor the fragile Syria cease-fire without putting foreign boots on the ground in one of the world’s most dangerous countries?

They’re certainly going to try. The “cessation of hostilities” in Syria brokered by the United States and Russia last month has sharply reduced the levels of violence in the war-torn country and sparked a rare burst of optimism that it could lead to a broader cease-fire. But if the two sides lay down their weapons, the international community will face the challenge of monitoring the battlefield to ensure compliance without deploying peacekeepers or foreign troops. The emerging solution: using crowdsourcing, drones, satellite imaging, and other high-tech tools.

The high-level interest in finding a technological solution to the monitoring challenge was on full display last month at a closed-door meeting convened by the White House that brought together U.N. officials, diplomats, digital cartographers, and representatives of Google, DigitalGlobe, and other technology companies. Their assignment was to brainstorm ways of using high-tech tools to keep track of any future cease-fires from Syria to Libya and Yemen.

The off-the-record event came as the United States, the U.N., and other key powers struggle to find ways of enforcing cease-fires in places like Syria at a time when there is little political will to run the risk of sending foreign forces or monitors to such dangerous places. The United States has turned to high-tech weapons like armed drones as weapons of war; it now wants to use similar systems to help enforce peace.

Take the Syria Conflict Mapping Project, a geomapping program developed by the Atlanta-based Carter Center, a nonprofit founded by former U.S. President Jimmy Carter and his wife, Rosalynn, to resolve conflict and promote human rights. The project has developed an interactive digital map that tracks military formations by government forces, Islamist extremists, and more moderate armed rebels in virtually every disputed Syrian town. It is now updating its technology to monitor cease-fires.

The project began in January 2012 because of a single 25-year-old intern, Christopher McNaboe. McNaboe realized it was possible to track the state of the conflict by compiling disparate strands of publicly available information — including the shelling and aerial bombardment of towns and rebel positions — from YouTube, Twitter, and other social media sites. It has since developed a mapping program using software provided by Palantir Technologies, a Palo Alto-based big data company that does contract work for U.S. intelligence and defense agencies, from the CIA to the FBI….

Walter Dorn, an expert on technology in U.N. peace operations who attended the White House event, said he had promoted what he calls a “coalition of the connected.”

The U.N. or other outside powers could start by tracking social media sites, including Twitter and YouTube, for reports of possible cease-fire violations. That information could then be verified by “seeded crowdsourcing” — that is, reaching out to networks of known advocates on the ground — and technological monitoring through satellite imagery or drones.

Matthew McNabb, the founder of First Mile Geo, a start-up that develops geolocation technology that can be used to gather data in conflict zones, has another idea. McNabb, who also attended the White House event, believes “on-demand” technologies like SurveyMonkey, which provides users a form to create their own surveys, can be applied in conflict zones to collect data on cease-fire violations….(More)

The Social Intranet: Insights on Managing and Sharing Knowledge Internally


Paper by Ines Mergel for IBM Center for the Business of Government: “While much of the federal government lags behind, some agencies are pioneers in the internal use of social media tools.  What lessons and effective practices do they have to offer other agencies?

“Social intranets,” Dr. Mergel writes, “are in-house social networks that use technologies – such as automated newsfeeds, wikis, chats, or blogs – to create engagement opportunities among employees.” They also include the use of internal profile pages that help people identify expertise and interests (similar to Facebook or LinkedIn profiles), and that are used in combination with other social intranet tools such as online communities or newsfeeds.

The report documents four case studies of government use of social intranets – two federal government agencies (the Department of State and the National Aeronautics and Space Administration) and two cross-agency networks (the U.S. Intelligence Community and the Government of Canada).

The author observes: “Most enterprise social networking platforms fail,” but identifies what causes these failures and how successful social intranet initiatives can avoid that fate and thrive.  She offers a series of insights for successfully implementing social intranets in the public sector, based on her observations and case studies. …(More)”

“Streetfight” by Janette Sadik-Khan


Review by Amrita Gupta in Policy Innovations of the book Streetfight: “Janette Sadik-Khan was New York City’s transportation commissioner from 2007 to 2013. Under her watch, the city’s streets were reimagined and redesigned to include more than 60 plazas and 400 miles of bike lanes—radical changes designed to improve traffic and make spaces safer for everybody. Over seven years, New York City underwent a transportation transformation, played out avenue by avenue, block by block. Times Square went from being the city’s worst traffic nightmare to a two-and-a-half-acre outdoor sitting area in 2009. Citi Bike, arguably her biggest success, is the nation’s largest bike share system, and the city’s first new transit system in over half a century. In Streetfight, Sadik-Khan breaks down her achievements into replicable ideas for urban planners and traffic engineers everywhere, and she also reminds us that the fight isn’t over. As part of Bloomberg Associates, she now takes the lessons from New York City’s streets to metropolises around the world.

The old order vs the new order

The crux of the problem, she explains, is that until recently, cities of the future were thought to be cities built for cars, not cities that encouraged human activity on the street.

Understanding what city-building used to mean is key to understanding how our cities are failing us today. Sadik-Khan offers a quick recap of New York City through recent decades. The historical lesson holds; in many ways, cities in every continent grew along a similar trajectory. Streets were designed to keep traffic moving, but not to support the life alongside it. The old order—which Sadik-Khan writes is typified by Robert Moses—took the auto-centric view that pedestrians, public transit, and bike riders were all hindrances in the path of cars.

Sadik-Khan calls for a more equitable and relevant city, one that prioritizes accessibility and convenience for everybody. Her generation of planners aims to transform roads, tunnels, and rail tracks—the legacy hardware of their predecessors—and repurpose them into public spaces to walk, bike, and play.

The strength of the book lies in just how effectively it dispels the misconceptions that most citizens, and indeed, urban planners, have held onto for decades. There are plenty of surprises in Streetfight.

Sadik-Khan shows that people’s ideas about safety can be obsolete. For instance, bike lanes don’t make accidents more likely; they make the streets safer. The statistics show that bike riders actually protect pedestrians by altering the behavior of drivers. Sadik-Khan states that bike ridership in New York City quadrupled between 2000 and 2012, and as the number of riders increases, so too does the safety of the street.

The assumption, for instance, that reducing lanes or closing them entirely creates gridlock is entirely wrong. Sadik-Khan’s interventions in New York City—providing pedestrian space and creating fewer but more orderly lanes for vehicles—actually improved traffic. And she uses taxi GPS data to prove it….

In fact, Sadik-Khan makes the claim that the economic power of sustainable streets is probably one of the strongest arguments for implementing dramatic change. Cities need data—think retail rents, shop sales, travel speeds, vehicle counts—to defend their interventions, and then to measure their effectiveness. Yet, she writes, unfortunately there are few cities anywhere that have access to reliable numbers of this kind….

Sadik-Khan emphasizes time and again that change can happen quickly and affordably. She didn’t have to bulldoze neighborhoods, or build new bridges and highways to transform the transportation network of New York City. Planners can reorder a street without destroying a single building.

Streetfight is a handbook that prioritizes paint, planters, signs, and signals over mega-infrastructure projects. We are told that small-scale interventions can have transformative large-scale impacts. Sadik-Khan’s pocket parks, plazas, pedestrian-friendly road redesigns, and parking-protected bike lanes are all the proof we need. For planners in developing countries, this should serve as both guide and encouragement.

Innovation doesn’t need big dollars behind it to be effective, and most ideas are scalable for cities big and small. What it does need, however, is street smarts. Sadik-Khan makes it clear that the key to getting projects to move ahead is support on the ground, and enough political capital to pave the way. Planners everywhere need to encourage participation, invite ideas, and be more transparent about proposals, she writes. But they also need to be willing to put up a fight….(More)”

Access to Government Information in the United States: A Primer


Wendy Ginsberg and Michael Greene at Congressional Research Service: “No provision in the U.S. Constitution expressly establishes a procedure for public access to executive branch records or meetings. Congress, however, has legislated various public access laws. Among these laws are two records access statutes,

  • the Freedom of Information Act (FOIA; 5 U.S.C. §552), and
  • the Privacy Act (5 U.S.C. §552a),

and two meetings access statutes,

  • the Federal Advisory Committee Act (FACA; 5 U.S.C. App.), and
  • the Government in the Sunshine Act (5 U.S.C. §552b).

These four laws provide the foundation for access to executive branch information in the American federal government. The records-access statutes provide the public with a variety of methods to examine how executive branch departments and agencies execute their missions. The meeting-access statutes provide the public the opportunity to participate in and inform the policy process. These four laws are also among the most used and most litigated federal access laws.

While the four statutes provide the public with access to executive branch federal records and meetings, they do not apply to the legislative or judicial branches of the U.S. government. The American separation of powers model of government provides a collection of formal and informal methods that the branches can use to provide information to one another. Moreover, the separation of powers anticipates conflicts over the accessibility of information. These conflicts are neither unexpected nor necessarily destructive. Although there is considerable interbranch cooperation in the sharing of information and records, such conflicts over access may continue on occasion.

This report offers an introduction to the four access laws and provides citations to additional resources related to these statutes. This report includes statistics on the use of FOIA and FACA and on litigation related to FOIA. The 114th Congress may have an interest in overseeing the implementation of these laws or may consider amending the laws. In addition, this report provides some examples of the methods Congress, the President, and the courts have employed to provide or require the provision of information to one another. This report is a primer on information access in the U.S. federal government and provides a list of resources related to transparency, secrecy, access, and nondisclosure….(More)”

States’ using iwaspoisoned.com for outbreak alerts


Dan Flynn at Food Safety News: “The crowdsourcing site iwaspoisoned.com has collected thousands of reports of foodborne illnesses from individuals across the United States since 2009 and is expanding with a custom alert service for state health departments.

“There are now 26 states signed up, allowing government (health) officials and epidemiologists to receive real time, customized alerts for reported foodborne illness incidents,” said iwaspoisoned.com founder Patrick Quade.
Quade said he wanted to make iwaspoisoned.com data more accessible to health departments and experts in each state.

“This real time information provides a wider range of information data to help local agencies better manage food illness outbreaks,” he said. “It also supplements existing reporting channels and serves to corroborate their own reporting systems.”

The Florida Department of Health, Food and Waterborne Disease Program (FWDP) began receiving iwaspoisoned.com alerts in December 2015.

“The FWDP has had an online complaint form for individuals to report food and waterborne illnesses,” a spokesman said. “However, the program has been looking for ways to expand their reach to ensure they are investigating all incidents. Partnering with iwaspoisoned.com was a logical choice for this expansion.”…

Quade established iwaspoisoned.com in New York City seven years ago to give people a place to report their experiences of being sickened by restaurant food. It gives such people a place to report the restaurants, locations, symptoms and other details and permits others to comment on the report….

The crowdsourcing site has played an increasing role in recent nationally known outbreaks, including those associated with Chipotle Mexican Grill in the last half of 2015. For example, CBS News in Los Angeles first reported on the Simi Valley, Calif., norovirus outbreak after noticing that about a dozen Chipotle customers had logged their illness reports on iwaspoisoned.com.

Eventually, health officials confirmed at least 234 norovirus illnesses associated with a Chipotle location in Simi Valley…(More)”

It’s not big data that discriminates – it’s the people that use it


In The Conversation: “Data can’t be racist or sexist, but the way it is used can help reinforce discrimination. The internet means more data is collected about us than ever before and it is used to make automatic decisions that can hugely affect our lives, from our credit scores to our employment opportunities.

If that data reflects unfair social biases against sensitive attributes, such as our race or gender, the conclusions drawn from that data might also be based on those biases.

But this era of “big data” doesn’t need to entrench inequality in this way. If we build smarter algorithms to analyse our information and ensure we’re aware of how discrimination and injustice may be at work, we can actually use big data to counter our human prejudices.

This kind of problem can arise when computer models are used to make predictions in areas such as insurance, financial loans and policing. If members of a certain racial group have historically been more likely to default on their loans, or been more likely to be convicted of a crime, then the model can deem these people more risky. That doesn’t necessarily mean that these people actually engage in more criminal behaviour or are worse at managing their money. They may just be disproportionately targeted by police and sub-prime mortgage salesmen.

Excluding sensitive attributes

Data scientist Cathy O’Neil has written about her experience of developing models for homeless services in New York City. The models were used to predict how long homeless clients would be in the system and to match them with appropriate services. She argues that including race in the analysis would have been unethical.

If the data showed white clients were more likely to find a job than black ones, the argument goes, then staff might focus their limited resources on those white clients that would more likely have a positive outcome. While sociological research has unveiled the ways that racial disparities in homelessness and unemployment are the result of unjust discrimination, algorithms can’t tell the difference between just and unjust patterns. And so datasets should exclude characteristics that may be used to reinforce the bias, such as race.

But this simple response isn’t necessarily the answer. For one thing, machine learning algorithms can often infer sensitive attributes from a combination of other, non-sensitive facts. People of a particular race may be more likely to live in a certain area, for example. So excluding those attributes may not be enough to remove the bias….
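The proxy effect described above can be made concrete with a minimal, fully synthetic sketch (the numbers, neighborhoods, and hire rates are all invented): a model that never sees the sensitive attribute, only neighborhood, still reproduces the group disparity because the two are correlated in the training data.

```python
# Synthetic records: (neighborhood, group, hired).
# Group "A" lives mostly in the north, group "B" mostly in the south,
# and historical hire rates differ sharply by neighborhood.
records = (
    [("north", "A", True)] * 80 + [("north", "A", False)] * 20 +
    [("north", "B", True)] * 4  + [("north", "B", False)] * 1 +
    [("south", "B", True)] * 20 + [("south", "B", False)] * 80 +
    [("south", "A", True)] * 1  + [("south", "A", False)] * 4
)

def hire_rate(rows):
    return sum(1 for _, _, hired in rows if hired) / len(rows)

# A "group-blind" model: predict each neighborhood's historical hire rate.
model = {hood: hire_rate([r for r in records if r[0] == hood])
         for hood in ("north", "south")}

def mean_prediction(group):
    """Average model prediction for members of one group."""
    preds = [model[hood] for hood, g, _ in records if g == group]
    return sum(preds) / len(preds)

pred_a = mean_prediction("A")
pred_b = mean_prediction("B")
print(round(pred_a, 2), round(pred_b, 2))  # 0.77 0.23
```

The group label was stripped from the features, yet the model's average predictions differ almost as much as the raw group hire rates, because neighborhood acts as a near-perfect proxy. This is exactly why excluding sensitive attributes alone does not remove the bias.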

An enlightened service provider might, upon seeing the results of the analysis, investigate whether and how racism is a barrier to their black clients getting hired. Equipped with this knowledge they could begin to do something about it. For instance, they could ensure that local employers’ hiring practices are fair and provide additional help to those applicants more likely to face discrimination. The moral responsibility lies with those responsible for interpreting and acting on the model, not the model itself.

So the argument that sensitive attributes should be stripped from the datasets we use to train predictive models is too simple. Of course, collecting sensitive data should be carefully regulated because it can easily be misused. But misuse is not inevitable, and in some cases, collecting sensitive attributes could prove absolutely essential in uncovering, predicting, and correcting unjust discrimination. For example, in the case of homeless services discussed above, the city would need to collect data on ethnicity in order to discover potential biases in employment practices….(More)

A new data viz tool shows what stories are being undercovered in countries around the world


Jospeh Lichterman at NiemanLab: “It’s a common lament: Though the Internet provides us access to a nearly unlimited number of sources for news, most of us rarely venture beyond the same few sources or topics. And as news consumption shifts to our phones, people are using even fewer sources: On average, consumers access 1.52 trusted news sources on their phones, according to the 2015 Reuters Digital News Report, which studied news consumption across several countries.

To try and diversify people’s perspectives on the news, Jigsaw — the tech incubator, formerly known as Google Ideas, that’s run by Google’s parent company Alphabet — this week launched Unfiltered.News, an experimental site that uses Google News data to show users what topics are being underreported or are popular in regions around the world.

Unfiltered.News’ main data visualization shows which topics are most reported in countries around the world. A column on the right side of the page highlights stories that are being reported widely elsewhere in the world, but aren’t in the top 100 stories on Google News in the selected country. In the United States yesterday, five of the top 10 underreported topics, unsurprisingly, dealt with soccer. In China, Barack Obama was the most undercovered topic….(More)”

To Make Cities More Efficient, Fix Procurement To Welcome Startups


Jay Nath and Jeremy M. Goldberg at the Aspen Journal of Ideas: “In 2014, an amazing thing happened in government: In just 16 weeks, a new system to help guide visually impaired travelers through San Francisco International Airport was developed, going from a rough idea to ready-to-go status, through a city program that brings startups and agencies together. Yet two and a half years later, a request for proposals to expand this ground-breaking, innovative technology is yet to be finalized.

For people in government, that’s an all-too-familiar scenario. While procurement serves an important role in ensuring that government is a responsible steward of taxpayer dollars, there’s tremendous opportunity to improve the way the public sector has traditionally bought goods and services. And the stakes are higher than simply dealing with red tape. By limiting the pool of partners to those who know how to work the system, taxpayers are missing out on low-cost, innovative solutions. Essentially, RFPs are a Do Not Enter sign for startups — the engine of innovation across nearly every industry except the public sector.

In San Francisco, under our Startup in Residence program, we’re experimenting with how to remove the friction associated with RFPs for both government staff and startups. For government staff, that means publishing an RFP in days, not months. For startups, it means responding to an RFP in hours, not weeks.

So what did we learn from our experience with the airport? We combined 17 RFPs into one; utilized general “challenge statements” in place of highly detailed project specifications; leveraged modern technology; and created a simple guide to navigating the process. Here’s a look at how each of those innovations works:

The RFP bus: Today, most RFPs are like a single driver in a car — inefficient and resource-intensive. We should be looking at what might be thought of as a mass-transit option, like a bus. By combining a number of RFPs for projects that have characteristics in common into a single procurement vehicle, we can spread the process costs over a number of RFPs.

Challenges, not prescriptions: Under the traditional procurement process, city staffers develop highly prescribed requirements that are often dozens of pages long, a practice that tends to favor existing approaches and established vendors. Shifting to brief challenge statements opens the door for residents, small businesses and entrepreneurs with new ideas. And it reduces the time required by government staff to develop an RFP from weeks or months to days.

Technology that enables the process: This was critical to enabling San Francisco to combine 17 RFPs into one. Without the right technology, we wouldn’t be able to automatically route bidders’ proposals to the appropriate evaluation committees for online scoring or let bidders easily submit their responses. While this kind of procurement technology is not new, its use is still uncommon. That needs to change, and it’s more than a question of efficiency. When citizens and entrepreneurs have a painful experience interacting with government, they wonder how we can address the big challenges if we can’t get the small stuff right…(More)