Civic Tech Forecast: 2014


Laura Dyson from Code for America: “Last year was a big year for civic technology and government innovation, and if last week’s Municipal Innovation discussion was any indication, 2014 promises to be even bigger. More than sixty civic innovators from both inside and outside of government gathered to hear three leading civic tech experts share their “Top Five” list of civic tech trends from 2013, and predictions for what’s to come in 2014. From responsive web design to overcoming leadership change, guest speakers Luke Fretwell, Juan Pablo Velez, and Alissa Black covered both challenges and opportunities. And the audience had a few predictions of their own. Highlights included:
Mark Leech, Application Development Manager, City of Albuquerque: “Regionalization will allow smaller communities to participate and act as a force multiplier for them.”
Rebecca Williams, Policy Analyst, Sunlight Foundation: “Open data policy (law and implementation) will become more connected to traditional forms of governance, like public records and town hall meetings.”
Rick Dietz, IT Director, City of Bloomington, Ind.: “I think governments will need to collaborate directly more on open source development, particularly on enterprise scale software systems — not just civic apps.”
Kristina Ng, Office of Financial Empowerment, City and County of San Francisco: “I’m excited about the growing community of innovative government workers.”
Hillary Hartley, Presidential Innovation Fellow: “We’ll need to address sustainability and revenue opportunities. Consulting work can only go so far; we must figure out how to empower civic tech companies to actually make money.”
An informal poll of the audience showed that roughly 96 percent of the group was feeling optimistic about the coming year for civic innovation. What’s your civic tech forecast for 2014? Read on to hear what guest speakers Luke Fretwell, Juan Pablo Velez, and Alissa Black had to say, and then let us know how you’re feeling about 2014 by tweeting at @codeforamerica.”
 

Visual Insights: A Practical Guide to Making Sense of Data


New book by Katy Börner and David E. Polley: “In the age of Big Data, the tools of information visualization offer us a macroscope to help us make sense of the avalanche of data available on every subject. This book offers a gentle introduction to the design of insightful information visualizations. It is the only book on the subject that teaches nonprogrammers how to use open code and open data to design insightful visualizations. Readers will learn to apply advanced data mining and visualization techniques to make sense of temporal, geospatial, topical, and network data.

The book, developed for use in an information visualization MOOC, covers data analysis algorithms that enable extraction of patterns and trends in data, with chapters devoted to “when” (temporal data), “where” (geospatial data), “what” (topical data), and “with whom” (networks and trees); and to systems that drive research and development. Examples of projects undertaken for clients include an interactive visualization of the success of game player activity in World of Warcraft; a visualization of 311 number adoption that shows the diffusion of non-emergency calls in the United States; a return on investment study for two decades of HIV/AIDS research funding by NIAID; and a map showing the impact of the HiveNYC Learning Network.
Visual Insights will be an essential resource on basic information visualization techniques for scholars in many fields, students, designers, or anyone who works with data.”

Also check out the Information Visualization MOOC at http://ivmooc.cns.iu.edu/
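The book’s “when” (temporal) analyses start from an operation simple enough for the nonprogrammers it targets: binning time-stamped records and charting the counts. Below is a minimal, hypothetical sketch in Python — it is not taken from the book, and the event dates and plain-text bar chart are invented purely for illustration:

```python
from collections import Counter

def yearly_counts(timestamps):
    """Bin ISO-date strings (YYYY-MM-DD) into counts per year --
    the first step of a temporal ('when') analysis."""
    return Counter(ts[:4] for ts in timestamps)

def bar_chart(counts, width=40):
    """Render counts as a plain-text bar chart, one row per year."""
    peak = max(counts.values())
    lines = []
    for year in sorted(counts):
        bar = "#" * round(counts[year] / peak * width)
        lines.append(f"{year} | {bar} {counts[year]}")
    return "\n".join(lines)

# Invented sample events, purely illustrative.
events = ["2011-03-01", "2011-07-15", "2012-01-09",
          "2012-06-30", "2012-11-02", "2013-05-20"]
print(bar_chart(yearly_counts(events)))
```

The same binning step generalizes to the book’s other questions — “where” bins by region instead of year, “what” bins by topic keyword.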
 

Innovation in the Government Industry


in Huffington Post: “Government may be susceptible to the same forces that are currently changing many major industries. Software is eating government, too. Therefore government must use customer development to better serve customers, or else it risks becoming the next Blockbuster, Borders, or what the large publishing and financial services companies are at risk of becoming…
Government is currently one size fits all. In a free market, there is unbundling and multiple offerings for different segments of a market. For example, there’s Natural Light, Budweiser, and Guinness. Competition forces companies to serve customers because if customers don’t like one offering they will simply choose a different one. If you don’t like your laundromat, restaurant, or job, you can simply go somewhere else. In contrast, switching governments is really hard.
Why Now
Government has been able to go a very long time without significant innovation. However, now is the time for government to begin adapting, because the forces changing nearly every industry may do the same to government. I will reiterate a few themes that Fred Wilson cited in a talk at LeWeb about several different industries, and add some more thoughts.
1. Organization: Technology driven networks replacing bureaucratic hierarchies
Bureaucratic hierarchies involve chains of command with lower levels of management making more detailed decisions and reporting back to higher levels of management. These systems often entail long communication lags, high costs, and principal/agent problems.
Technology driven networks are providing more efficient systems for organization and communication. For example, Amazon has changed the publishing industry by enabling anyone to publish content and enabling customers to decide what they want. Twitter has created a network around communication and news, enabling anyone who people want to hear to be heard.
2. Competition: Unbundling of product and service offerings
Technology advancements have made it cheaper and easier than ever before to produce a product and bring it to market. One result is that it’s become easier for an entrepreneur to take one piece of a larger bundle and sell it as a standalone offering. It gives customers the option to buy what they want without having to pay more for stuff they don’t want. In addition, the offerings can be improved because producers are completely focused on that specific offering. For example, we used to buy one newspaper and get world, local, sports, etc. Now it all comes from different sources.
Bundling exists because it was more efficient than attempting to contract in the market for every tiny service. However some of the technology driven networks (as described above) are helping markets become more efficient and giving customers more customizable buying options. For example, you can buy a half hour of education, or borrow money from a peer.
We’re starting to see some of the government’s offerings being unbundled. For example, Uber and Hyperloop are providing transportation. A neighborhood in Oakland crowdfunded private security.
3. Finance: Lower payment transaction fees and crowdfunding
Innovation in payments, including Bitcoin, has made it cheaper and easier than ever to transfer money. It’s as easy as sending an email, clicking a hyperlink, or scanning a QR code. In addition, Bitcoin is not controlled by any regulators or intermediaries like the government, credit card companies, or even PayPal.
Crowdfunding enables the collective efforts of individuals to connect and pool their money to back initiatives, make purchases, or fund new projects. A school in Houston crowdfunded some exercise equipment instead of using government funding.
4. Communication: We are all connected and graphed
Mobile devices have become nearly as powerful as desktops or laptops, and there are many things we can do with a phone that we can’t do on a desktop or laptop. For example, smartphones have sensors, are location aware, can be carried with us at all times, and are cheaper than desktops or laptops. These factors have led to mass adoption of mobile devices across the world, including in countries with high poverty where people could not previously afford a desktop or laptop. Mobile is making innovative offerings like Uber and mobile payments possible.
Platforms like Facebook and Twitter provide everyone with access to millions of people. In addition, companies like Klout and Quora are measuring our reputation and social graph, improving our ability to transact with each other. For example, when market participants trust one another (through the vehicle of a reputation system), many transactions that wouldn’t otherwise happen can now happen. This is illustrated by the rise in popularity of collaborative consumption platforms and peer-to-peer marketplaces.
Serving Customers
The current government duopoly inhibits us from selecting the government that we want as well as from receiving the best possible service because of lack of incentive. However the technologies described above are making it possible to get services previously provided by the government through more efficient and effective means. They’re enabling a more free market for government services….
If government were to take the customer development route, it could try things like unbundling (see above) so that people could opt for the specific solutions they desire. Given the US government’s current balance sheet, it may actually need to start relying on other providers.
It could also rely more on “economic feedback” to inform its actions. Currently economic feedback is given through voting. Most people vote once every two or four years and then hope they get what they “paid” for. Can you imagine paying for a college without knowing which one you would attend, what it would provide, or whether you could request a refund or switch colleges? With more economic incentive, services would need to improve. For example, if there were a free market for roads, people would pay for and use the roads that were most safe.”

Selected Readings on Big Data


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of big data was originally published in 2014.

Big Data refers to the wide-scale collection, aggregation, storage, analysis and use of data. Government is increasingly in control of a massive amount of raw data that, when analyzed and put to use, can lead to new insights on everything from public opinion to environmental concerns. The burgeoning literature on Big Data argues that it generates value by: creating transparency; enabling experimentation to discover needs, expose variability, and improve performance; segmenting populations to customize actions; replacing/supporting human decision making with automated algorithms; and innovating new business models, products and services. The insights drawn from data analysis can also be visualized in a manner that passes along relevant information, even to those without the tech savvy to understand the data on its own terms (see The GovLab Selected Readings on Data Visualization).

Selected Reading List (in alphabetical order)

Annotated Selected Reading List (in alphabetical order)

Australian Government Information Management Office. The Australian Public Service Big Data Strategy: Improved Understanding through Enhanced Data-analytics Capability Strategy Report. August 2013. http://bit.ly/17hs2xY.

  • This Big Data Strategy produced for Australian Government senior executives with responsibility for delivering services and developing policy is aimed at ingraining in government officials that the key to increasing the value of big data held by government is the effective use of analytics. Essentially, “the value of big data lies in [our] ability to extract insights and make better decisions.”
  • This positions big data as a national asset that can be used to “streamline service delivery, create opportunities for innovation, identify new service and policy approaches as well as supporting the effective delivery of existing programs across a broad range of government operations.”

Bollier, David. The Promise and Peril of Big Data. The Aspen Institute, Communications and Society Program, 2010. http://bit.ly/1a3hBIA.

  • This report captures insights from the 2009 Roundtable exploring uses of Big Data within a number of important consumer behavior and policy implication contexts.
  • The report concludes that, “Big Data presents many exciting opportunities to improve modern society. There are incalculable opportunities to make scientific research more productive, and to accelerate discovery and innovation. People can use new tools to help improve their health and well-being, and medical care can be made more efficient and effective. Government, too, has a great stake in using large databases to improve the delivery of government services and to monitor for threats to national security.
  • However, “Big Data also presents many formidable challenges to government and citizens precisely because data technologies are becoming so pervasive, intrusive and difficult to understand. How shall society protect itself against those who would misuse or abuse large databases? What new regulatory systems, private-law innovations or social practices will be capable of controlling anti-social behaviors–and how should we even define what is socially and legally acceptable when the practices enabled by Big Data are so novel and often arcane?”

Boyd, Danah and Kate Crawford. “Six Provocations for Big Data.” A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society. September 2011. http://bit.ly/1jJstmz.

  • In this paper, Boyd and Crawford raise challenges to unchecked assumptions and biases regarding big data. The paper makes a number of assertions about the “computational culture” of big data and pushes back against those who consider big data to be a panacea.
  • The authors’ provocations for big data are:
    • Automating Research Changes the Definition of Knowledge
    • Claims to Objectivity and Accuracy Are Misleading
    • Big Data Is Not Always Better Data
    • Not All Data Is Equivalent
    • Just Because It Is Accessible Doesn’t Make It Ethical
    • Limited Access to Big Data Creates New Digital Divides

The Economist Intelligence Unit. Big Data and the Democratisation of Decisions. October 2012. http://bit.ly/17MpH8L.

  • This report from the Economist Intelligence Unit focuses on the positive impact of big data adoption in the private sector, but its insights can also be applied to the use of big data in governance.
  • The report argues that innovation can be spurred by democratizing access to data, allowing a diversity of stakeholders to “tap data, draw lessons and make business decisions,” which in turn helps companies and institutions respond to new trends and intelligence at varying levels of decision-making power.

Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. Big Data: The Next Frontier for Innovation, Competition, and Productivity.  McKinsey & Company. May 2011. http://bit.ly/18Q5CSl.

  • This report argues that big data “will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus,” and that “leaders in every sector will have to grapple with the implications of big data.”
  • The report offers five broad ways in which using big data can create value:
    • First, big data can unlock significant value by making information transparent and usable at much higher frequency.
    • Second, as organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance.
    • Third, big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
    • Fourth, sophisticated analytics can substantially improve decision-making.
    • Finally, big data can be used to improve the development of the next generation of products and services.

The Partnership for Public Service and the IBM Center for The Business of Government. “From Data to Decisions II: Building an Analytics Culture.” October 17, 2012. https://bit.ly/2EbBTMg.

  • This report discusses strategies for better leveraging data analysis to aid decision-making. The authors argue that, “Organizations that are successful at launching or expanding analytics programs…systematically examine their processes and activities to ensure that everything they do clearly connects to what they set out to achieve, and they use that examination to pinpoint weaknesses or areas for improvement.”
  • While the report features many strategies for government decision-makers, the central recommendation is that, “leaders incorporate analytics as a way of doing business, making data-driven decisions transparent and a fundamental approach to day-to-day management. When an analytics culture is built openly, and the lessons are applied routinely and shared widely, an agency can embed valuable management practices in its DNA, to the mutual benefit of the agency and the public it serves.”

TechAmerica Foundation’s Federal Big Data Commission. “Demystifying Big Data: A Practical Guide to Transforming the Business of Government.” 2013. http://bit.ly/1aalUrs.

  • This report presents the key big data imperatives that government agencies must address, as well as the challenges and opportunities posed by the growing volume of data and the value big data can provide. The discussion touches on the value of big data to business and organizational mission, presents case study examples of big data applications, and covers technical underpinnings and public policy applications.
  • The authors argue that new digital information, “effectively captured, managed and analyzed, has the power to change every industry including cyber security, healthcare, transportation, education, and the sciences.” To ensure that this opportunity is realized, the report proposes a detailed big data strategy framework with the following steps: define, assess, plan, execute and review.

World Economic Forum. “Big Data, Big Impact: New Possibilities for International Development.” 2012. http://bit.ly/17hrTKW.

  • This report examines the potential for channeling the “flood of data created every day by the interactions of billions of people using computers, GPS devices, cell phones, and medical devices” into “actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of low-income populations.”
  • The report argues that, “To realise the mutual benefits of creating an environment for sharing mobile-generated data, all ecosystem actors must commit to active and open participation. Governments can take the lead in setting policy and legal frameworks that protect individuals and require contractors to make their data public. Development organisations can continue supporting governments and demonstrating both the public good and the business value that data philanthropy can deliver. And the private sector can move faster to create mechanisms for sharing data that can benefit the public.”

New Open Data Tool Helps Countries Compare Progress on Education


World Bank Group: “The World Bank Group today launched a new open data tool that provides in-depth, comparative, and easily accessible data on education policies around the world. The Systems Approach for Better Education Results (SABER) web tool helps countries collect and analyze information on their education policies, benchmark themselves against other countries, and prioritize areas for reform, with the goal of ensuring that all children and youth go to school and learn….
To date, the Bank Group, through SABER, has analyzed more than 100 countries to guide more effective reforms and investments in education at all levels, from pre-primary to tertiary education and workforce development.
Through SABER, the Bank Group aims to improve education quality by supplying policymakers, civil society, school administrators, teachers, parents, and students with more, and more meaningful, data about key education policy areas, including early childhood development, student assessment, teachers, school autonomy and accountability, and workforce development, among others.
SABER helps countries improve their education systems in three ways:

  1. Providing new data on policies and institutions. SABER collects comparable country data on education policies and institutions that are publicly available at: http://worldbank.org/education/saber, allowing governments, researchers, and other stakeholders to measure and monitor progress.
  2. Benchmarking education policies and institutions. Each policy area is rated on a four-point scale, from “Latent” to “Emerging” to “Established” and “Advanced.” These ratings highlight a country’s areas of strength and weakness while promoting cross-country learning.
  3. Highlighting key policy choices. SABER data collection and analysis produce an objective snapshot of how well a country’s education system is performing in relation to global good practice. This helps highlight the most important policy choices to spur learning.”

Use big data and crowdsourcing to detect nuclear proliferation, says DSB


FierceGovernmentIT: “A changing set of counter-nuclear proliferation problems requires a paradigm shift in monitoring that should include big data analytics and crowdsourcing, says a report from the Defense Science Board.
Much has changed since the Cold War when it comes to ensuring that nuclear weapons are subject to international controls, meaning that monitoring in support of treaties covering declared capabilities should be only one part of overall U.S. monitoring efforts, says the board in a January report (.pdf).
There are challenges related to covert operations, such as testing calibrated to fall below detection thresholds, and non-traditional technologies that present ambiguous threat signatures. Knowledge about how to make nuclear weapons is widespread and in the hands of actors who will give the United States or its allies limited or no access….
The report recommends using a slew of technologies including radiation sensors, but also exploitation of digital sources of information.
“Data gathered from the cyber domain establishes a rich and exploitable source for determining activities of individuals, groups and organizations needed to participate in either the procurement or development of a nuclear device,” it says.
Big data analytics could be used to take advantage of the proliferation of potential data sources including commercial satellite imaging, social media and other online sources.
The report notes that the proliferation of readily available commercial satellite imagery has created concerns about the introduction of more noise than genuine signal. “On balance, however, it is the judgment from the task force that more information from remote sensing systems, both commercial and dedicated national assets, is better than less information,” it says.
In fact, the ready availability of commercial imagery should be an impetus for government to find weak signals “even within the most cluttered and noisy environments.”
Crowdsourcing also holds potential, although the report again notes that nuclear proliferation analysis by non-governmental entities “will constrain the ability of the United States to keep its options open in dealing with potential violations.” The distinction between gathering information and making political judgments “will erode.”
An effort by Georgetown University students (reported in the Washington Post in 2011) to use open source data to analyze the network of tunnels China uses to hide its missile and nuclear arsenal provides a proof of concept of how crowdsourcing can augment limited analytical capacity, the report says – despite debate over the students’ work, which concluded that China’s arsenal could be many times larger than conventionally accepted…
For more:
download the DSB report, “Assessment of Nuclear Monitoring and Verification Technologies” (.pdf)
read the WaPo article on the Georgetown University crowdsourcing effort”

E-government and organisational transformation of government: Black box revisited?


New paper in Government Information Quarterly: “During the e-government era the role of technology in the transformation of public sector organisations has significantly increased, and the relationship between ICT and organisational change in the public sector has become the subject of increasingly intensive research over the last decade. However, an overview of the literature to date indicates that the impacts of e-government on the organisational transformation of administrative structures and processes are still relatively poorly understood and vaguely defined.

The main purpose of the paper is therefore the following: (1) to examine the interdependence of e-government development and organisational transformation in public sector organisations and propose a clearer explanation of ICT’s role as a driving force of organisational transformation in further e-government development; and (2) to specify the main characteristics of organisational transformation in the e-government era through the development of a new framework. This framework describes organisational transformation in two dimensions, i.e. the ‘depth’ and the ‘nature’ of changes, and specifies the key attributes related to the three typical organisational levels.”

Tech Policy Is Not A Religion


Opinion Piece by Robert Atkinson: “‘Digital libertarians’ and ‘digital technocrats’ want us to believe their way is the truth and the light. It’s not that black and white. Manichaeism, an ancient religion, took a dualistic view of the world. It described the struggle between a good, spiritual world of light, and an evil, material world of darkness. Listening to tech policy debates, especially in America, one would presume that Manichaeism is alive and well.
On one side (light or dark, depending on your view) are the folks who embrace free markets, bottom-up processes, multi-stakeholderism, open-source systems, and crowdsourced innovations. On the other are those who embrace government intervention, top-down processes, additional regulation, proprietary systems, and expert-based innovations.
For the first group, whom I’ll call the digital libertarians, government is the problem, not the solution. Tech enables freedom, and statist actions can only limit it.
According to this camp, tech is moving so fast that government can’t hope to keep up — the only workable governance system is a nimble one based on multi-stakeholder processes, such as ICANN and W3C. With Web 2.0, everyone can be a contributor, and it is through the proliferation of multiple and disparate voices that we discover the truth. And because of the ability of communities of coders to add their contributions, the only viable tech systems are based on open-source models.
For the second group, the digital technocrats, the problem is the anarchic, lawless, corporate-dominated nature of the digital world. Tech is so disruptive, including to long-established norms and laws, it needs to be limited and shaped, and only the strong hand of the state can do that. Because of the influence of tech on all aspects of society, any legitimate governance process must stem from democratic institutions — not from a select group of insiders — and that can only happen with government oversight such as through the UN’s International Telecommunication Union.
According to this camp, because there are so many uninformed voices on the Internet spreading urban myths like wildfire, we need carefully vetted experts, whether in media or other organizations, to sort through the mass of information and provide expert, unbiased analysis. And because IT systems are so critical to the safety and well-functioning of society, we need companies to build and profit from them through a closed-source model.
Of course, just as religious Manichaeism leads to distorted practices of faith, tech Manichaeism leads to distorted policy practices and views. Take Internet governance. The process of ensuring Internet governance and evolution is complex and rapidly changing. A strong case can be made for the multi-stakeholder process as the driving force.
But this situation doesn’t mean, as digital libertarians would assert, that governments should stay out of the Internet altogether. Governments are not, as digital libertarian John Perry Barlow arrogantly asserts, “weary giants of flesh and steel.” Governments can and do play legitimate roles in many Internet policy issues, from establishing cybersecurity guidelines to setting online sales tax policy to combatting spam and digital piracy to setting rules governing unfair and deceptive online marketing practices.
This assertion doesn’t mean governments always get things right. They don’t. But as the Information Technology and Innovation Foundation writes in its recent response to Barlow’s manifesto, to deny people the right to regulate Internet activity through their government officials ignores the significant role the government can play in promoting the continued development of the Internet and digital economy.
At the same time, the digital technocrats must understand that the digital world is different from the analog one, and that old rules, regulations, and governing structures simply don’t apply. When ITU Secretary General Hamadoun Toure argues that “at the behest of all the world’s nations, the UN must lead this effort” to manage the global Internet, and that “for big commercial interests, it’s about maximizing the bottom line,” he’s ignoring the critical role that tech companies and other non-government stakeholders play in the Internet ecosystem.
Because digital technology is such a vastly complex system, digital libertarians claim that their “light” approach is superior to the “dark,” controlling, technocratic approach. In fact, this very complexity requires that we base Internet policy on pragmatism, not religion.
Conversely, because technology is so important to opportunity and the functioning of societies, digital technocrats assert that only governments can maximize these benefits. In fact, its importance requires us to respect its complexity and the role of private sector innovators in driving digital progress.
In short, the belief that one or the other of these approaches is sufficient in itself to maximize tech innovation is misleading at best and damaging at worst.”

Garbage In, Garbage Out… Or, How to Lie with Bad Data


Medium: For everyone who slept through Stats 101, Charles Wheelan’s Naked Statistics is a lifesaver. From batting averages and political polls to Schlitz ads and medical research, Wheelan “illustrates exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life” (New York Times). What follows is adapted from the book, out now in paperback.
Behind every important study there are good data that made the analysis possible. And behind every bad study . . . well, read on. People often speak about “lying with statistics.” I would argue that some of the most egregious statistical mistakes involve lying with data; the statistical analysis is fine, but the data on which the calculations are performed are bogus or inappropriate. Here are some common examples of “garbage in, garbage out.”

Selection Bias

…Selection bias can be introduced in many other ways. A survey of consumers in an airport is going to be biased by the fact that people who fly are likely to be wealthier than the general public; a survey at a rest stop on Interstate 90 may have the opposite problem. Both surveys are likely to be biased by the fact that people who are willing to answer a survey in a public place are different from people who would prefer not to be bothered. If you ask 100 people in a public place to complete a short survey, and 60 are willing to answer your questions, those 60 are likely to be different in significant ways from the 40 who walked by without making eye contact.
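The airport example is easy to simulate. In the sketch below, every number is an invented assumption — incomes are drawn from an arbitrary lognormal distribution, and the chance of being observed at the airport is made proportional to income — but it shows the mechanism: when who you sample depends on the very trait you are measuring, the sample mean drifts away from the population mean.

```python
import random

random.seed(0)

# Hypothetical population: skewed incomes, and the probability of
# being found at an airport rises with income (both assumptions).
population = [random.lognormvariate(10.8, 0.6) for _ in range(100_000)]
max_income = max(population)
airport_sample = [x for x in population if random.random() < x / max_income]

pop_mean = sum(population) / len(population)
airport_mean = sum(airport_sample) / len(airport_sample)
print(f"population mean:     {pop_mean:,.0f}")
print(f"airport-sample mean: {airport_mean:,.0f}")  # noticeably higher
```

No amount of careful arithmetic on `airport_sample` fixes the gap; the bias enters before the statistics start.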

Publication Bias

Positive findings are more likely to be published than negative findings, which can skew the results that we see. Suppose you have just conducted a rigorous, longitudinal study in which you find conclusively that playing video games does not prevent colon cancer. You’ve followed a representative sample of 100,000 Americans for twenty years; those participants who spend hours playing video games have roughly the same incidence of colon cancer as the participants who do not play video games at all. We’ll assume your methodology is impeccable. Which prestigious medical journal is going to publish your results?

None, for two reasons. First, there is no strong scientific reason to believe that playing video games has any impact on colon cancer, so it is not obvious why you were doing this study. Second, and more relevant here, the fact that something does not prevent cancer is not a particularly interesting finding. After all, most things don’t prevent cancer. Negative findings are not especially sexy, in medicine or elsewhere.
The net effect is to distort the research that we see, or do not see. Suppose that one of your graduate school classmates has conducted a different longitudinal study. She finds that people who spend a lot of time playing video games do have a lower incidence of colon cancer. Now that is interesting! That is exactly the kind of finding that would catch the attention of a medical journal, the popular press, bloggers, and video game makers (who would slap labels on their products extolling the health benefits of their products). It wouldn’t be long before Tiger Moms all over the country were “protecting” their children from cancer by snatching books out of their hands and forcing them to play video games instead.
Of course, one important recurring idea in statistics is that unusual things happen every once in a while, just as a matter of chance. If you conduct 100 studies, one of them is likely to turn up results that are pure nonsense—like a statistical association between playing video games and a lower incidence of colon cancer. Here is the problem: The 99 studies that find no link between video games and colon cancer will not get published, because they are not very interesting. The one study that does find a statistical link will make it into print and get loads of follow-on attention. The source of the bias stems not from the studies themselves but from the skewed information that actually reaches the public. Someone reading the scientific literature on video games and cancer would find only a single study, and that single study will suggest that playing video games can prevent cancer. In fact, 99 studies out of 100 would have found no such link.
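This filtering effect can be sketched in a few lines of Python. Each hypothetical study below compares two groups with identical true cancer rates, and a crude two-standard-error cutoff stands in for a significance test; only the "significant" false positives get published:

```python
import random

random.seed(1)

def run_study(n=1000, rate=0.05):
    """One hypothetical study: cancer incidence in gamers vs. controls,
    where the true incidence (5%) is identical in both groups."""
    gamers = sum(random.random() < rate for _ in range(n))
    controls = sum(random.random() < rate for _ in range(n))
    # Crude significance rule: count difference beyond ~2 standard errors.
    se = (2 * n * rate * (1 - rate)) ** 0.5
    return abs(gamers - controls) > 2 * se

significant = [run_study() for _ in range(100)]
published = sum(significant)  # only "interesting" results see print
print("studies run:       100")
print(f"studies published: {published}")
```

Roughly 5 of the 100 null studies will clear the significance bar by chance, and those are the only ones a reader of the literature would ever see.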

Recall Bias

Memory is a fascinating thing—though not always a great source of good data. We have a natural human impulse to understand the present as a logical consequence of things that happened in the past—cause and effect. The problem is that our memories turn out to be “systematically fragile” when we are trying to explain some particularly good or bad outcome in the present. Consider a study looking at the relationship between diet and cancer. In 1993, a Harvard researcher compiled a data set comprising a group of women with breast cancer and an age-matched group of women who had not been diagnosed with cancer. Women in both groups were asked about their dietary habits earlier in life. The study produced clear results: The women with breast cancer were significantly more likely to have had diets that were high in fat when they were younger.
Ah, but this wasn’t actually a study of how diet affects the likelihood of getting cancer. This was a study of how getting cancer affects a woman’s memory of her diet earlier in life. All of the women in the study had completed a dietary survey years earlier, before any of them had been diagnosed with cancer. The striking finding was that women with breast cancer recalled a diet that was much higher in fat than what they actually consumed; the women with no cancer did not.

The New York Times Magazine described the insidious nature of this recall bias:

The diagnosis of breast cancer had not just changed a woman’s present and the future; it had altered her past. Women with breast cancer had (unconsciously) decided that a higher-fat diet was a likely predisposition for their disease and (unconsciously) recalled a high-fat diet. It was a pattern poignantly familiar to anyone who knows the history of this stigmatized illness: these women, like thousands of women before them, had searched their own memories for a cause and then summoned that cause into memory.

Recall bias is one reason that longitudinal studies are often preferred to cross-sectional studies. In a longitudinal study the data are collected contemporaneously. At age five, a participant can be asked about his attitudes toward school. Then, thirteen years later, we can revisit that same participant and determine whether he has dropped out of high school. In a cross-sectional study, in which all the data are collected at one point in time, we must ask an eighteen-year-old high school dropout how he or she felt about school at age five, which is inherently less reliable.

Survivorship Bias

Suppose a high school principal reports that test scores for a particular cohort of students have risen steadily for four years. The sophomore scores for this class were better than their freshman scores. The scores from junior year were better still, and the senior year scores were best of all. We’ll stipulate that there is no cheating going on, and not even any creative use of descriptive statistics. Every year this cohort of students has done better than it did the preceding year, by every possible measure: mean, median, percentage of students at grade level, and so on. Would you (a) nominate this school leader for “principal of the year” or (b) demand more data?

I say “b.” I smell survivorship bias, which occurs when some or many of the observations are falling out of the sample, changing the composition of the observations that are left and therefore affecting the results of any analysis. Let’s suppose that our principal is truly awful. The students in his school are learning nothing; each year half of them drop out. Well, that could do very nice things for the school’s test scores—without any individual student testing better. If we make the reasonable assumption that the worst students (with the lowest test scores) are the most likely to drop out, then the average test scores of those students left behind will go up steadily as more and more students drop out. (If you have a room of people with varying heights, forcing the short people to leave will raise the average height in the room, but it doesn’t make anyone taller.)
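The principal's "improvement" can be reproduced in a short Python sketch (the cohort and its scores are hypothetical). No individual score ever changes; the lowest-scoring half of the class simply drops out each year:

```python
# Hypothetical cohort: 60 students with fixed scores 40..99.
scores = list(range(40, 100))

def mean(xs):
    return sum(xs) / len(xs)

means = []
for year in ("freshman", "sophomore", "junior", "senior"):
    means.append(mean(scores))
    print(f"{year:9s} mean: {means[-1]:5.1f}  (n={len(scores)})")
    # The weakest half of the remaining students drops out each year.
    scores = sorted(scores)[len(scores) // 2:]
```

The cohort mean climbs every year even though not a single student's score improved; only the composition of the sample changed.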

Healthy User Bias

People who take vitamins regularly are likely to be healthy—because they are the kind of people who take vitamins regularly! Whether the vitamins have any impact is a separate issue. Consider the following thought experiment. Suppose public health officials promulgate a theory that all new parents should put their children to bed only in purple pajamas, because that helps stimulate brain development. Twenty years later, longitudinal research confirms that having worn purple pajamas as a child does have an overwhelmingly large positive association with success in life. We find, for example, that 98 percent of entering Harvard freshmen wore purple pajamas as children (and many still do) compared with only 3 percent of inmates in the Massachusetts state prison system.

Of course, the purple pajamas do not matter; but having the kind of parents who put their children in purple pajamas does matter. Even when we try to control for factors like parental education, we are still going to be left with unobservable differences between those parents who obsess about putting their children in purple pajamas and those who don’t. As New York Times health writer Gary Taubes explains, “At its simplest, the problem is that people who faithfully engage in activities that are good for them—taking a drug as prescribed, for instance, or eating what they believe is a healthy diet—are fundamentally different from those who don’t.” This effect can potentially confound any study trying to evaluate the real effect of activities perceived to be healthful, such as exercising regularly or eating kale. We think we are comparing the health effects of two diets: kale versus no kale. In fact, if the treatment and control groups are not randomly assigned, we are comparing two diets that are being eaten by two different kinds of people. We have a treatment group that is different from the control group in two respects, rather than just one.
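A toy simulation makes the confound concrete. In this hypothetical Python sketch, a hidden trait ("conscientiousness") drives both the habit and the health outcome, while the vitamins themselves contribute nothing:

```python
import random

random.seed(2)

# Hidden confounder: conscientiousness, uniform on 0..1.
people = [random.random() for _ in range(10_000)]

# The most conscientious people take vitamins; vitamins do nothing.
takes_vitamins = [c > 0.6 for c in people]

# Health depends only on conscientiousness plus noise.
health = [c * 10 + random.gauss(0, 1) for c in people]

def mean(xs):
    return sum(xs) / len(xs)

takers = [h for h, t in zip(health, takes_vitamins) if t]
others = [h for h, t in zip(health, takes_vitamins) if not t]
print(f"mean health, vitamin takers: {mean(takers):.1f}")
print(f"mean health, everyone else:  {mean(others):.1f}")
```

The vitamin takers look dramatically healthier in this comparison, even though the "treatment" has zero effect; only random assignment would break the link between the habit and the hidden trait.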

If statistics is detective work, then the data are the clues. My wife spent a year teaching high school students in rural New Hampshire. One of her students was arrested for breaking into a hardware store and stealing some tools. The police were able to crack the case because (1) it had just snowed and there were tracks in the snow leading from the hardware store to the student’s home; and (2) the stolen tools were found inside. Good clues help.
Like good data. But first you have to get good data, and that is a lot harder than it seems.

Open data movement faces fresh hurdles


SciDevNet: “The open-data community made great strides in 2013 towards increasing the reliability of and access to information, but more efforts are needed to increase its usability on the ground and the general capacity of those using it, experts say.
An international network of innovation hubs, the first extensive open data certification system and a data for development partnership are three initiatives launched last year by the fledgling Open Data Institute (ODI), a UK-based not-for-profit firm that champions the use of open data to aid social, economic and environmental development.
Before open data can be used effectively, the biggest hurdles to be cleared are agreeing on common formats for data sets and improving their trustworthiness and searchability, says the ODI’s chief statistician, Ulrich Atz.
“As it is so new, open data is often inconsistent in its format, making it difficult to reuse. We see a great need for standards and tools,” he tells SciDev.Net. Data that is standardised is of “incredible value” he says, because this makes it easier and faster to use and gives it a longer useable lifetime.
The ODI — which celebrated its first anniversary last month — is attempting to achieve this with a first-of-its-kind certification system that gives publishers and users important details about online data sets, including publishers’ names and contact information, the type of sharing licence, the quality of information and how long it will be available.
Certificates encourage businesses and governments to make use of open data by guaranteeing their quality and usability, and making them easier to find online, says Atz.
Finding more and better ways to apply open data will also be supported by a growing network of ODI ‘nodes’: centres that bring together companies, universities and NGOs to support open-data projects and communities….
Because lower-income countries often lack well-established data collection systems, they have greater freedom to rethink how data are collected and how they flow between governments and civil society, says Davies.
But there is still a long way to go. Open-data projects currently rely on governments and other providers sharing their data on online platforms, whereas in a truly effective system, information would be published in an open format from the start, says Davies.
Furthermore, even where advances are being made at a strategic level, open-data initiatives are still having only a modest impact in the real world, he says.
“Transferring [progress at a policy level] into availability of data on the ground and the capacity to use it is a lot tougher and slower,” Davies says.”