Garbage In, Garbage Out… Or, How to Lie with Bad Data


Medium: For everyone who slept through Stats 101, Charles Wheelan’s Naked Statistics is a lifesaver. From batting averages and political polls to Schlitz ads and medical research, Wheelan “illustrates exactly why even the most reluctant mathophobe is well advised to achieve a personal understanding of the statistical underpinnings of life” (New York Times). What follows is adapted from the book, out now in paperback.
Behind every important study there are good data that made the analysis possible. And behind every bad study . . . well, read on. People often speak about “lying with statistics.” I would argue that some of the most egregious statistical mistakes involve lying with data; the statistical analysis is fine, but the data on which the calculations are performed are bogus or inappropriate. Here are some common examples of “garbage in, garbage out.”

Selection Bias

….Selection bias can be introduced in many other ways. A survey of consumers in an airport is going to be biased by the fact that people who fly are likely to be wealthier than the general public; a survey at a rest stop on Interstate 90 may have the opposite problem. Both surveys are likely to be biased by the fact that people who are willing to answer a survey in a public place are different from people who would prefer not to be bothered. If you ask 100 people in a public place to complete a short survey, and 60 are willing to answer your questions, those 60 are likely to be different in significant ways from the 40 who walked by without making eye contact.
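
To see how a willingness-to-answer filter skews a survey, here is a small Python simulation. Everything in it is an assumption chosen for illustration: the trait being measured (weekly free hours, uniform between 0 and 40) and the rule that the chance of stopping to answer rises with free time, tuned so that roughly 60 of every 100 people respond, as in the example above.

```python
import random
from statistics import mean

def run_survey(population, respond_prob, rng):
    """Keep only the people who agree to answer (the self-selected sample)."""
    return [x for x in population if rng.random() < respond_prob(x)]

def willingness(hours):
    # Assumption: 20% baseline, rising linearly to 100% at 40 free hours a week
    return 0.2 + 0.02 * hours

rng = random.Random(42)
# Hypothetical population: weekly free hours, uniform between 0 and 40
population = [rng.uniform(0, 40) for _ in range(100_000)]
respondents = run_survey(population, willingness, rng)

print(f"response rate:   {len(respondents) / len(population):.2f}")
print(f"population mean: {mean(population):.1f} hours")
print(f"respondent mean: {mean(respondents):.1f} hours")
```

Under these assumptions the respondents average roughly four more free hours a week than the population (about 24 versus 20), even though every individual answer is accurate; the distortion comes entirely from who chose to answer.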

Publication Bias

Positive findings are more likely to be published than negative findings, which can skew the results that we see. Suppose you have just conducted a rigorous, longitudinal study in which you find conclusively that playing video games does not prevent colon cancer. You’ve followed a representative sample of 100,000 Americans for twenty years; those participants who spend hours playing video games have roughly the same incidence of colon cancer as the participants who do not play video games at all. We’ll assume your methodology is impeccable. Which prestigious medical journal is going to publish your results?

None, for two reasons. First, there is no strong scientific reason to believe that playing video games has any impact on colon cancer, so it is not obvious why you were doing this study. Second, and more relevant here, the fact that something does not prevent cancer is not a particularly interesting finding. After all, most things don’t prevent cancer. Negative findings are not especially sexy, in medicine or elsewhere.
The net effect is to distort the research that we see, or do not see. Suppose that one of your graduate school classmates has conducted a different longitudinal study. She finds that people who spend a lot of time playing video games do have a lower incidence of colon cancer. Now that is interesting! That is exactly the kind of finding that would catch the attention of a medical journal, the popular press, bloggers, and video game makers (who would slap labels on their products extolling the health benefits of their products). It wouldn’t be long before Tiger Moms all over the country were “protecting” their children from cancer by snatching books out of their hands and forcing them to play video games instead.
Of course, one important recurring idea in statistics is that unusual things happen every once in a while, just as a matter of chance. If you conduct 100 studies, one of them is likely to turn up results that are pure nonsense—like a statistical association between playing video games and a lower incidence of colon cancer. Here is the problem: The 99 studies that find no link between video games and colon cancer will not get published, because they are not very interesting. The one study that does find a statistical link will make it into print and get loads of follow-on attention. The bias stems not from the studies themselves but from the skewed information that actually reaches the public. Someone reading the scientific literature on video games and cancer would find only a single study, and that single study would suggest that playing video games can prevent cancer. In fact, 99 studies out of 100 would have found no such link.
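
The arithmetic behind "one of them is likely to turn up results that are pure nonsense" can be checked with a short simulation. This sketch assumes each hypothetical study compares 30 gamers with 30 non-gamers drawn from the same distribution (so the true effect is exactly zero), uses a two-sided z-test with a known standard deviation, and treats only results significant at the 1 percent level as publishable. At that threshold about 1 in 100 null studies clears the bar by chance; at the more common 5 percent threshold it would be about 5 in 100.

```python
import math
import random
from statistics import NormalDist

def run_null_study(rng, n_per_group=30):
    """One hypothetical study in which the true effect is zero: gamers and
    non-gamers are drawn from the same standard normal distribution."""
    gamers = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    non_gamers = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    diff = sum(gamers) / n_per_group - sum(non_gamers) / n_per_group
    # z statistic for a difference in means, assuming a known SD of 1.0
    return diff / math.sqrt(2.0 / n_per_group)

def simulate_publication_bias(n_studies=100, alpha=0.01, seed=0):
    """Run many null studies; 'publish' only those significant at level alpha."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff, about 2.576
    all_z = [run_null_study(rng) for _ in range(n_studies)]
    published = [z for z in all_z if abs(z) > critical]
    return all_z, published

all_z, published = simulate_publication_bias(n_studies=100, seed=1)
print(f"{len(published)} of {len(all_z)} null studies look publishable")
```

Every "published" result here is a false positive by construction, which is exactly the filtered view a reader of the literature would get.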

Recall Bias

Memory is a fascinating thing—though not always a great source of good data. We have a natural human impulse to understand the present as a logical consequence of things that happened in the past—cause and effect. The problem is that our memories turn out to be “systematically fragile” when we are trying to explain some particularly good or bad outcome in the present. Consider a study looking at the relationship between diet and cancer. In 1993, a Harvard researcher compiled a data set comprising a group of women with breast cancer and an age-matched group of women who had not been diagnosed with cancer. Women in both groups were asked about their dietary habits earlier in life. The study produced clear results: The women with breast cancer were significantly more likely to have had diets that were high in fat when they were younger.
Ah, but this wasn’t actually a study of how diet affects the likelihood of getting cancer. This was a study of how getting cancer affects a woman’s memory of her diet earlier in life. All of the women in the study had completed a dietary survey years earlier, before any of them had been diagnosed with cancer. The striking finding was that women with breast cancer recalled a diet that was much higher in fat than what they actually consumed; the women with no cancer did not.

The New York Times Magazine described the insidious nature of this recall bias:

The diagnosis of breast cancer had not just changed a woman’s present and the future; it had altered her past. Women with breast cancer had (unconsciously) decided that a higher-fat diet was a likely predisposition for their disease and (unconsciously) recalled a high-fat diet. It was a pattern poignantly familiar to anyone who knows the history of this stigmatized illness: these women, like thousands of women before them, had searched their own memories for a cause and then summoned that cause into memory.

Recall bias is one reason that longitudinal studies are often preferred to cross-sectional studies. In a longitudinal study the data are collected contemporaneously. At age five, a participant can be asked about his attitudes toward school. Then, thirteen years later, we can revisit that same participant and determine whether he has dropped out of high school. In a cross-sectional study, in which all the data are collected at one point in time, we must ask an eighteen-year-old high school dropout how he or she felt about school at age five, which is inherently less reliable.

Survivorship Bias

Suppose a high school principal reports that test scores for a particular cohort of students have risen steadily for four years. The sophomore scores for this class were better than their freshman scores. The scores from junior year were better still, and the senior year scores were best of all. We’ll stipulate that there is no cheating going on, and not even any creative use of descriptive statistics. Every year this cohort of students has done better than it did the preceding year, by every possible measure: mean, median, percentage of students at grade level, and so on. Would you (a) nominate this school leader for “principal of the year” or (b) demand more data?

I say “b.” I smell survivorship bias, which occurs when some or many of the observations are falling out of the sample, changing the composition of the observations that are left and therefore affecting the results of any analysis. Let’s suppose that our principal is truly awful. The students in his school are learning nothing; each year half of them drop out. Well, that could do very nice things for the school’s test scores—without any individual student testing better. If we make the reasonable assumption that the worst students (with the lowest test scores) are the most likely to drop out, then the average test scores of those students left behind will go up steadily as more and more students drop out. (If you have a room of people with varying heights, forcing the short people to leave will raise the average height in the room, but it doesn’t make anyone taller.)
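
The principal's "improvement" can be reproduced in a few lines of Python. The sixteen scores are hypothetical, no student's score ever changes, and the only assumption is the one stated above: before each new school year, the lowest-scoring half of the remaining students drops out.

```python
from statistics import mean, median

def cohort_report(scores, years=4):
    """(year, mean, median) for a cohort whose individual scores never change,
    but whose lowest-scoring half drops out before each new year."""
    report = []
    remaining = sorted(scores)
    for year in range(1, years + 1):
        report.append((year, mean(remaining), median(remaining)))
        remaining = remaining[len(remaining) // 2:]  # the worst half leaves
    return report

# Hypothetical freshman scores for a class of 16
freshman_scores = [40, 48, 52, 55, 58, 61, 64, 67, 70, 72, 75, 78, 81, 85, 90, 95]
for year, avg, mid in cohort_report(freshman_scores):
    print(f"year {year}: mean {avg:.2f}, median {mid:.1f}")
```

With these numbers the mean climbs from about 68 to 92.5 and the median from 68.5 to 92.5 over four years, while not a single student tests any better.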

Healthy User Bias

People who take vitamins regularly are likely to be healthy—because they are the kind of people who take vitamins regularly! Whether the vitamins have any impact is a separate issue. Consider the following thought experiment. Suppose public health officials promulgate a theory that all new parents should put their children to bed only in purple pajamas, because that helps stimulate brain development. Twenty years later, longitudinal research confirms that having worn purple pajamas as a child does have an overwhelmingly large positive association with success in life. We find, for example, that 98 percent of entering Harvard freshmen wore purple pajamas as children (and many still do) compared with only 3 percent of inmates in the Massachusetts state prison system.

Of course, the purple pajamas do not matter; but having the kind of parents who put their children in purple pajamas does matter. Even when we try to control for factors like parental education, we are still going to be left with unobservable differences between those parents who obsess about putting their children in purple pajamas and those who don’t. As New York Times health writer Gary Taubes explains, “At its simplest, the problem is that people who faithfully engage in activities that are good for them—taking a drug as prescribed, for instance, or eating what they believe is a healthy diet—are fundamentally different from those who don’t.” This effect can potentially confound any study trying to evaluate the real effect of activities perceived to be healthful, such as exercising regularly or eating kale. We think we are comparing the health effects of two diets: kale versus no kale. In fact, if the treatment and control groups are not randomly assigned, we are comparing two diets that are being eaten by two different kinds of people. We have a treatment group that is different from the control group in two respects, rather than just one.
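
A small table of made-up counts shows how a hidden parental trait can manufacture the pajama effect. The sketch assumes two kinds of parents, that conscientious parents are far more likely to use purple pajamas, and that pajamas have no effect at all: within each kind of parent, the success rate is identical with or without them.

```python
# Hypothetical counts: (parent_type, purple_pajamas, children, successful_adults)
FAMILIES = [
    ("conscientious", True, 450, 270),  # 60% succeed
    ("conscientious", False, 50, 30),   # 60% succeed
    ("other", True, 50, 10),            # 20% succeed
    ("other", False, 450, 90),          # 20% succeed
]

def success_rate(rows):
    return sum(r[3] for r in rows) / sum(r[2] for r in rows)

def pajama_comparison(rows, parent_type=None):
    """Success rate (with pajamas, without pajamas), optionally within one parent type."""
    if parent_type is not None:
        rows = [r for r in rows if r[0] == parent_type]
    return (success_rate([r for r in rows if r[1]]),
            success_rate([r for r in rows if not r[1]]))

print(pajama_comparison(FAMILIES))                   # pooled: (0.56, 0.24)
print(pajama_comparison(FAMILIES, "conscientious"))  # (0.6, 0.6)
print(pajama_comparison(FAMILIES, "other"))          # (0.2, 0.2)
```

In real data the parental trait is the unobservable part, so you cannot simply split the table by it; randomly assigning pajamas would break the link between parent type and treatment and leave the two groups differing in only one respect.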

If statistics is detective work, then the data are the clues. My wife spent a year teaching high school students in rural New Hampshire. One of her students was arrested for breaking into a hardware store and stealing some tools. The police were able to crack the case because (1) it had just snowed and there were tracks in the snow leading from the hardware store to the student’s home; and (2) the stolen tools were found inside. Good clues help.
Like good data. But first you have to get good data, and that is a lot harder than it seems.

From funding agencies to scientific agency –


New paper on “Collective allocation of science funding as an alternative to peer review”: “Publicly funded research involves the distribution of a considerable amount of money. Funding agencies such as the US National Science Foundation (NSF), the US National Institutes of Health (NIH) and the European Research Council (ERC) give billions of dollars or euros of taxpayers’ money to individual researchers, research teams, universities, and research institutes each year. Taxpayers accordingly expect that governments and funding agencies will spend their money prudently and efficiently.

Investing money to the greatest effect is not a challenge unique to research funding agencies and there are many strategies and schemes to choose from. Nevertheless, most funders rely on a tried and tested method in line with the tradition of the scientific community: the peer review of individual proposals to identify the most promising projects for funding. This method has been considered the gold standard for assessing the scientific value of research projects essentially since the end of the Second World War.

However, there is mounting critique of the use of peer review to direct research funding. High on the list of complaints is the cost, both in terms of time and money. In 2012, for example, NSF convened more than 17,000 scientists to review 53,556 proposals [1]. Reviewers generally spend a considerable time and effort to assess and rate proposals of which only a minority can eventually get funded. Of course, such a high rejection rate is also frustrating for the applicants. Scientists spend an increasing amount of time writing and submitting grant proposals. Overall, the scientific community invests an extraordinary amount of time, energy, and effort into the writing and reviewing of research proposals, most of which end up not getting funded at all. This time would be better invested in conducting the research in the first place.

Peer review may also be subject to biases, inconsistencies, and oversights. The need for review panels to reach consensus may lead to sub‐optimal decisions owing to the inherently stochastic nature of the peer review process. Moreover, in a period where the money available to fund research is shrinking, reviewers may tend to “play it safe” and select proposals that have a high chance of producing results, rather than more challenging and ambitious projects. Additionally, the structuring of funding around calls‐for‐proposals to address specific topics might inhibit serendipitous discovery, as scientists work on problems for which funding happens to be available rather than trying to solve more challenging problems.

The scientific community holds peer review in high regard, but it may not actually be the best possible system for identifying and supporting promising science. Many proposals have been made to reform funding systems, ranging from incremental changes to peer review—including careful selection of reviewers [2] and post‐hoc normalization of reviews [3]—to more radical proposals such as opening up review to the entire online population [4] or removing human reviewers altogether by allocating funds through an objective performance measure [5].

We would like to add another alternative inspired by the mathematical models used to search the internet for relevant information: a highly decentralized funding model in which the wisdom of the entire scientific community is leveraged to determine a fair distribution of funding. It would still require human insight and decision‐making, but it would drastically reduce the overhead costs and may alleviate many of the issues and inefficiencies of the proposal submission and peer review system, such as bias, “playing it safe”, or reluctance to support curiosity‐driven research.

Our proposed system would require funding agencies to give all scientists within their remit an unconditional, equal amount of money each year. However, each scientist would then be required to pass on a fixed percentage of their previous year’s funding to other scientists whom they think would make best use of the money (Fig 1). Every year, then, scientists would receive a fixed basic grant from their funding agency combined with an elective amount of funding donated by their peers. As a result of each scientist having to distribute a given percentage of their previous year’s budget to other scientists, money would flow through the scientific community. Scientists who are generally anticipated to make the best use of funding will accumulate more.”
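
The yearly update described above can be written as a short simulation. This is a minimal sketch under stated assumptions, not the authors' model: the three-scientist network and each scientist's choice of recipients (the hypothetical `weights` matrix) are invented, each donation is split among peers in fixed proportions, and a scientist's funding in a given year is the basic grant plus whatever peers passed on from their previous year's funding. The recurrence has the same shape as a PageRank-style iteration, with the basic grant playing the role of the constant term.

```python
def simulate_funding(weights, base_grant, give_fraction, years):
    """Yearly funding per scientist under a hypothetical peer-allocation scheme.

    weights[j][i] is the share of scientist j's donation that goes to scientist i;
    each row sums to 1 and nobody donates to themselves (weights[j][j] == 0).
    """
    n = len(weights)
    funding = [float(base_grant)] * n  # first year: basic grant only
    for _ in range(years):
        donations = [0.0] * n
        for j in range(n):
            pot = give_fraction * funding[j]  # share of last year's funding j must pass on
            for i in range(n):
                donations[i] += pot * weights[j][i]
        funding = [base_grant + d for d in donations]
    return funding

# Hypothetical network: A and C give everything to B; B splits evenly between A and C
weights = [
    [0.0, 1.0, 0.0],  # A
    [0.5, 0.0, 0.5],  # B
    [0.0, 1.0, 0.0],  # C
]
funding = simulate_funding(weights, base_grant=100.0, give_fraction=0.5, years=100)
print([round(x, 2) for x in funding])  # [166.67, 266.67, 166.67]
```

In this toy network B, whom both others support, settles at about 266.67 per year while A and C settle at about 166.67. Because only a fraction f of last year's funding is recirculated, the total converges to N × base / (1 − f), here 600, of which the agency supplies 300 each year and the rest is money passed on from the previous year.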

Open data movement faces fresh hurdles


SciDevNet: “The open-data community made great strides in 2013 towards increasing the reliability of and access to information, but more efforts are needed to increase its usability on the ground and the general capacity of those using it, experts say.
An international network of innovation hubs, the first extensive open data certification system and a data for development partnership are three initiatives launched last year by the fledgling Open Data Institute (ODI), a UK-based not-for-profit firm that champions the use of open data to aid social, economic and environmental development.
Before open data can be used effectively the biggest hurdles to be cleared are agreeing common formats for data sets and improving their trustworthiness and searchability, says the ODI’s chief statistician, Ulrich Atz.
“As it is so new, open data is often inconsistent in its format, making it difficult to reuse. We see a great need for standards and tools,” he tells SciDev.Net. Data that is standardised is of “incredible value” he says, because this makes it easier and faster to use and gives it a longer useable lifetime.
The ODI — which celebrated its first anniversary last month — is attempting to achieve this with a first-of-its-kind certification system that gives publishers and users important details about online data sets, including publishers’ names and contact information, the type of sharing licence, the quality of information and how long it will be available.
Certificates encourage businesses and governments to make use of open data by guaranteeing their quality and usability, and making them easier to find online, says Atz.
Finding more and better ways to apply open data will also be supported by a growing network of ODI ‘nodes’: centres that bring together companies, universities and NGOs to support open-data projects and communities….
Because lower-income countries often lack well-established data collection systems, they have greater freedom to rethink how data are collected and how they flow between governments and civil society, he says.
But there is still a long way to go. Open-data projects currently rely on governments and other providers sharing their data on online platforms, whereas in a truly effective system, information would be published in an open format from the start, says Davies.
Furthermore, even where advances are being made at a strategic level, open-data initiatives are still having only a modest impact in the real world, he says.
“Transferring [progress at a policy level] into availability of data on the ground and the capacity to use it is a lot tougher and slower,” Davies says.”
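
The certificate details listed above (publisher name and contact information, sharing licence, quality of information, and how long the data will be available) amount to a small metadata record. The sketch below is hypothetical: the field names are invented for illustration and are not the ODI's actual certificate schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class DatasetCertificate:
    """Hypothetical record of the details the text says a certificate covers.

    Field names are invented for illustration; they are not the ODI's schema.
    """
    publisher_name: str
    publisher_contact: str
    licence: str
    quality_notes: Optional[str] = None
    available_until: Optional[str] = None  # e.g. an ISO date; None if unstated

    def missing_details(self) -> List[str]:
        """Fields left empty, i.e. what a would-be user still cannot check."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

cert = DatasetCertificate("Example City Council", "opendata@example.org", "CC BY 4.0")
print(cert.missing_details())  # ['quality_notes', 'available_until']
```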

Open Development (Networked Innovations in International Development)


New book edited by Matthew L. Smith and Katherine M. A. Reilly (Foreword by Yochai Benkler) : “The emergence of open networked models made possible by digital technology has the potential to transform international development. Open network structures allow people to come together to share information, organize, and collaborate. Open development harnesses this power, to create new organizational forms and improve people’s lives; it is not only an agenda for research and practice but also a statement about how to approach international development. In this volume, experts explore a variety of applications of openness, addressing challenges as well as opportunities.
Open development requires new theoretical tools that focus on real world problems, consider a variety of solutions, and recognize the complexity of local contexts. After exploring the new theoretical terrain, the book describes a range of cases in which open models address such specific development issues as biotechnology research, improving education, and access to scholarly publications. Contributors then examine tensions between open models and existing structures, including struggles over privacy, intellectual property, and implementation. Finally, contributors offer broader conceptual perspectives, considering processes of social construction, knowledge management, and the role of individual intent in the development and outcomes of social models.”

Crowdsourcing forecasts on science and technology events and innovations


Kurzweil News: “George Mason University launched today, Jan. 10, the largest and most advanced science and technology prediction market in the world: SciCast.
The federally funded research project aims to improve the accuracy of science and technology forecasts. George Mason research assistant professor Charles Twardy is the principal investigator of the project.
SciCast crowdsources forecasts on science and technology events and innovations from aerospace to zoology.
For example, will Amazon use drones for commercial package delivery by the end of 2017? Today, SciCast estimates the chance at slightly more than 50 percent. If you think that is too low, you can estimate a higher chance. SciCast will use your estimate to adjust the combined forecast.
Forecasters can update their forecasts at any time; in the above example, perhaps after the Federal Aviation Administration (FAA) releases its new guidelines for drones. The continually updated and reshaped information helps both the public and private sectors better monitor developments in a variety of industries. SciCast is a real-time indicator of what participants think is going to happen in the future.
“Combinatorial” prediction market better than simple average


How SciCast works (Credit: George Mason University)
The idea is that collective wisdom from diverse, informed opinions can provide more accurate predictions than individual forecasters, a notion borne out by other crowdsourcing projects. Simply taking an average is almost always better than going with the “best” expert. But in a two-year test on geopolitical questions, the SciCast method did 40 percent better than the simple average.
SciCast uses the first general “combinatorial” prediction market. In a prediction market, forecasters spend points to adjust the group forecast. Significant changes “cost” more — but “pay” more if they turn out to be right. So better forecasters gain more points and therefore more influence, improving the accuracy of the system.
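
The "costs more, pays more" rule can be illustrated with Robin Hanson's logarithmic market scoring rule (LMSR), a standard scoring rule for prediction markets. The text does not say which rule SciCast uses, so treat this as an assumption; the liquidity parameter `b` and the probabilities are also made up. Under LMSR, moving a yes/no forecast from p_old to p_new earns b·ln(p_new / p_old) points if the event happens and b·ln((1 − p_new) / (1 − p_old)) if it does not.

```python
import math

def lmsr_payoff(p_old, p_new, event_happens, b=100.0):
    """Points gained (negative = lost) for moving a yes/no forecast from p_old
    to p_new under a logarithmic market scoring rule with liquidity b."""
    if event_happens:
        return b * math.log(p_new / p_old)
    return b * math.log((1 - p_new) / (1 - p_old))

# A modest move versus a bold move on the same question, starting from 50%
for p_new in (0.6, 0.9):
    win = lmsr_payoff(0.5, p_new, True)
    loss = lmsr_payoff(0.5, p_new, False)
    print(f"0.5 -> {p_new}: +{win:.1f} if right, {loss:.1f} if wrong")
```

Here the bold move risks about 161 points to win about 59, while the modest one risks about 22 to win about 18, so forecasters who are consistently right accumulate points, and therefore influence, faster.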
In a combinatorial market like SciCast, forecasts can influence each other. For example, forecasters might have linked cherry production to honeybee populations. Then, if forecasters increase the estimated percentage of honeybee colonies lost this winter, SciCast automatically reduces the estimated 2014 cherry production. This connectivity among questions makes SciCast more sophisticated than other prediction markets.
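
One simple way to picture a link between two questions is the law of total probability. The sketch assumes, hypothetically, that the link is stored as two conditional probabilities: the chance of a good cherry harvest given heavy honeybee losses, and given no heavy losses. SciCast's actual internal representation is not described here. Raising the forecast for colony losses then automatically lowers the cherry forecast.

```python
def linked_forecast(p_parent, p_child_if_parent, p_child_if_not):
    """P(child) = P(child | parent) P(parent) + P(child | not parent) P(not parent)."""
    return p_child_if_parent * p_parent + p_child_if_not * (1 - p_parent)

# Hypothetical link: a good 2014 cherry crop is 30% likely after heavy
# honeybee colony losses and 70% likely otherwise.
for p_heavy_loss in (0.2, 0.5):
    p_good_cherry = linked_forecast(p_heavy_loss, 0.3, 0.7)
    print(f"P(heavy bee losses) = {p_heavy_loss:.1f} -> P(good cherry crop) = {p_good_cherry:.2f}")
```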
SciCast topics include agriculture, biology and medicine, chemistry, computational sciences, energy, engineered technologies, global change, information systems, mathematics, physics, science and technology business, social sciences, space sciences and transportation….

George Mason University’s just-launched SciCast is the largest and most advanced science and technology prediction market in the world
January 10, 2014


Example of SciCast crowdsourced forecast (credit: George Mason University)
Seeking futurists to improve forecasts, pose questions


“With so many science and technology questions, there are many niches,” says Twardy, a researcher in the Center of Excellence in Command, Control, Communications, Computing and Intelligence (C4I), based in Mason’s Volgenau School of Engineering.
“We seek scientists, statisticians, engineers, entrepreneurs, policymakers, technical traders, and futurists of all stripes to improve our forecasts, link questions together and pose new questions.”
Forecasters discuss the questions, and that discussion can lead to new, related questions. For example, someone asked, ‘Will Amazon deliver its first package using an unmanned aerial vehicle by Dec. 31, 2017?’
An early forecaster suggested that this technology is likely to first be used in a mid-sized town with fewer obstructions or local regulatory issues. Another replied that Amazon is more likely to use robots to deliver packages within a short radius of a conventional delivery vehicle. A third offered information about an FAA report related to the subject.
Any forecaster could then write a question about upcoming FAA rulings, and link that question to the Amazon drones question. Forecasters could then adjust the strength of the link.
“George Mason University has succeeded in launching the world’s largest forecasting tournament for science and technology,” says Jason Matheny, program manager of Forecasting Science and Technology at the Intelligence Advanced Research Projects Activity, based in Washington, D.C. “SciCast can help the public and private sectors to better understand a range of scientific and technological trends.”
Collaborative but Competitive
More than 1,000 experts and enthusiasts from science and tech-related associations, universities and interest groups preregistered to participate in SciCast. The group is collaborative in spirit but also competitive. Participants are rewarded for accurate predictions by moving up on the site leaderboard, receiving more points to spend influencing subsequent prognostications. Participants can (and should) continually update their predictions as new information is presented.
SciCast has partnered with the American Association for the Advancement of Science, the Institute of Electrical and Electronics Engineers, and multiple other science and technology professional societies.
Mason members of the SciCast project team include Twardy; Kathryn Laskey, associate director for the C4I and a professor in the Department of Systems Engineering and Operations Research; associate professor of economics Robin Hanson; C4I research professor Tod Levitt; and C4I research assistant professors Anamaria Berea, Kenneth Olson and Wei Sun.
To register for SciCast, visit www.SciCast.org, or for more information, e-mail support@scicast.org. SciCast is open to anyone age 18 or older.”

New Book: Open Data Now


New book by Joel Gurin (The GovLab): “Open Data is the world’s greatest free resource–unprecedented access to thousands of databases–and it is one of the most revolutionary developments since the Information Age began. Combining two major trends–the exponential growth of digital data and the emerging culture of disclosure and transparency–Open Data gives you and your business full access to information that has never been available to the average person until now. Unlike most Big Data, Open Data is transparent, accessible, and reusable in ways that give it the power to transform business, government, and society.
Open Data Now is an essential guide to understanding all kinds of open databases–business, government, science, technology, retail, social media, and more–and using those resources to your best advantage. You’ll learn how to tap crowds for fast innovation, conduct research through open collaboration, and manage and market your business in a transparent marketplace.
Open Data is open for business–and the opportunities are as big and boundless as the Internet itself. This powerful, practical book shows you how to harness the power of Open Data in a variety of applications:

  • HOT STARTUPS: turn government data into profitable ventures
  • SAVVY MARKETING: understand how reputational data drives your brand
  • DATA-DRIVEN INVESTING: apply new tools for business analysis
  • CONSUMER INFORMATION: connect with your customers using smart disclosure
  • GREEN BUSINESS: use data to bet on sustainable companies
  • FAST R&D: turn the online world into your research lab
  • NEW OPPORTUNITIES: explore open fields for new businesses

Whether you’re a marketing professional who wants to stay on top of what’s trending, a budding entrepreneur with a billion-dollar idea and limited resources, or a struggling business owner trying to stay competitive in a changing global market–or if you just want to understand the cutting edge of information technology–Open Data Now offers a wealth of big ideas, strategies, and techniques that wouldn’t have been possible before Open Data leveled the playing field.
The revolution is here and it’s now. It’s Open Data Now.”

Supporting open government in New Europe


Google Europe Blog: “The ‘New Europe’ countries that joined the European Union over the past decade are moving ahead fast to use the Internet to improve transparency and open government. We recently partnered with Techsoup Global to support online projects driving forward good governance in Romania, the Czech Republic, and most recently, in Slovakia.
Techsoup Global, in partnership with the Slovak Center for Philanthropy, recently held an exciting social-startups awards ceremony, Restart Slovakia 2013, in Bratislava. Slovakia’s Deputy Minister of Finance and Digital Champion Peter Pellegrini delivered a keynote promoting the Internet and Open Data and announced the winners of this year’s contest. Ambassadors from the U.S., Israel and Romania and several distinguished Slovak NGOs also attended the ceremony.
Winning projects included:

  • Vzdy a vsade – Always and Everywhere – a volunteer portal offering online and anonymous psychological advice to internet users via chat.
  • Nemlcme.sk – a portal providing counsel for victims of sexual assault.
  • Co robim – an educational online library of careers advising young people on how to choose their career paths and dream jobs.
  • Mapa zlocinu – an online map displaying crime rates in different neighbourhoods.
  • Demagog.sk – a platform focused on analyzing public statements of politicians and releasing information about politicians and the truthfulness of their speeches in a user-friendly format.”

The Failure and the Promise of Public Participation


Dr. Mark Funkhouser in Governing: “In a recent study entitled Making Public Participation Legal, Matt Leighninger cites a Knight Foundation report that found that attending a public meeting was more likely to reduce a person’s sense of efficacy and attachment to the community than to increase it. That sad fact is no surprise to the government officials who have to run — and endure — public meetings.
Every public official who has served for any length of time has horror stories about these forums. The usual suspects show up — the self-appointed activists (who sometimes seem to be just a little nuts) and the lobbyists. Regular folks have made the calculation that only in extreme circumstances, when they are really scared or angry, is attending a public hearing worth their time. And who can blame them when it seems clear that the game is rigged, the decisions already have been made, and they’ll probably have to sit through hours of blather before they get their three minutes at the microphone?
So much transparency and yet so little trust. Despite the fact that governments are pumping out more and more information to citizens, trust in government has edged lower and lower, pushed in part no doubt by the lingering economic hardships and government cutbacks resulting from the recession. Most public officials I talk to now take it as an article of faith that the public generally disrespects them and the governments they work for.
Clearly the relationship between citizens and their governments needs to be reframed. Fortunately, over the last couple of decades lots of techniques have been developed by advocates of deliberative democracy and citizen participation that provide both more meaningful engagement and better community outcomes. There are decision-making forums, “visioning” forums and facilitated group meetings, most of which feature some combination of large-group, small-group and online interactions.
But here’s the rub: Our legal framework doesn’t support these new methods of public participation. This fact is made clear in Making Public Participation Legal, which was compiled by a working group that included people from the National Civic League, the American Bar Association, the International City/County Management Association and a number of leading practitioners of public participation.
The requirements for public meetings in local governments are generally built into state statutes such as sunshine or open-meetings laws or other laws governing administrative procedures. These laws may require public hearings in certain circumstances and mandate that advance notice, along with an agenda, be posted for any meeting of an “official body” — from the state legislature to a subcommittee of the city council or an advisory board of some kind. And a “meeting” is one in which a quorum attends. So if three of a city council’s nine members sit on the finance committee and two of the committee members happen to show up at a public meeting, they may risk having violated the open-meetings law…”

Why the Nate Silvers of the World Don’t Know Everything


Felix Salmon in Wired: “This shift in US intelligence mirrors a definite pattern of the past 30 years, one that we can see across fields and institutions. It’s the rise of the quants—that is, the ascent to power of people whose native tongue is numbers and algorithms and systems rather than personal relationships or human intuition. Michael Lewis’ Moneyball vividly recounts how the quants took over baseball, as statistical analysis trumped traditional scouting and propelled the underfunded Oakland A’s to a division-winning 2002 season. More recently we’ve seen the rise of the quants in politics. Commentators who ‘trusted their gut’ about Mitt Romney’s chances had their gut kicked by Nate Silver, the stats whiz who called the election days beforehand as a lock for Obama, down to the very last electoral vote in the very last state.
The reason the quants win is that they’re almost always right—at least at first. They find numerical patterns or invent ingenious algorithms that increase profits or solve problems in ways that no amount of subjective experience can match. But what happens after the quants win is not always the data-driven paradise that they and their boosters expected. The more a field is run by a system, the more that system creates incentives for everyone (employees, customers, competitors) to change their behavior in perverse ways—providing more of whatever the system is designed to measure and produce, whether that actually creates any value or not. It’s a problem that can’t be solved until the quants learn a little bit from the old-fashioned ways of thinking they’ve displaced.
No matter the discipline or industry, the rise of the quants tends to happen in four stages. Stage one is what you might call pre-disruption, and it’s generally best visible in hindsight. Think about quaint dating agencies in the days before the arrival of Match.com and all the other algorithm-powered online replacements. Or think about retail in the era before floor-space management analytics helped quantify exactly which goods ought to go where. For a live example, consider Hollywood, which, for all the money it spends on market research, is still run by a small group of lavishly compensated studio executives, all of whom are well aware that the first rule of Hollywood, as memorably summed up by screenwriter William Goldman, is “Nobody knows anything.” On its face, Hollywood is ripe for quantification—there’s a huge amount of data to be mined, considering that every movie and TV show can be classified along hundreds of different axes, from stars to genre to running time, and they can all be correlated to box office receipts and other measures of profitability.
Next comes stage two, disruption. In most industries, the rise of the quants is a recent phenomenon, but in the world of finance it began back in the 1980s. The sign of this change was hard to miss: the point at which you started getting targeted and personalized offers for credit cards and other financial services based not on the relationship you had with your local bank manager but on what the bank’s algorithms deduced about your finances and creditworthiness. Pretty soon, when you went into a branch to inquire about a loan, all they could do was punch numbers into a computer and then give you the computer’s answer.
For a present-day example of disruption, think about politics. In the 2012 election, Obama’s old-fashioned campaign operatives didn’t disappear. But they gave money and freedom to a core group of technologists in Chicago—including Harper Reed, former CTO of the Chicago-based online retailer Threadless—and allowed them to make huge decisions about fund-raising and voter targeting. Whereas earlier campaigns had tried to target segments of the population defined by geography or demographic profile, Obama’s team made the campaign granular right down to the individual level. So if a mom in Cedar Rapids was on the fence about who to vote for, or whether to vote at all, then instead of buying yet another TV ad, the Obama campaign would message one of her Facebook friends and try the much more effective personal approach…
After disruption, though, there comes at least some version of stage three: overshoot. The most common problem is that all these new systems—metrics, algorithms, automated decision-making processes—result in humans gaming the system in rational but often unpredictable ways. Sociologist Donald T. Campbell noted this dynamic back in the ’70s, when he articulated what’s come to be known as Campbell’s law: “The more any quantitative social indicator is used for social decision-making,” he wrote, “the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”…
Policing is a good example, as explained by Harvard sociologist Peter Moskos in his book Cop in the Hood: My Year Policing Baltimore’s Eastern District. Most cops have a pretty good idea of what they should be doing, if their goal is public safety: reducing crime, locking up kingpins, confiscating drugs. It involves foot patrols, deep investigations, and building good relations with the community. But under statistically driven regimes, individual officers have almost no incentive to actually do that stuff. Instead, they’re all too often judged on results—specifically, arrests. (Not even convictions, just arrests: If a suspect throws away his drugs while fleeing police, the police will chase and arrest him just to get the arrest, even when they know there’s no chance of a conviction.)…
It’s increasingly clear that for smart organizations, living by numbers alone simply won’t work. That’s why they arrive at stage four: synthesis—the practice of marrying quantitative insights with old-fashioned subjective experience. Nate Silver himself has written thoughtfully about examples of this in his book, The Signal and the Noise. He cites baseball, which in the post-Moneyball era adopted a “fusion approach” that leans on both statistics and scouting. Silver credits it with delivering the Boston Red Sox’s first World Series title in 86 years. Or consider weather forecasting: The National Weather Service employs meteorologists who, understanding the dynamics of weather systems, can improve forecasts by as much as 25 percent compared with computers alone. A similar synthesis holds in economic forecasting: Adding human judgment to statistical methods makes results roughly 15 percent more accurate. And it’s even true in chess: While the best computers can now easily beat the best humans, they can in turn be beaten by humans aided by computers….
That’s what a good synthesis of big data and human intuition tends to look like. As long as the humans are in control, and understand what it is they’re controlling, we’re fine. It’s when they become slaves to the numbers that trouble breaks out. So let’s celebrate the value of disruption by data—but let’s not forget that data isn’t everything.

From Faith-Based to Evidence-Based: The Open Data 500 and Understanding How Open Data Helps the American Economy


Beth Noveck in Forbes: “Public funds have, after all, paid for their collection, and the law says that federal government data are not protected by copyright. By the end of 2009, the US and the UK had the only two open data one-stop websites where agencies could post and citizens could find open data. Now there are over 300 such portals for government data around the world with over 1 million available datasets. This kind of Open Data — including weather, safety and public health information as well as information about government spending — can serve the country by increasing government efficiency, shedding light on regulated industries, and driving innovation and job creation.

It’s becoming clear that open data has the potential to improve people’s lives. With huge advances in data science, we can take this data and turn it into tools that help people choose a safer hospital, pick a better place to live, improve the performance of their farm or business by having better climate models, and know more about the companies with whom they are doing business. Done right, open data even lets people contribute data back, giving everyone a better understanding, for example, of nuclear contamination in post-Fukushima Japan or incidents of price gouging in America’s inner cities.

The promise of open data is limitless (see the GovLab index for statistics on open data). But it’s important to back up our faith with real evidence of what works. Last September the GovLab began the Open Data 500 project, funded by the John S. and James L. Knight Foundation, to study the economic value of government Open Data extensively and rigorously. A recent McKinsey study pegged the global value of Open Data (including free data from sources other than government) at $3 trillion a year or more. We’re digging in and talking to those companies that use Open Data as a key part of their business model. We want to understand whether and how open data is contributing to the creation of new jobs, the development of scientific and other innovations, and adding to the economy. We also want to know what government can do better to help industries that want high quality, reliable, up-to-date information that government can supply. Of those 1 million datasets, for example, 96% are not updated on a regular basis.

The GovLab just published an initial working list of 500 American companies that we believe to be using open government data extensively. We’ve also posted in-depth profiles of 50 of them — a sample of the kind of information that will be available when the first annual Open Data 500 study is published in early 2014. We are also starting a similar study for the UK and Europe.

Even at this early stage, we are learning that Open Data is a valuable resource. As my colleague Joel Gurin, author of Open Data Now: the Secret to Hot Start-Ups, Smart Investing, Savvy Marketing and Fast Innovation, who directs the project, put it, “Open Data is a versatile and powerful economic driver in the U.S. for new and existing businesses around the country, in a variety of ways, and across many sectors. The diversity of these companies in the kinds of data they use, the way they use it, their locations, and their business models is one of the most striking things about our findings so far.” Companies are paradoxically building value-added businesses on top of public data that anyone can access for free….”
