The Economist: “But then comes the challenge of generating real insight into forecasting accuracy. How can one compare forecasting ability?
The only reliable method is to conduct a forecasting tournament in which independent judges ask all participants to make the same forecasts in the same timeframes. And forecasts must be expressed numerically, so there can be no hiding behind vague verbiage. Words like “may” or “possible” can mean anything from probabilities as low as 0.001% to as high as 60% or 70%. But 80% always and only means 80%.
In the late 1980s one of us (Philip Tetlock) launched such a tournament. It involved 284 economists, political scientists, intelligence analysts and journalists and collected almost 28,000 predictions. The results were startling. The average expert did only slightly better than random guessing. Even more disconcerting, experts with the most inflated views of their own batting averages tended to attract the most media attention. Their more self-effacing colleagues, the ones we should be heeding, often don’t get on to our radar screens.
That project proved to be a pilot for a far more ambitious tournament currently sponsored by the Intelligence Advanced Research Projects Activity (IARPA), part of the American intelligence world. Over 5,000 forecasters have made more than 1m forecasts on more than 250 questions, from euro-zone exits to the Syrian civil war. Results are pouring in and they are revealing. We can discover who has better batting averages, not take it on faith; discover which methods of training promote accuracy, not just track the latest gurus and fads; and discover methods of distilling the wisdom of the crowd.
The big surprise has been the support for the unabashedly elitist “super-forecaster” hypothesis. The top 2% of forecasters in Year 1 showed that there is more than luck at play. If it were just luck, the “supers” would regress to the mean: yesterday’s champs would be today’s chumps. But they actually got better. When we randomly assigned “supers” into elite teams, they blew the lid off IARPA’s performance goals. They beat the unweighted average (wisdom-of-overall-crowd) by 65%; beat the best algorithms of four competitor institutions by 35-60%; and beat two prediction markets by 20-35%.
Over to you
To avoid slipping back to business as usual—believing we know things that we don’t—more tournaments in more fields are needed, and more forecasters. So we invite you, our readers, to join the 2014-15 round of the IARPA tournament. Current questions include: Will America and the EU reach a trade deal? Will Turkey get a new constitution? Will talks on North Korea’s nuclear programme resume? To volunteer, go to the tournament’s website at www.goodjudgmentproject.com. We predict with 80% confidence that at least 70% of you will enjoy it—and we are 90% confident that at least 50% of you will beat our dart-throwing chimps.”
See also https://web.archive.org/web/2013/http://www.iarpa.gov/Programs/ia/ACE/ace.html
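The tournament idea rests on turning forecasts into numbers that can be scored and compared. The sketch below is illustrative only. It assumes accuracy is measured with the Brier score, a common choice in forecasting research, because the excerpt does not name the metric. The "unweighted crowd" is a simple equal-weight average of everyone's probabilities. The forecaster names and numbers are hypothetical toy data, not tournament results.

```python
# Sketch: scoring probability forecasts and an unweighted "wisdom of the crowd" average.
# Assumption: accuracy is measured with the Brier score (mean squared error between the
# stated probability and the 0/1 outcome). The excerpt does not say which metric or
# aggregation the tournament used; all names and numbers below are hypothetical.


def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between probabilities and binary outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("each forecast needs exactly one outcome")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def unweighted_crowd(forecasts_by_person: dict[str, list[float]]) -> list[float]:
    """Average all participants' probabilities question by question, with equal weights."""
    per_question = zip(*forecasts_by_person.values())
    return [sum(ps) / len(ps) for ps in per_question]


# Toy data: three forecasters answering the same four yes/no questions.
forecasts = {
    "alice": [0.9, 0.2, 0.7, 0.1],
    "bob": [0.6, 0.5, 0.5, 0.4],
    "carol": [0.3, 0.8, 0.4, 0.6],
}
outcomes = [1, 0, 1, 0]  # 1 = the event happened

crowd = unweighted_crowd(forecasts)
for name, probs in forecasts.items():
    print(f"{name}: {brier_score(probs, outcomes):.4f}")
print(f"crowd: {brier_score(crowd, outcomes):.4f}")
```

With these toy numbers, the equal-weight crowd (about 0.19) beats two of the three individuals, while the sharpest forecaster (0.0375) beats the crowd. The tournament runs the same kind of comparison at much larger scale. Checking whether the same people stay on top in the following year is what separates skill from luck.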
“Neighborhood Buzz is an experimental system that lets you find out what people in your neighborhood, and neighborhoods in cities around the country, are talking about on Twitter. When you select a neighborhood from a city map, Neighborhood Buzz displays the main topics that people in that neighborhood are discussing — politics, sports, food, etc. — and then lets you drill down to look at the individual tweets in those categories.
The system also lets you see, at a glance, how much people in different neighborhoods in a city are talking about a given topic through a “heat map” overlay on the city’s geographical map.
Neighborhood Buzz uses geo-located tweets as input. Only a small fraction of tweets currently have location tags, but the number is sufficient to provide tens or hundreds of tweets per neighborhood per day.
The topical categorizer that the system uses is statistical — which means that even though we show only the tweets we are most confident the system is categorizing correctly, it still sometimes makes mistakes. You can let us know when the system has incorrectly categorized a tweet, and eventually that will help us to improve the system.
Neighborhood Buzz was originally developed at Northwestern University Knight Lab in our joint projects class in technology and journalism, involving students and faculty from the Medill School of Journalism and the McCormick School of Engineering, Dept. of Electrical Engineering and Computer Science, at Northwestern. It was then re-architected and further developed at the Knight Lab.”
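The description above covers three mechanisms: show only tweets the categorizer is confident about, summarize topics per neighborhood for a heat map, and collect reader corrections. The sketch below illustrates those ideas under stated assumptions. It does not reproduce Neighborhood Buzz's code. The class, functions, the 0.8 threshold, and the sample tweets and neighborhood names are all hypothetical. Each tweet is assumed to already carry a predicted topic and a confidence score from some statistical classifier.

```python
# Hypothetical sketch of confidence-thresholded topic display for geo-located tweets.
# Assumptions (not taken from Neighborhood Buzz): each tweet already carries a predicted
# topic and a classifier confidence in [0, 1]; low-confidence tweets are hidden; reader
# corrections are stored for later retraining.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClassifiedTweet:
    text: str
    neighborhood: str
    topic: str  # predicted category, e.g. "politics", "sports", "food"
    confidence: float  # classifier's probability for that category


def topics_for_display(tweets, neighborhood, threshold=0.8):
    """Group one neighborhood's tweets by topic, keeping only confident predictions."""
    shown = defaultdict(list)
    for t in tweets:
        if t.neighborhood == neighborhood and t.confidence >= threshold:
            shown[t.topic].append(t.text)
    # Busiest topics first, so the main conversations are listed at the top.
    return dict(sorted(shown.items(), key=lambda kv: len(kv[1]), reverse=True))


def topic_share(tweets, topic, threshold=0.8):
    """Per neighborhood, the fraction of confident tweets about `topic` (heat-map values)."""
    totals, hits = defaultdict(int), defaultdict(int)
    for t in tweets:
        if t.confidence >= threshold:
            totals[t.neighborhood] += 1
            hits[t.neighborhood] += int(t.topic == topic)
    return {n: hits[n] / totals[n] for n in totals}


corrections = []  # (tweet text, predicted topic, topic suggested by a reader)


def report_miscategorized(tweet, correct_topic):
    """Log a reader's correction so it can be added to future training data."""
    corrections.append((tweet.text, tweet.topic, correct_topic))


tweets = [
    ClassifiedTweet("Council vote on the budget tonight", "Riverside", "politics", 0.93),
    ClassifiedTweet("Best tacos in town, hands down", "Riverside", "food", 0.88),
    ClassifiedTweet("What a comeback in the ninth inning", "Hilltop", "sports", 0.97),
    ClassifiedTweet("not sure how I feel about this", "Hilltop", "politics", 0.41),
    ClassifiedTweet("Tailgating before kickoff", "Hilltop", "sports", 0.85),
]
print(topics_for_display(tweets, "Hilltop"))
print(topic_share(tweets, "sports"))
```

Raising the threshold trades coverage for precision. Fewer tweets are shown, but fewer of them are miscategorized. The low-confidence "politics" tweet above is dropped for that reason.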
The Telegraph: “The BBC has signed Memoranda of Understanding (MoUs) with the Europeana Foundation, the Open Data Institute, the Open Knowledge Foundation and the Mozilla Foundation, supporting free and open internet technologies…
The agreements will enable closer collaboration between the BBC and each of the four organisations on a range of mutual interests, including the release of structured open data and the use of open standards in web development, according to the BBC.
One aim of the agreement is to give clear technical standards and models to organisations who want to work with the BBC, and give those using the internet a deeper understanding of the technologies involved.
The MoUs also bring together several existing areas of research and provide a framework to explore future opportunities. Through this and other initiatives, the BBC hopes to become a catalyst for open innovation by publishing clear technical standards, models, expertise and – where feasible – data.
The BBC has been publishing linked open data for some time, most notably as part of the /programmes service, where machine-readable information about the programme schedule is made available online.
It also helped to deliver the Olympics Data Service, which underpinned 10,490 athlete pages on the BBC sport website during the 2012 Olympics….
“The BBC has been at the forefront of technological innovation around broadcasting and online for many years delivering the benefits of new technologies to licence fee payers, offering new services and products to audiences around the world, and creating public value in the digital economy,” said James Purnell, BBC Director of Strategy and Digital.”
FastCoExist: “In case the recent Obamacare debacle didn’t make it clear enough, the government has some serious problems getting technology to work correctly. This is something that President Obama has recognized in the past. In July, he made this statement: “I’m going to be asking more people around the country–more inventors and entrepreneurs and visionaries–to sign up to serve. We’ve got to have the brightest minds to help solve our biggest challenges.”
In San Francisco, that request has been taken on by the newly minted Entrepreneur-in-Residence (EIR) program–the first ever government-run program that helps startups to develop technologies that can be used to deal with pressing government issues. It’s kind of like a government startup incubator. This week, the EIR program announced 11 finalists for the program, which received 200 applications from startups across the world. Three to five startups will ultimately be chosen for the opportunity….
The 11 finalists range from small startups with just a handful of people doing cutting-edge work to companies valued at over $1 billion. Some of the highlights:
- Arrive Labs, a company that crowdsources public transit data and combines it with algorithms and external conditions (like the weather) to predict congestion, and to offer riders faster alternatives.
- A startup called Regroup that offers group messaging through a number of channels, including email, text, Facebook, Twitter, and digital signs.
- Smart waste management company Compology, which is working on a wireless waste monitoring system to tell officials what’s inside city dumpsters and when they are full.
- Birdi, a startup developing smart air quality, carbon monoxide, and smoke detectors that send alerts to your smartphone. The company also has an open API so that developers can pull in public outdoor air quality data.
- Synthicity’s 3-D digital city simulation (think “real-life SimCity”), which is based on urban datasets. The simulation is geared towards transportation planners, urban designers, and others who rely on city data to make decisions…”
Chronicle of Philanthropy: “Six nonprofit projects that aim to combine multiple sets of data to help solve social problems have each won $100,000 grants from the Bill & Melinda Gates Foundation…The winners:
• Pushpa Aman Singh, who founded GuideStar India as an effort of the Civil Society Information Services India. GuideStar India is the most comprehensive database of India’s registered charities. It has profiles of more than 4,000 organizations, and Ms. Singh plans to expand that number and the types of information included.
• Development Initiatives, an international aid organization, to support its partnership with the Ugandan nonprofit Development Research and Training. Together, they are trying to help residents of two districts in Uganda identify a key problem the communities face and use existing data sets to build both online and offline tools to help tackle that challenge…
• H.V. Jagadish, at the University of Michigan, to develop a prototype that will merge sets of incompatible geographic data to make them comparable. Mr. Jagadish, a professor of electrical engineering and computer science, points to crime precincts and school districts as an example. “We want to understand the impact of education on crime, but the districts don’t quite overlap with the precincts,” he says. “This tool will address the lack of overlap.”
• Vijay Modi, at Columbia University, to work with government agencies and charities in Nigeria on a tool similar to Foursquare, the social network that allows people to share their location with friends. Mr. Modi, a mechanical-engineering professor and faculty member of the university’s Earth Institute, envisions a tool that will help people find important resources more easily…
• Gisli Olafsson and his team at NetHope, a network of aid organizations. The group is building a tool to help humanitarian charities share their data more widely and in real time—potentially saving more lives during disasters…
• Development Gateway, a nonprofit that assists international development charities with technology, and GroundTruth Initiative, a nonprofit that helps residents of communities learn mapping and media skills. The two groups want to give people living in the slums of Nairobi, Kenya, more detailed information about local schools…”
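The Jagadish project addresses boundaries that do not line up, such as crime precincts and school districts. One simple, widely used way to reconcile such data is areal-weighted interpolation, sketched below. This is not necessarily the Michigan prototype's method. The sketch assumes that the overlap area between each precinct and each district is already known, for example from a GIS intersection. It also assumes that incidents are spread evenly across each precinct. The precinct and district labels and the counts are hypothetical.

```python
# Hypothetical sketch: areal-weighted interpolation between non-matching boundaries.
# This is a standard simple technique, not necessarily what the Michigan prototype does.
# Assumptions: crime counts are known per police precinct, the overlap area between each
# precinct and each school district is known, and crime is spread evenly within a precinct.
from collections import defaultdict


def areal_weighted_counts(precinct_counts, overlap_area):
    """Re-aggregate precinct counts onto school districts.

    precinct_counts: {precinct: count}
    overlap_area: {(precinct, district): area of their intersection}
    """
    # Total area of each precinct, summed over its pieces in every district.
    precinct_area = defaultdict(float)
    for (precinct, _district), area in overlap_area.items():
        precinct_area[precinct] += area

    district_counts = defaultdict(float)
    for (precinct, district), area in overlap_area.items():
        share = area / precinct_area[precinct]  # fraction of the precinct inside this district
        district_counts[district] += precinct_counts[precinct] * share
    return dict(district_counts)


# Toy example: two precincts that straddle two school districts (areas in square km).
precinct_counts = {"P1": 120, "P2": 60}
overlap_area = {
    ("P1", "D1"): 3.0, ("P1", "D2"): 1.0,
    ("P2", "D1"): 1.0, ("P2", "D2"): 2.0,
}
print(areal_weighted_counts(precinct_counts, overlap_area))
```

Here D1 receives about 110 incidents (three quarters of P1 plus a third of P2) and D2 about 70, and the total of 180 is preserved. The even-spread assumption is the method's main weakness. Weighting by population or by finer-grained data instead of raw area is a common refinement.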
New paper by Thorhildur Jetzek, Michel Avital, and Niels Bjørn-Andersen: “A driving force for change in society is the trend towards Open Government Data (OGD). While the value generated by OGD has been widely discussed by public bodies and other stakeholders, little attention has been paid to this phenomenon in the academic literature. Hence, we developed a conceptual model portraying how data as a resource can be transformed to value. We show the causal relationships between four contextual, enabling factors, four types of value generation mechanisms and value. We use empirical data from 61 countries to test these relationships, using the PLS method. The results mostly support the hypothesized relationships. Our conclusion is that if openness is complemented with resource governance, capabilities in society and technical connectivity, use of OGD will stimulate the generation of economic and social value through four different archetypical mechanisms: Efficiency, Innovation, Transparency and Participation.”
Katherine Barrett and Richard Greene in GOVERNING: “The easier it is for us to find important information about cities, counties and states, the better we’re able to report on topics of interest to our readers. But transparency isn’t just about us. It can help citizen organizations, good government bodies, advocacy groups, the press at large and even the general public. What’s more, accessible information makes it easier for legislators and city council members to drill down to the facts, creating more capacity for informed decision-making.
To be sure, progress has been made on a number of transparency fronts, and we certainly appreciate the additional data we’re able to find easily each year. That said, from our personal experience and conversations with experts in the field, much of the talk about heightened transparency in government is more rhetoric than reality.
Take so-called “online spending transparency,” or Web-based checkbooks that offer a clear and simple way to see where tax dollars are going. All 50 states have them. Optimally users would get, according to the nonprofit U.S. Public Interest Research Group (PIRG), a host of “checkbook-level information about expenditures including those made through contracts, grants, tax credits and other discretionary spending.”
Sounds swell, and in fact, PIRG’s studies of the 50 states have revealed consistent improvement. Each year, the organization has raised the bar on its criteria for grading the states. Still, in its most recent work, five states were given an F: California, Hawaii, North Dakota, Wisconsin and Wyoming. According to Phineas Baxandall, senior analyst for U.S. PIRG, in lagging and failing states—the dozen that got D’s and F’s—“you’ll find PDFs instead of searchable, sortable databases; you’ll find much more partial information about departments, and they generally don’t integrate economic subsidies.”
When it comes to disclosures of any kind there’s a huge chunk of information that’s as transparent as a window with the blinds closed. This includes a host of entities that generally don’t get their cash through the general fund. Starting the list are affiliated not-for-profits set up to provide government services and often funded through so-called “corporate funds” or grants, as well as public-private partnerships, authorities and a variety of other quasi-governmental bodies.”
Paper by Alessandro Acquisti and Christina Fong: “Surveys of U.S. employers suggest that numerous firms seek information about job applicants online. However, little is known about how this information gathering influences employers’ hiring behavior. We present results from two complementary randomized experiments (a field experiment and an online experiment) on the impact of online information on U.S. firms’ hiring behavior. We manipulate candidates’ personal information that is protected under either federal laws or some state laws, and may be risky for employers to enquire about during interviews, but which may be inferred from applicants’ online social media profiles. In the field experiment, we test responses of over 4,000 U.S. employers to a Muslim candidate relative to a Christian candidate, and to a gay candidate relative to a straight candidate. We supplement the field experiment with a randomized, survey-based online experiment with over 1,000 subjects (including subjects with previous human resources experience) testing the effects of the manipulated online information on hypothetical hiring decisions and perceptions of employability. The results of the field experiment suggest that a minority of U.S. firms likely searched online for the candidates’ information. Hence, the overall effect of the experimental manipulations on interview invitations is small and not statistically significant. However, in the field experiment, we find evidence of discrimination linked to political party affiliation. Specifically, following the Gallup Organization’s segmentation of U.S. states by political ideology, we use results from the 2012 presidential election and find evidence of discrimination against the Muslim candidate compared to the Christian candidate among employers in more Romney-leaning states and counties. These results are robust to controlling for firm characteristics, state fixed effects, and a host of county-level variables.
We find no evidence of discrimination against the gay candidate relative to the straight candidate. Results from the online experiment are consistent with those from the field experiment: we find more evidence of bias among subjects more likely to self-report a more politically conservative party affiliation. The online experiment’s results are also robust to controlling for demographic variables. Results from both experiments should be interpreted carefully. Because politically conservative states and counties in our field experiment, and more conservative party affiliation in our online experiment, are not randomly assigned, the result that discrimination is greater in more politically conservative areas and among more politically conservative online subjects should be interpreted as correlational, not causal.”
Tiago Peixoto: “…Within an ecosystem that combines transparency and participation, examining the relationship between the two becomes essential. More specifically, a clearer understanding of the interaction between open data and participatory institutions remains a frontier to be explored….
R&D for Data-Driven Participation
Coming up with clear hypotheses and testing them is essential if we are to move forward with the ecosystem that brings together open data, participation and accountability. Surely, many organizations working in the open government space are operating with limited resources, squeezing their budgets to keep their operational work going. In this sense, conducting experiments to test hypotheses may appear as a luxury that very few can afford.
Nevertheless, one of the opportunities provided by the use of technologies for civic behavior is that of potentially driving down the costs of experimentation. For instance, online and mobile experiments could play the role of tech-enabled (and affordable) randomized controlled trials, improving our understanding of how open data can be best used to spur collective action. Thinking about the ways in which technology can be used to conduct lower-cost experiments that shed light on behavioral and causal chains is still limited to a small number of people and organizations, and much work is needed on that front.
Yet, it is also important to acknowledge that experiments are not the only source of relevant knowledge. To stick with a simple example, in some cases even an online survey trying to figure out who is accessing data, what data they use, and how they use it may provide us with valuable knowledge about the interaction between open data and citizen action. In any case, however, it may be important that the actors working in that space agree upon a minimal framework that facilitates comparison and incremental learning: the field of technology for accountability desperately needs a more coordinated research agenda.
Citizen Data Platforms?
As more and more players engage in participatory initiatives, there is a significant amount of citizen-generated data being collected, which is important on its own. However, in a similar vein to government data, the potential of citizen data may be further unlocked if openly available to third parties who can learn from it and build upon it. In this respect, it might not be long before we realize the need to have adequate structures and platforms to host this wealth of data that – hopefully – will be increasingly generated around the world. This would entail that not only governments open up their data related to citizen engagement initiatives, but also that other actors working in that field – such as donors and NGOs – do the same. Such structures would also be the means by which lessons generated by experiments and other approaches are widely shared, bringing cumulative knowledge to the field.
However, as we think of future scenarios, we should not lose sight of current challenges and knowledge gaps when it comes to the relationship between citizen engagement and open data. Better disentangling the relationship between the two is the most immediate priority, and a long overdue topic in the open government conversation.”
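Peixoto suggests that online and mobile experiments could act as affordable randomized controlled trials. The minimal sketch below shows the core of such a trial: random assignment, then a comparison of outcome rates between the two arms. Everything in it is hypothetical. The civic app, the "open-data visualization" treatment, the action being measured, and the 15% and 10% rates are invented for illustration, and the outcomes are simulated rather than observed. The 95% margin uses a simple normal approximation.

```python
# Hypothetical sketch of a low-cost online randomized trial for a civic app.
# Assumptions: users are randomly shown either a plain page ("control") or one with an
# open-data visualization ("treatment"), and we record whether each user then takes a
# civic action (e.g. files a report). Outcomes below are simulated, not real results.
import math
import random


def assign_arms(user_ids, seed=42):
    """Randomly assign each user to 'control' or 'treatment' (reproducible via seed)."""
    rng = random.Random(seed)
    return {u: rng.choice(["control", "treatment"]) for u in user_ids}


def difference_in_rates(assignments, acted):
    """Treatment minus control action rate, with a normal-approximation 95% margin."""
    n = {"control": 0, "treatment": 0}
    k = {"control": 0, "treatment": 0}
    for user, arm in assignments.items():
        n[arm] += 1
        k[arm] += int(acted[user])
    p_t = k["treatment"] / n["treatment"]
    p_c = k["control"] / n["control"]
    se = math.sqrt(p_t * (1 - p_t) / n["treatment"] + p_c * (1 - p_c) / n["control"])
    return p_t - p_c, 1.96 * se


# Simulated outcomes: assume a 15% action rate with the visualization, 10% without.
users = list(range(2000))
assignments = assign_arms(users)
sim = random.Random(7)
acted = {u: sim.random() < (0.15 if arm == "treatment" else 0.10)
         for u, arm in assignments.items()}

effect, margin = difference_in_rates(assignments, acted)
print(f"estimated effect: {effect:+.3f} (95% margin +/- {margin:.3f})")
```

Because assignment is random, a difference between the arms can be attributed to the treatment rather than to who chose to see it. A real study would also fix its sample size and outcome measure in advance, which is part of the coordinated research agenda the excerpt calls for.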
New Scientist: “Freely available information has the power to make and save money and enhance our daily life, says Nigel Shadbolt of the Open Data Institute…
What kind of things do these start-ups do?
Our first success was with data analytics company Mastodon C, which used public information to look at doctors’ prescribing habits for cholesterol-lowering drugs. They found that by switching from brand names to generic drugs, doctors could save the NHS more than £200 million a year.
Have you looked at other public resources?
Another start-up, Placr, is unifying timetables and live departure and disruption information for UK bus, rail, underground, ferry and tram services. It uses feeds from many organisations to provide an app for travellers and services for local authorities. A recent review in London – where Transport for London has made lots of its data open – showed that millions of journeys are being altered to avoid disruptions on the basis of this information. Time savings alone add up to £58 million a year.
Is there a danger of creating more big companies that will turn into monopolies?
We want companies that use open data to make money, and they will try to defend their patches. But if we leave the data open, others can exploit it too. Nobody can own or monopolise the data. I think we can make more money and create more benefit by making data open, and I’m sure we will even dislodge a few monopolies along the way.”