How Wikipedia Data Is Revolutionizing Flu Forecasting


Hickmann and co say their model has the potential to transform flu forecasting from a black art into a modern science as well-founded as weather forecasting.
Flu takes between 3,000 and 49,000 lives each year in the U.S., so an accurate forecast can have a significant impact on the way society prepares for the epidemic. The current method of monitoring flu outbreaks is somewhat antiquated. It relies on a voluntary system in which public health officials report the percentage of patients they see each week with influenza-like illness, defined as a temperature higher than 100 degrees Fahrenheit, a cough, and no explanation other than flu.
These numbers give a sense of the incidence of flu at any instant but the accuracy is clearly limited. They do not, for example, account for people with flu who do not seek treatment or people with flu-like symptoms who seek treatment but do not have flu.
There is another significant problem. The network that reports this data is relatively slow. It takes about two weeks for the numbers to filter through the system so the data is always weeks old.
That’s why the CDC is interested in finding new ways to monitor the spread of flu in real time. Google, in particular, has used the number of searches for flu and flu-like symptoms to forecast flu in various parts of the world. That approach has had considerable success but also some puzzling failures. One problem, however, is that Google does not make its data freely available and this lack of transparency is a potential source of trouble for this kind of research.
So Hickmann and co have turned to Wikipedia. Their idea is that the variation in numbers of people accessing articles about flu is an indicator of the spread of the disease. And since Wikipedia makes this data freely available to any interested party, it is an entirely transparent source that is likely to be available for the foreseeable future….
Ref: arxiv.org/abs/1410.7716 : “Forecasting the 2013–2014 Influenza Season using Wikipedia”
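The mechanics of the idea are simple enough to sketch. The snippet below is a minimal illustration, not the authors’ pipeline: it pulls daily page-view counts for a few flu-related English Wikipedia articles from the Wikimedia Pageviews REST API (which only covers mid-2015 onward, so it cannot reproduce the 2013–2014 analysis in the paper), and the article list and date range are placeholder choices; a real forecast would regress this signal against reported influenza-like-illness rates rather than just printing it.

# Minimal sketch: pull daily page-view counts for a few flu-related English
# Wikipedia articles; the article list and season dates are illustrative only.
import requests

API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/all-agents/{article}/daily/{start}/{end}")
HEADERS = {"User-Agent": "flu-forecast-sketch/0.1 (contact: example@example.org)"}

def daily_views(article, start, end):
    """Return {YYYYMMDD: views} for one article over [start, end]."""
    r = requests.get(API.format(article=article, start=start, end=end),
                     headers=HEADERS, timeout=30)
    r.raise_for_status()
    return {item["timestamp"][:8]: item["views"] for item in r.json()["items"]}

if __name__ == "__main__":
    articles = ["Influenza", "Influenza_vaccine", "Oseltamivir"]
    start, end = "20231001", "20240331"   # one recent flu season
    totals = {}
    for article in articles:
        for day, views in daily_views(article, start, end).items():
            totals[day] = totals.get(day, 0) + views
    # In a real model these totals would feed a regression against CDC ILI
    # rates; here we simply print the aggregate signal.
    for day in sorted(totals):
        print(day, totals[day])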

The future of intelligence is distributed – and so is the future of government


Craig Thomler at eGovAU: “…Now we can do much better. Rather than focusing on electing and appointing individual experts – the ‘nodes’ in our governance system – governments need to focus on the network that interconnects citizens, government, business, not-for-profits and other entities.

Rather than limiting decision making to a small core of elected officials (supported by appointed and self-nominated ‘experts’), we need to design decision-making systems which empower broad groups of citizens to self-inform and involve themselves at appropriate steps of decision-making processes.
This isn’t quite direct democracy – where the population weighs in on every issue – but it certainly is a few steps removed from the alienating ‘representative democracy’ that many countries use today.
What this model of governance allows for is far more agile and iterative policy debates, rapid testing and improvement of programs and managed distributed community support – where anyone in a community can offer to help others within a framework which values, supports and rewards their involvement, rather than looks at it with suspicion and places many barriers in the way.
Of course we need the mechanisms designed to support this model of government, and the notion that they will simply evolve out of our existing system is quite naive.
Our current governance structures are evolutionary – based on the principle that better approaches will beat out ineffective and inefficient ones. But both history and animal evolution have shown that inefficient organisms can survive for extremely long periods, and that it can take radical environmental change (such as mass extinction events) for new forms to succeed.
On top of this, the evolution of government is particularly slow, as there are far fewer connections between the 200-odd national governments in the world than between the 200+ Watson artificial intelligences in the world.
While every Watson rapidly learns what other Watsons learn, governments have stilted and formal mechanisms for connection, which means it can take decades – or even longer – for them to recognise successes and failures in others.
In other words, while we have a diverse group of governments all attempting to solve many of the same basic problems, the network effect isn’t working as they are all too inward focused and have focused on developing expertise ‘nodes’ (individuals) rather than expert networks (connections).
This isn’t something that can be fixed by one, or even a group of ten or more governments – thereby leaving humanity in the position of having to repeat the same errors time and time again, approving the same drugs, testing the same welfare systems, trialing the same legal regimes, even when we have examples of their failures and successes we could be learning from.
So the best solution – perhaps the only workable solution for the likely duration of human civilisation on this planet – is to do what some of our forefathers did and design new forms of government in a planned way.
Rather than letting governments slowly and haphazardly evolve through trial and error, we should take a leaf out of the book of engineers, and place a concerted effort into designing governance systems that meet human needs.
These systems should involve and nurture strong networks, focusing on the connections rather than the nodes – allowing us to both leverage the full capabilities of society in its own betterment and to rapidly adjust settings when environments and needs change….”

Research Handbook On Transparency


New book edited by Padideh Ala’i and Robert G. Vaughn: ‘“Transparency” has multiple, contested meanings. This broad-ranging volume accepts that complexity and thoughtfully contrasts alternative views through conceptual pieces, country cases, and assessments of policies, such as freedom of information laws, whistleblower protections, financial disclosure, and participatory policymaking procedures.’
– Susan Rose-Ackerman, Yale University Law School, US
In the last two decades transparency has become a ubiquitous and stubbornly ambiguous term. Typically understood to promote rule of law, democratic participation, anti-corruption initiatives, human rights, and economic efficiency, transparency can also legitimate bureaucratic power, advance undemocratic forms of governance, and aid in global centralization of power. This path-breaking volume, comprising original contributions on a range of countries and environments, exposes the many faces of transparency by allowing readers to see the uncertainties, inconsistencies and surprises contained within the current conceptions and applications of the term….
The expert contributors identify the goals, purposes and ramifications of transparency while presenting both its advantages and shortcomings. Through this framework, they explore transparency from a number of international and comparative perspectives. Some chapters emphasize cultural and national aspects of the issue, with country-specific examples from China, Mexico, the US and the UK, while others focus on transparency within global organizations such as the World Bank and the WTO. A number of relevant legal considerations are also discussed, including freedom of information laws, financial disclosure of public officials and whistleblower protection…”

The New We the People Write API, and What It Means for You


White House Blog by Leigh Heyman: “The White House petitions platform, We the People, just became more accessible and open than ever before. We are very excited to announce the launch of the “write” version of the Petitions Application Programming Interface, or “API.”
Starting today, people can sign We the People petitions even when they’re not on WhiteHouse.gov. Now, users can also use third-party platforms, including other petitions services, or even their own websites or blogs. All of those signatures, once validated, will count towards a petition’s objective of meeting the 100,000-signature threshold needed for an official White House response.
We the People started with a simple goal: to give more Americans a way to reach their government. To date, the platform has been more successful than we could have imagined, with more than 16 million users creating and signing more than 360,000 petitions.
We launched our Write API beta test last year, and since then we’ve been hard at work, both internally and in collaboration with our beta test participants. Last spring, as part of the National Day of Civic Hacking, we hosted a hackathon right here at the White House, where our engineers spent a day sitting side-by-side with our beta testers to help get our code and theirs ready for the big day.
That big day has finally come.
Click here if you want to get started right away, or read on to learn more about the Petitions Write API….”
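For developers, the flow described above amounts to submitting signatures to the platform over HTTP and letting the White House validate them before they count toward the 100,000-signature threshold. The sketch below is only an illustration of that flow: the endpoint path, field names, and response shape are assumptions rather than a copy of the official Petitions API documentation, and the service requires an API key issued to registered integrators.

# Illustrative sketch of a third-party site submitting a signature through the
# We the People Write API. Endpoint path, field names, and response handling
# are assumptions for illustration; consult the official API docs before use.
import requests

API_BASE = "https://api.whitehouse.gov/v1"   # assumed base URL
API_KEY = "YOUR_REGISTERED_KEY"              # issued to approved integrators

def submit_signature(petition_id, first_name, last_name, email, zip_code):
    """POST one signature; it is queued and validated before being counted."""
    payload = {
        "petition_id": petition_id,
        "first_name": first_name,
        "last_name": last_name,
        "email": email,
        "zip": zip_code,
        "api_key": API_KEY,
    }
    r = requests.post(f"{API_BASE}/signatures.json", json=payload, timeout=30)
    r.raise_for_status()
    return r.json()   # assumed to contain an id for checking validation status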

The government wants to study ‘social pollution’ on Twitter


in the Washington Post: “If you take to Twitter to express your views on a hot-button issue, does the government have an interest in deciding whether you are spreading “misinformation’’? If you tweet your support for a candidate in the November elections, should taxpayer money be used to monitor your speech and evaluate your “partisanship’’?

My guess is that most Americans would answer those questions with a resounding no. But the federal government seems to disagree. The National Science Foundation, a federal agency whose mission is to “promote the progress of science; to advance the national health, prosperity and welfare; and to secure the national defense,” is funding a project to collect and analyze your Twitter data.
The project is being developed by researchers at Indiana University, and its purported aim is to detect what they deem “social pollution” and to study what they call “social epidemics,” including how memes — ideas that spread throughout pop culture — propagate. What types of social pollution are they targeting? “Political smears,” so-called “astroturfing” and other forms of “misinformation.”
Named “Truthy,” after a term coined by TV host Stephen Colbert, the project claims to use a “sophisticated combination of text and data mining, social network analysis, and complex network models” to distinguish between memes that arise in an “organic manner” and those that are manipulated into being.

But there’s much more to the story. Focusing in particular on political speech, Truthy keeps track of which Twitter accounts are using hashtags such as #teaparty and #dems. It estimates users’ “partisanship.” It invites feedback on whether specific Twitter users, such as the Drudge Report, are “truthy” or “spamming.” And it evaluates whether accounts are expressing “positive” or “negative” sentiments toward other users or memes…”
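To make the kind of signal being described concrete, here is a toy sketch of scoring accounts by the political hashtags they use. It is a deliberate oversimplification for illustration, not the Truthy project’s methodology, and the hashtag lists are placeholders.

# Toy illustration: score each account's "partisanship" from the political
# hashtags it uses. Not Truthy's actual model; hashtag lists are placeholders.
from collections import defaultdict

LEFT_TAGS = {"#dems", "#p2"}
RIGHT_TAGS = {"#teaparty", "#tcot"}

def partisanship_scores(tweets):
    """tweets: iterable of (user, set of hashtags).
    Returns user -> score in [-1, 1]: -1 means only left-leaning tags were
    used, +1 means only right-leaning tags."""
    left, right = defaultdict(int), defaultdict(int)
    for user, tags in tweets:
        left[user] += len(tags & LEFT_TAGS)
        right[user] += len(tags & RIGHT_TAGS)
    scores = {}
    for user in set(left) | set(right):
        total = left[user] + right[user]
        if total:
            scores[user] = (right[user] - left[user]) / total
    return scores

if __name__ == "__main__":
    sample = [
        ("@alice", {"#dems", "#vote"}),
        ("@bob", {"#teaparty", "#tcot"}),
        ("@carol", {"#dems", "#teaparty"}),
    ]
    # e.g. {'@alice': -1.0, '@bob': 1.0, '@carol': 0.0} (order may vary)
    print(partisanship_scores(sample))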

Open data for open lands


at Radar: “President Obama’s well-publicized national open data policy (pdf) makes it clear that government data is a valuable public resource for which the government should be making efforts to maximize access and use. This policy was based on lessons from previous government open data success stories, such as weather data and GPS, which form the basis for countless commercial services that we take for granted today and that deliver enormous value to society. (You can see an impressive list of companies reliant on open government data via GovLab’s Open Data 500 project.)
Based on this open data policy, I’ve been encouraging entrepreneurs to invest their time and ingenuity to explore entrepreneurial opportunities based on government data. I’ve even invested (through O’Reilly AlphaTech Ventures) in one such start-up, Hipcamp, which provides user-friendly interfaces for making reservations at national and state parks.
A better system is sorely needed. The current reservation system, managed by Active Network / Reserve America, is clunky and almost unusable. Hipcamp changes all that, making it a breeze to reserve camping spots.
But now this is under threat. Active Network / Reserve America’s 10-year contract is up for renewal, and the Department of the Interior had promised an RFP for a new contract that conformed with the open data mandate. Ideally, that RFP would require an API so that independent companies could provide alternate interfaces, just like travel sites provide booking interfaces for air travel, hotels, and more. That explosion of consumer convenience should be happening for customers of our nation’s parks as well, don’t you think?…”

The Role Of Open Data In Choosing Neighborhood


PlaceILive Blog: “To what extent is it important to get familiar with our environment?
If we think about how the world around us has changed over the years, it is not so unreasonable that, while walking to work, we might come across new little shops, restaurants, or gas stations we had never noticed before. Likewise, how many times have we wandered about for hours just to find a green space for a run, only to notice that the one we found was even more polluted than other urban areas?
Citizens are not always properly informed about the evolution of the places they live in. And that is why it is crucial for people to be constantly up to date with accurate information about the neighborhood they have chosen or are going to choose.
London is neat evidence of how transparency in providing data is fundamental to succeeding as a Smart City.
The GLA’s London Datastore, for instance, is a public platform of datasets with up-to-date figures on the main services offered by the city, as well as on residents’ lifestyles and environmental risks. These data are then made more easily accessible to the community through the London Dashboard.
The importance of providing free information is also demonstrated by the integration of maps, an efficient means of geolocation. Being able to consult a map on which all the services you need are easy to find nearby can make a real difference when searching for a location.
(Image source: Smart London Plan)
The Open Data Index, published by the Open Knowledge Foundation in 2013, is another useful tool for data retrieval: it ranks countries around the world with scores based on the openness and availability of data attributes such as transport timetables and national statistics.
Here it is possible to check UK Open Data Census and US City Open Data Census.
As noted, making open data available and easily findable online has not only been a success for US cities but has also benefited app makers and civic hackers. Lauren Reid, a spokesperson at Code for America, was quoted by Government Technology as saying: “The more data we have, the better picture we have of the open data landscape.”
That is, on the whole, what Place I Live puts the biggest effort into: fostering a new awareness of the environment by providing free information, in order to support citizens who want to choose the best place to live.
The result is straightforward: the website’s homepage lets visitors type in an address of interest and displays an overview of how the neighborhood scores on various parameters, along with a Life Quality Index calculated for every point on the map.
Searching for the nearest medical institutions, schools or ATMs thus becomes immediate and clear, as does looking up general information about the community. Moreover, the data’s reliability and accessibility are constantly reviewed by a strong team of professionals with expertise in data analysis, mapping, IT architecture and global markets.
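The post does not say how the Life Quality Index is actually computed, so the following is purely a toy illustration of how such a score could be composed from proximity to nearby amenities; the categories, weights, and distance decay are invented for the example.

# Toy illustration of a composite "quality of place" score from nearby amenities.
# Not PlaceILive's actual formula: categories, weights, and the distance decay
# are assumptions chosen only to illustrate the idea.
import math

WEIGHTS = {"hospital": 0.3, "school": 0.3, "park": 0.2, "atm": 0.2}  # hypothetical

def proximity_score(distance_km, scale_km=1.0):
    """1.0 right next door, decaying toward 0 as the amenity gets farther away."""
    return math.exp(-distance_km / scale_km)

def life_quality_index(nearest_km):
    """nearest_km: {category: distance in km to the nearest such amenity}.
    Returns a 0-100 score (missing categories contribute nothing)."""
    score = sum(weight * proximity_score(nearest_km.get(cat, float("inf")))
                for cat, weight in WEIGHTS.items())
    return round(100 * score, 1)

if __name__ == "__main__":
    print(life_quality_index({"hospital": 1.2, "school": 0.4, "park": 0.8, "atm": 0.2}))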
For the moment the company’s work is focused on London, Berlin, Chicago, San Francisco and New York, while its longer-term goal is to cover more than 200 cities.
The US Open Data Census recently gave San Francisco its highest score, proof of the city’s work in putting technological expertise at everyone’s disposal and in meeting users’ needs through a meticulous selection of datasets. The city is building on this with a new investment, in partnership with the University of Chicago, in a data analytics dashboard on sustainability performance statistics named the Sustainable Systems Framework, which is expected to be released in beta by the end of the first quarter of 2015.
 
Another remarkable contribution to the spread of open data comes from the Bartlett Centre for Advanced Spatial Analysis (CASA) at University College London (UCL); Oliver O’Brien, a researcher in the UCL Department of Geography and a software developer at CASA, is one of the contributors to this cause.
Among his projects, an interesting accomplishment is London’s CityDashboard, a control panel of real-time spatial data for the city. The web page also lets users view the data on a simplified map and look at dashboards for other UK cities.
His Bike Share Map, meanwhile, is a live global view of bicycle-sharing systems in over a hundred cities around the world; bike sharing has recently drawn greater public attention as a novel form of transportation, above all in Europe and China….”
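Live views like the Bike Share Map are built on open, frequently refreshed feeds published by bike-share operators. As a minimal sketch of that kind of ingestion, the example below reads a GBFS station-status feed; GBFS is a common open format that postdates this post, so this illustrates the general idea rather than the Bike Share Map’s own pipeline, and the feed URL is just an example that may change.

# Minimal sketch: read live bike availability from a GBFS station_status feed.
# The URL is an example; the Bike Share Map's own data sources may differ.
import requests

FEED_URL = "https://gbfs.citibikenyc.com/gbfs/en/station_status.json"

def stations_with_bikes(feed_url=FEED_URL, minimum=1):
    """Return station_id -> bikes available, for stations with at least `minimum` bikes."""
    stations = requests.get(feed_url, timeout=30).json()["data"]["stations"]
    return {s["station_id"]: s["num_bikes_available"]
            for s in stations if s["num_bikes_available"] >= minimum}

if __name__ == "__main__":
    available = stations_with_bikes()
    print(f"{len(available)} stations currently have bikes available")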

Why Are Political Scientists Studying Ice Bucket Challenges?


at the National Journal: “Who is more civically engaged—the person who votes in every election or the nonvoter who volunteers as a crossing guard at the local elementary school? What about the person who comments on an online news story? Does it count more if he posts the article on his Facebook page and urges his friends to act? What about the retired couple who takes care of the next-door neighbor’s kid after school until her single mom gets home from work?
The concept of civic engagement is mutating so fast that researchers are having a hard time keeping up with it. The Bureau of Labor Statistics has been collecting data on volunteering—defined as doing unpaid work through or for an organization—only since 2002. But even in that relatively short time period, that definition of “volunteering” has become far too limiting to cover the vast array of civic activity sprouting up online and in communities across the country.


Here’s just one example: Based on the BLS data alone, you would think that whites who graduated from college are far more likely to volunteer than African Americans or Hispanics with only high school degrees. But the BLS’s data doesn’t take into account the retired couple mentioned above, who, based on cultural norms, is more likely to be black or Hispanic. It doesn’t capture the young adults in poor neighborhoods who tell those researchers that they consider being a role model to younger kids their most important contribution to their communities. Researchers say those informal forms of altruism are more common among minority communities, while BLS-type “volunteering”—say, being a tutor to a disadvantaged child—is more common among middle-class whites. Moreover, the BLS’s data only scratches the surface of political involvement…”

Training Students to Extract Value from Big Data


New report by the National Research Council: “As the availability of high-throughput data-collection technologies, such as information-sensing mobile devices, remote sensing, internet log records, and wireless sensor networks, has grown, science, engineering, and business have rapidly transitioned from striving to develop information from scant data to a situation in which the challenge is now that the amount of information exceeds a human’s ability to examine, let alone absorb, it. Data sets are increasingly complex, and this potentially increases the problems associated with such concerns as missing information and other quality issues, data heterogeneity, and differing data formats.
The nation’s ability to make use of data depends heavily on the availability of a workforce that is properly trained and ready to tackle high-need areas. Training students to be capable in exploiting big data requires experience with statistical analysis, machine learning, and computational infrastructure that permits the real problems associated with massive data to be revealed and, ultimately, addressed. Analysis of big data requires cross-disciplinary skills, including the ability to make modeling decisions while balancing trade-offs between optimization and approximation, all while being attentive to useful metrics and system robustness. To develop those skills in students, it is important to identify whom to teach, that is, the educational background, experience, and characteristics of a prospective data-science student; what to teach, that is, the technical and practical content that should be taught to the student; and how to teach, that is, the structure and organization of a data-science program.
Training Students to Extract Value from Big Data summarizes a workshop convened in April 2014 by the National Research Council’s Committee on Applied and Theoretical Statistics to explore how best to train students to use big data. The workshop explored the need for training, and the curricula and coursework that should be included. One impetus for the workshop was the current fragmented view of what is meant by analysis of big data, data analytics, or data science. New graduate programs are introduced regularly, and they have their own notions of what is meant by those terms and, most important, of what students need to know to be proficient in data-intensive work. This report provides a variety of perspectives about those elements and about their integration into courses and curricula…”

The Data Manifesto


Development Initiatives: “Staging a Data Revolution

Accessible, usable, timely and complete data is core to sustainable development and social progress. Access to information provides people with a base to make better choices and have more control over their lives. Too often, attempts to deliver sustainable economic, social and environmental results are hindered by the failure to get the right information, in the right format, to the right people, at the right time. Worse still, the most acute data deficits often affect the people and countries facing the most acute problems.

The Data Revolution should be about data grounded in real life. Data and information that gets to the people who need it at national and sub-national levels to help with the decisions they face – hospital directors, school managers, city councillors, parliamentarians. Data that goes beyond averages – that is disaggregated to show the different impacts of decisions, policies and investments on gender, social groups and people living in different places and over time.

We need a Data Revolution that sets a new political agenda, that puts existing data to work, that improves the way data is gathered and ensures that information can be used. To deliver this vision, we need the following steps.


12 steps to a Data Revolution

1. Implement a national ‘Data Pledge’ to citizens that is supported by governments, private and non-governmental sectors
2. Address real world questions with joined up and disaggregated data
3. Empower and up-skill data users of the future through education
4. Examine existing frameworks and publish existing data
5. Build an information bank of data assets
6. Allocate funding available for better data according to national and sub-national priorities
7. Strengthen national statistical systems’ capacity to collect data
8. Implement a policy that data is ‘open by default’
9. Improve data quality by subjecting it to public scrutiny
10. Put information users’ needs first
11. Recognise technology cannot solve all barriers to information
12. Invest in infomediaries’ capacity to translate data into information that policymakers, civil society and the media can actually use…”