Big data: are we making a big mistake?


Tim Harford in the Financial Times: “Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends: that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote “The End of Theory”, a provocative essay published in Wired in 2008, “with enough data, the numbers speak for themselves”. Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be “complete bollocks. Absolute nonsense.”…
But big data do not solve the problem that has obsessed statisticians and scientists for centuries: the problem of insight, of inferring what is going on, and figuring out how we might intervene to change a system for the better.
“We have a new resource here,” says Professor David Hand of Imperial College London. “But nobody wants ‘data’. What they want are the answers.”
To use big data to produce such answers will require large strides in statistical methods.
“It’s the wild west right now,” says Patrick Wolfe of UCL. “People who are clever and driven will twist and turn and use every tool to get sense out of these data sets, and that’s cool. But we’re flying a little bit blind at the moment.”
Statisticians are scrambling to develop new methods to seize the opportunity of big data. Such new methods are essential but they will work by building on the old statistical lessons, not by ignoring them.
Recall big data’s four articles of faith. Uncanny accuracy is easy to overrate if we simply ignore false positives, as with Target’s pregnancy predictor. The claim that causation has been “knocked off its pedestal” is fine if we are making predictions in a stable environment but not if the world is changing (as with Flu Trends) or if we ourselves hope to change it. The promise that “N = All”, and therefore that sampling bias does not matter, is simply not true in most cases that count. As for the idea that “with enough data, the numbers speak for themselves” – that seems hopelessly naive in data sets where spurious patterns vastly outnumber genuine discoveries.
“Big data” has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.”
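The "spurious patterns vastly outnumber genuine discoveries" problem Harford describes is easy to demonstrate: test enough pairs of completely random variables and many of them will look "significantly" correlated by chance alone. A minimal sketch (the variable counts and threshold are illustrative; 0.2565 is roughly the |r| cutoff for p < 0.01 two-tailed at n = 100):

```python
import random
import math

random.seed(0)
n_vars, n_obs = 120, 100

# Purely random data: any correlation we find is noise by construction.
data = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

def standardize(xs):
    """Center a series and scale it to unit norm."""
    m = sum(xs) / len(xs)
    centered = [x - m for x in xs]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

z = [standardize(series) for series in data]

def pearson(i, j):
    """Pearson r between standardized series i and j (a dot product)."""
    return sum(a * b for a, b in zip(z[i], z[j]))

# |r| > 0.2565 corresponds to p < 0.01 (two-tailed) for n = 100, so each
# of the 7,140 pairs has about a 1% chance of looking "significant".
spurious = sum(
    1
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
    if abs(pearson(i, j)) > 0.2565
)
print(spurious)  # around 70 spurious "discoveries", every one of them noise
```

The count scales with the square of the number of variables, which is why "the numbers speak for themselves" fails precisely where data sets are largest.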

The GovLab Index: Privacy and Security


Please find below the latest installment in The GovLab Index series, inspired by the Harper’s Index. “The GovLab Index: Privacy and Security examines the attitudes and concerns of American citizens regarding online privacy.” Previous installments include Designing for Behavior Change, The Networked Public, Measuring Impact with Evidence, Open Data, The Data Universe, Participation and Civic Engagement, and Trust in Institutions.
Globally

  • Percentage of people who feel the Internet is eroding their personal privacy: 56%
  • Internet users who feel comfortable sharing personal data with an app: 37%
  • Percentage of users who consider it important to know when an app is gathering information about them: 70%
  • How many people in the online world use privacy tools to disguise their identity or location: 28%, or 415 million people
  • Country with the highest penetration of general anonymity tools among Internet users: Indonesia, where 42% of users surveyed use proxy servers
  • Percentage of China’s online population that disguises their online location to bypass governmental filters: 34%

In the United States
Over the Years

  • In 1996, percentage of the American public who were categorized as having “high privacy concerns”: 25%
    • Those with “medium privacy concerns”: 59%
    • Those who were unconcerned with privacy: 16%
  • In 1998, percentage of computer users concerned about threats to personal privacy: 87%
  • In 2001, those who reported “medium to high” privacy concerns: 88%
  • Individuals who are unconcerned about privacy: 18% in 1990, down to 10% in 2004
  • How many online American adults are more concerned about their privacy in 2014 than they were a year ago: 64%
  • Percentage of respondents in 2012 who believe they have control over their personal information: 35%, downward trend for seven years
  • How many respondents in 2012 continue to perceive privacy and the protection of their personal information as very important or important to the overall trust equation: 78%, upward trend for seven years
  • How many consumers in 2013 trust that their bank is committed to protecting the privacy of their personal information: 35%, down from 48% in 2004

Privacy Concerns and Beliefs

  • How many Internet users worry about their privacy online: 92%
    • Those who report that their level of concern has increased from 2013 to 2014: 7 in 10
    • How many are at least sometimes worried when shopping online: 93%, up from 89% in 2012
    • Those who have some concerns when banking online: 90%, up from 86% in 2012
  • Percentage of Internet users who are worried about the amount of personal information about them online: 50%, up from 33% in 2009
    • Those who report that their photograph is available online: 66%
      • Their birthdate: 50%
      • Home address: 30%
      • Cell number: 24%
      • A video: 21%
      • Political affiliation: 20%
  • Consumers who are concerned about companies tracking their activities: 58%
    • Those who are concerned about the government tracking their activities: 38%
  • How many users surveyed felt that the National Security Agency (NSA) overstepped its bounds in light of recent NSA revelations: 44%
  • Respondents who are comfortable with advertisers using their web browsing history to tailor advertisements as long as it is not tied to any other personally identifiable information: 36%, up from 29% in 2012
  • Percentage of voters who do not want political campaigns to tailor their advertisements based on their interests: 86%
  • Percentage of respondents who do not want news tailored to their interests: 56%
  • Percentage of users who worry that their information will be stolen by hackers: 75%
    • Those who are worried about companies tracking their browsing history for targeted advertising: 54%
  • How many consumers say they do not trust businesses with their personal information online: 54%
  • Top 3 most trusted companies for privacy identified by consumers from across 25 different industries in 2012: American Express, Hewlett Packard and Amazon
    • Most trusted industries for privacy: Healthcare, Consumer Products and Banking
    • Least trusted industries for privacy: Internet and Social Media, Non-Profits and Toys
  • Respondents in 2012 who admit to sharing their personal information with companies they did not trust, for reasons such as convenience when making a purchase: 63%
  • Percentage of users who say they prefer free online services supported by targeted ads: 61%
    • Those who prefer paid online services without targeted ads: 33%
  • How many Internet users believe that it is not possible to be completely anonymous online: 59%
    • Those who believe complete online anonymity is still possible: 37%
    • Those who say people should have the ability to use the Internet anonymously: 59%
  • Percentage of Internet users who believe that current laws are not good enough in protecting people’s privacy online: 68%
    • Those who believe current laws provide reasonable protection: 24%

FULL LIST at http://thegovlab.org/the-govlab-index-privacy-and-trust/

Open Government: Building Trust and Civic Engagement


Gavin Newsom and Zachary Bookman in the Huffington Post: “Daily life has become inseparable from new technologies. Our phones and tablets let us shop from the couch, track how many miles we run, and keep in touch with friends across town and around the world – benefits barely possible a decade ago.
With respect to our communities, Uber and Lyft now shuttle us around town, reducing street traffic and parking problems. Adopt-a-Hydrant apps coordinate efforts to dig out hydrants after snowstorms, saving firefighters time when battling blazes. Change.org helps millions petition for and effect social and political change.
Yet as a sector, government typically embraces technology well behind the consumer curve. This leads to disheartening stories, like veterans waiting months or years for disability claims due to outdated technology, or the troubled rollout of the Healthcare.gov website. This is changing.
Cities and states are now the driving force in a national movement to harness technology to share a wealth of government information and data. Many forward-thinking local governments now provide the public with effective tools to make sense of all this data.
This is the Open Government movement.
For too long, government information has been locked away in agencies, departments, and archaic IT systems. Senior administrators often have to request the data they need to do their jobs from system operators. Elected officials, in turn, often have to request data from these administrators. The public remains in the dark, and when data is released, it appears in the form of inaccessible or incomprehensible facts and figures.
Governments keep massive volumes of data, from 500 page budget documents to population statistics to neighborhood crime rates. Although raw data is a necessary component of Open Government, for it to empower citizens and officials the data must be transformed into meaningful and actionable insights. Governments must both publish information in “machine readable” format and give people the tools to understand and act on it.
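In practice, “machine readable” means structured formats like CSV or JSON rather than 500-page PDFs. A minimal sketch of the difference between publishing raw data and turning it into an actionable insight (the department names and figures here are hypothetical):

```python
import json
from collections import defaultdict

# Hypothetical budget line items, as they might appear in an open-data export.
budget_lines = [
    {"department": "Public Works", "item": "Road repair", "amount": 1_200_000},
    {"department": "Public Works", "item": "Street lighting", "amount": 300_000},
    {"department": "Parks", "item": "Playground maintenance", "amount": 150_000},
]

# Machine-readable output: any program or visualization tool can consume this.
print(json.dumps(budget_lines, indent=2))

# A meaningful, actionable view: total spending per department.
totals = defaultdict(int)
for line in budget_lines:
    totals[line["department"]] += line["amount"]
print(dict(totals))  # {'Public Works': 1500000, 'Parks': 150000}
```

The raw JSON satisfies the “machine readable” mandate; the aggregation step is the kind of tool that actually empowers citizens and officials.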
New platforms can transform data from legacy systems into meaningful visualizations. Instant, web-based access to this information not only saves time and money, but also helps government make faster and better decisions. This allows them to serve their communities and builds trust with citizens.
Leading governments like Palo Alto have begun employing technology to leverage these benefits. Even the City of Bell, California, which made headlines in 2010 when senior administrators siphoned millions of dollars from the general fund, is now leveraging cloud technology to turn a new page in its history. The city has presented its financial information in an easily accessible, interactive platform at Bell.OpenGov.com. Citizens and officials alike can see vivid, user-generated charts and graphs that show where money goes, what services are offered to residents, and how much those services cost.
In 2009, San Francisco became an early adopter of the open data movement when an executive order made open and machine-readable the default for our consolidated government. That simple order spurred an entirely new industry and the City of San Francisco has been adopting apps like the San Francisco Heat Vulnerability Index and Neighborhood Score ever since. The former identifies areas vulnerable to heat waves with the hope of better preparedness, while the latter provides an overall health and sustainability score, block-by-block for every neighborhood in the city. These new apps use local, state, federal, and private data sets to allow residents to see how their neighborhoods rank.
The California State Lands Commission, responsible for the stewardship of the state’s lands, waterways, and natural resources, is getting in on the Open Government movement too. The Commission now publishes five years of expense and revenue data at CAStateLands.opengov.com (which just launched today!). California residents can now see how the state generates nearly half a billion dollars in revenue from oil and gas contracts, mineral royalties, and leasing programs. The State can now communicate how it manages those resources, so that citizens understand how their government works for them.
The Open Government movement provides a framework for improved public administration and a path for more trust and engagement. Governments have been challenged to do better, and now they can.”

This War of Mine – The Ultimate Serious Game


The Escapist Magazine: “…there are not many games about the effect of war. Paweł Miechowski thinks that needs to be changed, and he’s doing it with a little game called This War of Mine from the Polish outfit 11 Bit Studios.
“We’re in the moment where we want to talk about important things via games,” Miechowski said. “We are used to the fact that important topics are covered by music, novels, movies, while games mostly about fun. Laughing ‘ha ha ha’ fun.”
In fact, he believes games are well-suited for showing harsh truths and realities, not by ham-fistedly repeating political phrases or mantras, but by allowing you to draw your own conclusions from the circumstances. “Games are perfect for this because they are interactive. Novels or movies are not,” he said. “Games can take you through the experience through your hands, by your eyes. You are not a spectator. You are part of the experience.”
What is the experience of This War of Mine then? 11 Bit Studios was inspired by the firsthand accounts of people who tried to survive within a modern city that had no law, no order or infrastructure due to an ongoing war between militaries. “Everything we did in this game, we did after extensive research. Any mechanics in the game are just a translation of our knowledge of situations in recent history,” he said. “Yugoslavia, Syria, Serbia. Anywhere civilians survived within a besieged city after war. They were all pretty similar, struggling for water, hygiene items, food, simple tools to make something, wood to heat the house up.”
Miechowski showed me an early build of This War of Mine and that’s exactly what it is. Your only goal, which is emblazoned on the screen when you start the game, is to “Survive for 30 days.” You begin inside a 2D representation of a bombed-out building with several floors. You have a few allies with names like Boris or Yvette, each of whom has traits such as “good cook” or “strong, but slow.” Orders can be given to your team, such as to build a bed or to scavenge the piles of junk within your stronghold for any useful items. You usually start out with nothing, but over time you’ll accumulate all sorts of items and materials. The game runs in real time, with the hours slowly ticking by, but once you assign tasks it can be useful to advance the timeline by clicking the “Start Night” button.”

Potholes and Big Data: Crowdsourcing Our Way to Better Government


Phil Simon in Wired: “Big Data is transforming many industries and functions within organizations with relatively limited budgets.
Consider Thomas M. Menino, up until recently Boston’s longest-serving mayor. At some point in the past few years, Menino realized that it was no longer 1950. Perhaps he was hobnobbing with some techies from MIT at dinner one night. Whatever his motivation, he decided that there just had to be a better, more cost-effective way to maintain and fix the city’s roads. Maybe smartphones could help the city take a more proactive approach to road maintenance.
To that end, in July 2012, the Mayor’s Office of New Urban Mechanics launched a new project called Street Bump, an app that allows drivers to automatically report road hazards to the city as soon as they hear that unfortunate “thud,” with their smartphones doing all the work.
The app’s developers say their work has already sparked interest from other cities in the U.S., Europe, Africa and elsewhere that are imagining other ways to harness the technology.
Before they even start their trip, drivers using Street Bump fire up the app, then set their smartphones either on the dashboard or in a cup holder. The app takes care of the rest, using the phone’s accelerometer — a motion detector — to sense when a bump is hit. GPS records the location, and the phone transmits it to an AWS remote server.
But that’s not the end of the story. It turned out that the first version of the app reported far too many false positives (i.e., phantom potholes). This finding no doubt gave ammunition to the many naysayers who believe that technology will never be able to do what people can and that things are just fine as they are, thank you. Street Bump 1.0 “collected lots of data but couldn’t differentiate between potholes and other bumps.” After all, your smartphone or cell phone isn’t inert; it moves in the car naturally because the car is moving. And what about the scores of people whose phones “move” because they check their messages at a stoplight?
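The article doesn't describe Street Bump's actual detection logic, but the false-positive problem it raises suggests the shape of a fix: require both a large vertical acceleration spike and a moving vehicle before reporting a bump. A minimal sketch, with hypothetical thresholds and field names:

```python
def detect_bumps(samples, accel_threshold=3.0, min_speed_mps=2.0):
    """Return indices of samples that look like real road bumps.

    samples: list of (vertical_accel_g, speed_mps) tuples.
    A spike while the car is (nearly) stationary -- e.g. the driver
    picking up the phone at a stoplight -- is discarded as a likely
    false positive.
    """
    return [
        i
        for i, (accel, speed) in enumerate(samples)
        if abs(accel) > accel_threshold and speed >= min_speed_mps
    ]

# Synthetic trace: a genuine pothole hit at speed, then a phone jolt while stopped.
trace = [
    (0.1, 12.0),
    (4.2, 12.0),   # real bump while moving -- reported
    (0.2, 0.0),
    (3.8, 0.0),    # driver grabs phone at a stoplight -- ignored
]
print(detect_bumps(trace))  # [1]
```

Cross-referencing the accelerometer with GPS speed is one plausible filter; the crowdsourced solutions described below presumably went further, e.g. requiring multiple independent reports at the same location.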
To their credit, Menino and his motley crew weren’t entirely discouraged by this initial setback. In their gut, they knew that they were on to something. The idea and potential of the Street Bump app were worth pursuing and refining, even if the first version was a bit lacking. Plus, they have plenty of examples from which to learn. It’s not like the iPad, iPod, and iPhone haven’t evolved considerably over time.
Enter InnoCentive, a Massachusetts-based firm specializing in open innovation and crowdsourcing. The City of Boston contracted InnoCentive to improve Street Bump and reduce the amount of tail chasing. The company accepted the challenge and essentially turned it into a contest, a process sometimes called gamification. InnoCentive offered a network of 400,000 experts a share of $25,000 in prize money donated by Liberty Mutual.
Almost immediately, the ideas to improve Street Bump poured in from unexpected places. This crowd had wisdom. Ultimately, the best suggestions came from:

  • A group of hackers in Somerville, Massachusetts, that promotes community education and research
  • The head of the mathematics department at Grand Valley State University in Allendale, Michigan
  • An anonymous software engineer

…Crowdsourcing roadside maintenance isn’t just cool. Increasingly, projects like Street Bump are resulting in substantial savings — and better government.”

Democracy in Retreat


Book by Joshua Kurlantzick (Council on Foreign Relations) on “The Revolt of the Middle Class and the Worldwide Decline of Representative Government”: “Since the end of the Cold War, most political theorists have assumed that as countries develop economically, they will also become more democratic—especially if a vibrant middle class takes root. The triumph of democracy, once limited to a tiny number of states and now spread across the globe, has been considered largely inevitable.
In Democracy in Retreat: The Revolt of the Middle Class and the Worldwide Decline of Representative Government, CFR Fellow for Southeast Asia Joshua Kurlantzick identifies forces that threaten democracy and shows that conventional wisdom has blinded world leaders to a real crisis. “Today a constellation of factors, from the rise of China to the lack of economic growth in new democracies to the West’s financial crisis, has come together to hinder democracy throughout the developing world,” he writes. “Absent radical and unlikely changes in the international system, that combination of antidemocratic factors will have serious staying power.”
Kurlantzick pays particular attention to the revolt of middle class citizens, traditionally proponents of reform, who have turned against democracy in countries such as Venezuela, Pakistan, and Taiwan. He observes that countries once held up as model new democracies, such as Hungary and the Czech Republic, have since curtailed social, economic, and political freedoms. Military coups have grabbed power from Honduras to Thailand to Fiji. The number of representative governments has fallen, and the quality of democracy has deteriorated in many states where it had been making progress, including Russia, Kenya, Argentina, and Nigeria.
The renewed strength of authoritarian rule, warns Kurlantzick, means that billions of people around the world continue to live under repressive regimes.”

Can NewsGenius make annotated government documents more understandable?


at E Pluribus Unum: “Last year, Rap Genius launched News Genius to help decode current events. Today, the General Services Administration (GSA) announced that digital annotation service News Genius is now available to help decode federal government Web projects:

“The federal government can now unlock the collaborative “genius” of citizens and communities to make public services easier to access and understand with a new free social media platform launched by GSA today at the Federal #SocialGov Summit on Entrepreneurship and Small Business,” writes Justin Herman, federal social media manager.

“News Genius, an annotation wiki based on Rap Genius now featuring federal-friendly Terms of Service, allows users to enhance policies, regulations and other documents with in-depth explanations, background information and paths to more resources. In the hands of government managers it will improve public services through citizen feedback and plain language, and will reduce costs by delivering these benefits on a free platform that doesn’t require a contract.”

This could be a significant improvement in making complicated policy documents and regulations understandable to the governed. While plain writing is indispensable for open government and mandated by law and regulation, it isn’t uniformly practiced in Washington.

If people can understand more about what a given policy, proposed rule or regulation actually says, they may well be more likely to participate in the process of revising it. We’ll see if people adopt the tool, but on balance, that sounds like a step ahead.”

Randomized control trials (RCTs): interesting, but a marginal tool for governments


ODI Researcher Philipp Krause at BeyondBudgets: “Randomized control trials (RCTs) have had a great decade. The stunning line-up of speakers who celebrated J-PAL’s tenth anniversary in Boston last December gives some indication of just how great. They are the shiny new tool of development policy, and a lot of them are pretty cool. Browsing through J-PAL’s library of projects, it’s easy to see how so many of them end up in top-notch academic journals.
So far, so good. But the ambition of RCTs is not just to provide a gold-standard measurement of impact. They aim to actually have an impact on the real world themselves. The scenario goes something like this: researchers investigate the effect of an intervention and use the findings to either get out of that mess quickly (if the intervention doesn’t work) or scale it up quickly (if it does). In the pursuit of this impact-seeker’s Nirvana, it’s easy to conflate a couple of things, notably that an RCT is not the only way to evaluate impact; and evaluating impact is not the only way to use evidence for policy. Unfortunately, it is now surprisingly common to hear RCTs conflated with evidence-use, and evidence-use equated with the key ingredient for better public services in developing countries. The reality of evidence use is different.
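For readers unfamiliar with the mechanics behind the "gold-standard measurement" claim: random assignment makes the treatment and control groups statistically comparable, so the difference in average outcomes estimates the intervention's effect. A toy simulation (all numbers invented for illustration):

```python
import random

random.seed(42)

TRUE_EFFECT = 2.0  # the effect of the intervention (unknown in real life)
n = 1000           # participants per arm

# Random assignment: both arms draw from the same baseline distribution;
# the treatment arm gets the true effect added on top.
control = [random.gauss(10.0, 1.0) for _ in range(n)]
treated = [random.gauss(10.0, 1.0) + TRUE_EFFECT for _ in range(n)]

# The difference in means is an unbiased estimate of the effect.
estimate = sum(treated) / n - sum(control) / n
print(round(estimate, 2))  # close to 2.0
```

The statistical machinery is simple; as the rest of the piece argues, the hard part is everything around it, i.e. whether the finding gets used, scales, and travels to other contexts.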
Today’s rich countries didn’t get rich by using evidence systematically. This is a point that we recently discussed at a big World Bank – ODI conference on the (coincidental?) tenth anniversary of the WDR 2004. Lant Pritchett put it best when describing Randomistas as engaging in a faith-based activity: nobody could accuse the likes of Germany, Switzerland, Sweden or the US of achieving human development by systematically scaling up what works.
What these countries do have in spades is people noisily demanding stuff, and governments giving it to them. In fact, some of the greatest innovations in providing health, unemployment benefits and pensions to poor people (and taking them to scale) happened because citizens seemed to want them, and giving them stuff seemed like a good way to shut them up. Ask Otto Bismarck. It’s not too much of a stretch to call this the history of public spending in a nutshell….
The bottom line is that governments that care about impact have plenty of cheaper, timelier and more appropriate tools and options available to them than RCTs. That doesn’t mean RCTs shouldn’t be done, of course. And the evaluation of aid is a different matter altogether, where donors are free to be as inefficient about evidence-basing as they wish without burdening poor countries.
But for governments the choice of how to go about using systematic evidence is theirs to make. And it’s a tough capability to pick up. Many governments choose not to do it, and there’s no evidence that they suffer for it. It would be wrong for donors to suggest to low-income countries that RCTs are in any way critical for their public service capability. Better call them what they are: interesting, but marginal.”

Pursuing adoption of free and open source software in governments


at O’Reilly Radar: “Reasons for government agencies to adopt free and open source software have been aired repeatedly, including my article mentioned earlier. A few justifications include:

Access
Document formats must allow citizens to read and submit documents without purchasing expensive tools.
Participation
Free software allows outside developers to comment and contribute.
Public ownership
Whatever tools are developed or purchased by the government should belong to the public, as long as no security issues are involved.
Archiving
Proprietary formats can be abandoned by their vendors after only two or three years.
Transparency
Free software allows the public to trust that the tools are accurate and have no security flaws.
Competition
The government has an interest in avoiding lock-in and ensuring that software can be maintained or replaced.
Cost
In the long run, an agency can save a lot of money by investing in programming or system administration skills, or hiring a firm to maintain the free software.

Obviously, though, government agencies haven’t gotten the memo. I’m not just talking metaphorically; there have been plenty of memos urging the use of open source, ranging from the US Department of Defense to laws passed in a number of countries.
And a lot of progress has taken place. Munich, famously, has switched its desktops to GNU/Linux and OpenOffice.org — but the process took 13 years. Elsewhere in Europe, Spain has been making strides, and the UK promises to switch. In Latin America, Brazil has made the most progress. Many countries that could benefit greatly from using free software — and have even made commitments to do so — are held back by a lack of IT staff with the expertise to do so.
Key barriers include:

Procurement processes
General consensus among knowledgeable software programmers holds that age-old rules for procurement shouldn’t be tossed out, but could be tweaked to admit bids from more small businesses that want to avoid the bureaucracy of registering with the government and answering Requests for Proposals (RFPs).
Habits of passivity
Government managers are well aware of how little they understand the software development process — in fact, if you ask them what they would need to adopt more open source software, they rarely come up with useful answers. They prefer to hand all development and maintenance to an outside firm, which takes full advantage of this to isolate agencies from one another and lock in expensive rates.
Lack of knowledgeable IT staff
Government managers have reason to keep their hands off free software. One LibrePlanet audience member reported that he installed a good deal of free software at his agency, but that when he left, they could not find knowledgeable IT hires to take over. Bit by bit, the free software was replaced with proprietary products known to the new staff.
Political pressure
The urge to support proprietary companies doesn’t just come from their sales people or lobbyists. Buying software, like other products, is seen by politicians as a way of ensuring that jobs remain in their communities.
Lack of information
Free software is rarely backed by a marketing and sales organization, and even if managers have the initiative to go look for the software, they don’t know how to evaluate its maturity and readiness.

Thoroughgoing change in the area of software requires managers to have a certain consciousness at a higher level: they need to assert control over their missions and adopt agile workflows. That will inevitably spawn a desire for more control over the software that carries out these missions. A posting by Matthew Burton of the Consumer Financial Protection Bureau shows that radical redirections like this are possible.
In the meantime, here are some ideas that the panelists and audience came up with:

Tweaking procurement
If projects can be kept cheap — as Code for America does using grants and other stratagems — they don’t have to navigate the procurement process. Hackathons and challenges can also produce results — but they have a number of limitations, particularly the difficulty developers have in understanding the requirements of the people they want to serve. Some agencies can also bypass procurement by forming partnerships with community groups who produce the software. Finally, a possibly useful model is to take a cut of income from a project instead of charging the government for it.
Education
Managers have heard of open source software by now — great progress from just a few years ago — and are curious about it. On the production side, we need to help them see the benefits of releasing code, and how to monitor their software vendors to make sure the code is really usable. On the consumption side, we need to teach them maturity models and connect them to strong development projects.
Sharing
Most governments have familiar tasks that can be met by the same software base, but end up paying to reinvent (or just reinstall) the wheel. Code for America started a peer network to encourage managers to talk to one another about solutions. The Brazilian government has started a Public Software Portal. The European Union has an open source database and the US federal government has posted a list of government software released as open source.”

European Commission launches network to foster web talent through Massive Open Online Courses (MOOCs)


Press Release: “The Commission is launching a network of providers of MOOCs related to web and apps skills. MOOCs are online university courses which enable people to access quality education without having to leave their homes. The new network aims to map the demand for web-related skills across Europe and to promote the use of Massive Open Online Courses (MOOCs) for capacity-building in those fields.
Web-related industry is generating more economic growth than any other part of the European economy, but hundreds of thousands of jobs remain unfilled due to the lack of qualified staff.
European Commission Vice President Neelie Kroes, responsible for the Digital Agenda, said:
“By 2020, 90% of jobs will need digital skills. That is just around the corner, and we aren’t ready! Already businesses in Europe are facing a shortage of skilled ICT workers. We have to fill that gap, and this network we are launching will help us identify where the gaps are. This goes hand in hand with the work being done through the Grand Coalition for Digital Jobs”.
The Commission calls upon web entrepreneurs, universities, MOOC providers and online learners to join the network, which is part of the “Startup Europe” initiative.
Participants in the network benefit from the exchange of experiences and best practices, opportunities for networking, news updates, and the chance to participate in a conference dedicated to MOOCs for web and apps skills scheduled for the second half of 2014. In addition, the network offers a discussion group that can be found on the European Commission’s portal Open Education Europa. The initiative is coordinated by p.a.u. education in partnership with Iversity.
Useful links
Link to EC press release on the launch of first pan-European university MOOCs
Open Education Europa website
Startup Europe website
Grand Coalition for Digital Jobs website”