Big Data


Special Report on Big Data by Volta – A newsletter on Science, Technology and Society in Europe: “Locating crime spots, or the next outbreak of a contagious disease, Big Data promises benefits for society as well as business. But more means messier. Do policy-makers know how to use this scale of data-driven decision-making in an effective way for their citizens and ensure their privacy? 90% of the world’s data have been created in the last two years. Every minute, more than 100 million new emails are created, 72 hours of new video are uploaded to YouTube and Google processes more than 2 million searches. Nowadays, almost everyone walks around with a small computer in their pocket, uses the internet on a daily basis and shares photos and information with their friends, family and networks. The digital exhaust we leave behind every day contributes to an enormous amount of data produced, and at the same time leaves electronic traces that contain a great deal of personal information….
Until recently, traditional technology and analysis techniques have not been able to handle this quantity and type of data. But recent technological developments have enabled us to collect, store and process data in new ways. There seem to be no limitations, either to the volume of data or to the technology for storing and analyzing it. Big Data can map a driver’s sitting position to identify a car thief, use Google searches to predict outbreaks of the H1N1 flu virus, data-mine Twitter to predict the price of rice, or use mobile phone top-ups to describe unemployment in Asia.
The word ‘data’ means ‘given’ in Latin. It commonly refers to a description of something that can be recorded and analyzed. While there is no clear definition of the concept of ‘Big Data’, it usually refers to the processing of huge amounts and new types of data, which has not been possible with traditional tools.

‘The new development is not necessarily that there is so much more data. It’s rather that data is available to us in a new way.’

The notion of Big Data is kind of misleading, argues Robindra Prabhu, a project manager at the Norwegian Board of Technology. “The new development is not necessarily that there is so much more data. It’s rather that data is available to us in a new way. The digitalization of society gives us access to both ‘traditional’, structured data – like the content of a database or register – and unstructured data, for example the content in a text, pictures and videos. Information designed to be read by humans is now also readable by machines. And this development makes a whole new world of data gathering and analysis available. Big Data is exciting not just because of the amount and variety of data out there, but because we can process data about so much more than before.”

Open data: Unlocking innovation and performance with liquid information


New report by McKinsey Global Institute: “Open data—machine-readable information, particularly government data, that’s made available to others—has generated a great deal of excitement around the world for its potential to empower citizens, change how government works, and improve the delivery of public services. It may also generate significant economic value, according to a new McKinsey report. Our research suggests that seven sectors alone could generate more than $3 trillion a year in additional value as a result of open data, which is already giving rise to hundreds of entrepreneurial businesses and helping established companies to segment markets, define new products and services, and improve the efficiency and effectiveness of operations.

Although the open-data phenomenon is in its early days, we see a clear potential to unlock significant economic value by applying advanced analytics to both open and proprietary knowledge. Open data can become an instrument for breaking down information gaps across industries, allowing companies to share benchmarks and spread best practices that raise productivity. Blended with proprietary data sets, it can propel innovation and help organizations replace traditional and intuitive decision-making approaches with data-driven ones. Open-data analytics can also help uncover consumer preferences, allowing companies to improve new products and to uncover anomalies and needless variations. That can lead to leaner, more reliable processes.
However, investments in technology and expertise are required to use the data effectively. And there is much work to be done by governments, companies, and consumers to craft policies that protect privacy and intellectual property, as well as establish standards to speed the flow of data that is not only open but also “liquid.” After all, consumers have serious privacy concerns, and companies are reluctant to share proprietary information—even when anonymity is assured—for fear of losing competitive advantage…
See also Executive Summary and Full Report”

Government — investor, risk-taker, innovator


Ted Talk by Mariana Mazzucato: “Why doesn’t the government just get out of the way and let the private sector — the “real revolutionaries” — innovate? It’s rhetoric you hear everywhere, and Mariana Mazzucato wants to dispel it. In an energetic talk, she shows how the state — which many see as a slow, hunkering behemoth — is really one of our most exciting risk-takers and market-shapers.”

Smart Citizens


FutureEverything: “This publication aims to shift the debate on the future of cities towards the central place of citizens, and of decentralised, open urban infrastructures. It provides a global perspective on how cities can create the policies, structures and tools to engender a more innovative and participatory society. The publication contains a series of 23 short essays representing some of the key voices developing an emerging discourse around Smart Citizens.  Contributors include:

  • Dan Hill, Smart Citizens pioneer and CEO of communications research centre and transdisciplinary studio Fabrica on why Smart Citizens Make Smart Cities.
  • Anthony Townsend, urban planner, forecaster and author of Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, on the tensions between place-making and city-making, and on the role of mobile technologies in changing the way that people interact with their surroundings.
  • Paul Maltby, Director of the Government Innovation Group and of the Open Data and Transparency in the UK Cabinet Office on how government can support a smarter society.
  • Aditya Dev Sood, Founder and CEO of the Center for Knowledge Societies, presents polarised hypothetical futures for India in 2025, arguing for the use of technology to bridge gaps in social inequality.
  • Adam Greenfield, New York City-based writer and urbanist, on Recuperating the Smart City.

Editors: Drew Hemment, Anthony Townsend
Download Here.

Peer Production: A Modality of Collective Intelligence


New paper by Yochai Benkler, Aaron Shaw and Benjamin Mako Hill: “Peer production is the most significant organizational innovation that has emerged from Internet-mediated social practice and among the most visible and important examples of collective intelligence. Following Benkler, we define peer production as a form of open creation and sharing performed by groups online that: (1) sets and executes goals in a decentralized manner; (2) harnesses a diverse range of participant motivations, particularly non-monetary motivations; and (3) separates governance and management relations from exclusive forms of property and relational contracts (i.e., projects are governed as open commons or common property regimes and organizational governance utilizes combinations of participatory, meritocratic and charismatic, rather than proprietary or contractual, models). For early scholars of peer production, the phenomenon was both important and confounding for its ability to generate high quality work products in the absence of formal hierarchies and monetary incentives. However, as peer production has become increasingly established in society, the economy, and scholarship, merely describing the success of some peer production projects has become less useful. In recent years, a second wave of scholarship has emerged to challenge assumptions in earlier work; probe nuances glossed over by earlier framings of the phenomena; and identify the necessary dynamics, structures, and conditions for peer production success.
Peer production includes many of the largest and most important collaborative communities on the Internet….
Much of this academic interest in peer production stemmed from the fact that the phenomenon resisted straightforward explanations in terms of extant theories of the organization and production of functional information goods like software or encyclopedias. Participants in peer production projects join and contribute valuable resources without the hierarchical bureaucracies or strong leadership structures common to state agencies or firms, and in the absence of clear financial incentives or rewards. As a result, foundational research on peer production was focused on (1) documenting and explaining the organization and governance of peer production communities, (2) understanding the motivation of contributors to peer production, and (3) establishing and evaluating the quality of peer production’s outputs.
In the rest of this chapter, we describe the development of the academic literature on peer production in these three areas – organization, motivation, and quality.”

Implementing Open Innovation in the Public Sector: The Case of Challenge.gov


Article by Ines Mergel and Kevin C. Desouza in Public Administration Review: “As part of the Open Government Initiative, the Barack Obama administration has called for new forms of collaboration with stakeholders to increase the innovativeness of public service delivery. Federal managers are employing a new policy instrument called Challenge.gov to implement open innovation concepts invented in the private sector to crowdsource solutions from previously untapped problem solvers and to leverage collective intelligence to tackle complex social and technical public management problems. The authors highlight the work conducted by the Office of Citizen Services and Innovative Technologies at the General Services Administration, the administrator of the Challenge.gov platform. Specifically, this Administrative Profile features the work of Tammi Marcoullier, program manager for Challenge.gov, and Karen Trebon, deputy program manager, and their role as change agents who mediate collaborative practices between policy makers and public agencies as they navigate the political and legal environments of their local agencies. The profile provides insights into the implementation process of crowdsourcing solutions for public management problems, as well as lessons learned for designing open innovation processes in the public sector”.

What Government Can and Should Learn From Hacker Culture


In The Atlantic: “Can the open-source model work for federal government? Not in every way—for security purposes, the government’s inner workings will never be completely open to the public. Even in the inner workings of government, fears of triggering the next Wikileaks or Snowden scandal may scare officials away from being more open with one another. While not every area of government can be more open, there are a few areas ripe for change.

Perhaps the most glaring need for an open-source approach is in information sharing. Today, among and within several federal agencies, a culture of reflexive and unnecessary information withholding prevails. This knee-jerk secrecy can backfire with fatal consequences, as seen in the 1998 embassy bombings in Africa, the 9/11 attacks, and the Boston Marathon bombings. What’s most troubling is that decades after the dangers of failing to share information were identified, the problem persists.
What’s preventing reform? The answer starts with the government’s hierarchical structure—though an information-is-power mentality and “need to know” Cold War-era culture contribute too. To improve the practice of information sharing, government needs to change the structure of information sharing. Specifically, it needs to flatten the hierarchy.
Former Obama Administration regulation czar Cass Sunstein’s “nudge” approach shows how this could work. In his book Simpler: The Future of Government, he describes how making even small changes to an environment can effect significant changes in behavior. While Sunstein focuses on regulations, the broader lesson is clear: change the environment to encourage better behavior, and people tend to exhibit better behavior. Without such strict adherence to the many tiers of the hierarchy, those working within it could be nudged toward sharing information rather than fighting against it.
One example of where this worked is with the State Department’s annual Religious Engagement Report (RER). In 2011, the office in charge of the RER decided that instead of having every embassy submit its data via email, the data would be posted on a secure wiki. On the surface, this was a decision to change an information-sharing procedure. But it also changed the information-sharing culture. Instead of sharing information only along the supervisor-subordinate axis, it created a norm of sharing laterally, among colleagues.
Another advantage to flattening information-sharing hierarchies is that it reduces the risk of creating “single points of failure,” to quote technology scholar Beth Noveck. The massive amounts of data now available to us may need massive amounts of eyeballs in order to spot patterns of problems—small pools of supervisors atop the hierarchy cannot be expected to shoulder those burdens alone. And while having the right tech tools to share information is part of the solution—as the wiki did for the RER—it’s not enough. Leadership must also create a culture that nudges staff to use these tools, even if that means relinquishing a degree of their own power.
Finally, a more open work culture would help connect interested parties across government to let them share the hard work of bringing new ideas to fruition. Government is filled with examples of interesting new projects that stall in their infancy. Creating a large pool of collaborators dedicated to a project increases the likelihood that when one torchbearer burns out, others in the agency will pick up for them.
When Linus Torvalds released Linux, it was considered, in Eric Raymond’s words, “subversive” and “a distinct shock.” Could the federal government withstand such a shock?
Evidence suggests it can—and the transformation is already happening in small ways. One of the winners of the Harvard Kennedy School’s Innovations in Government award is State’s Consular Team India (CTI), which won for joining its embassy and four consular posts—each of which used to have its own distinct set of procedures—into a single, more effective unit that could deliver standardized services. As CTI describes it, “this is no top-down bureaucracy”; the team shares “a common base of information and shared responsibilities.” They flattened the hierarchy, and not only lived, but thrived.”

Open Data Index provides first major assessment of state of open government data


Press Release from the Open Knowledge Foundation: “In the week of a major international summit on government transparency in London, the Open Knowledge Foundation has published its 2013 Open Data Index, showing that governments are still not providing enough information in an accessible form to their citizens and businesses.
The UK and US top the 2013 Index, which is a result of community-based surveys in 70 countries. They are followed by Denmark, Norway and the Netherlands. Of the countries assessed, Cyprus, St Kitts & Nevis, the British Virgin Islands, Kenya and Burkina Faso ranked lowest. There are many countries where the governments are less open but that were not assessed because of a lack of openness or of a sufficiently engaged civil society. This includes 30 countries that are members of the Open Government Partnership.
The Index ranks countries based on the availability and accessibility of information in ten key areas, including government spending, election results, transport timetables, and pollution levels, and reveals that whilst some good progress is being made, much remains to be done.
Rufus Pollock, Founder and CEO of the Open Knowledge Foundation said:

Opening up government data drives democracy, accountability and innovation. It enables citizens to know and exercise their rights, and it brings benefits across society: from transport, to education and health. There has been a welcome increase in support for open data from governments in the last few years, but this Index reveals that too much valuable information is still unavailable.

The UK and US are leaders on open government data but even they have room for improvement: the US for example does not provide a single consolidated and open register of corporations, while the UK Electoral Commission lets down the UK’s good overall performance by not allowing open reuse of UK election data.
There is a very disappointing degree of openness of company registers across the board: only 5 out of the 20 leading countries have even basic information available via a truly open licence, and only 10 allow any form of bulk download. This information is critical for a range of reasons – including tackling tax evasion and other forms of financial crime and corruption.
Less than half of the key datasets in the top 20 countries are available to re-use as open data, showing that even the leading countries do not fully understand the importance of citizens and businesses being able to legally and technically use, reuse and redistribute data. That ability enables them to build and share commercial and non-commercial services.
To see the full results: https://index.okfn.org. For graphs of the data: https://index.okfn.org/visualisations.”
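
As a rough illustration of how a ranking like this can be assembled, the sketch below scores each key dataset against a weighted checklist of openness criteria and totals the scores per country. The criteria names, weights and example data are assumptions made for the sketch, not the Open Knowledge Foundation's actual methodology (the real survey and scoring are documented at https://index.okfn.org).

# Hypothetical openness-index sketch: score each key dataset against a
# weighted checklist and sum across datasets to get a country total.
# Criteria and weights are illustrative assumptions, not the Index's own.
from dataclasses import dataclass, field

CRITERIA_WEIGHTS = {
    "exists": 5,
    "machine_readable": 15,
    "bulk_download": 10,
    "openly_licensed": 30,
    "up_to_date": 10,
    "publicly_available": 30,
}

@dataclass
class DatasetAssessment:
    name: str
    criteria: dict = field(default_factory=dict)  # criterion name -> True/False

    def score(self) -> int:
        # Sum the weights of every criterion this dataset satisfies
        return sum(w for c, w in CRITERIA_WEIGHTS.items() if self.criteria.get(c))

def country_score(assessments: list) -> int:
    """Total a country's score across its assessed key datasets."""
    return sum(a.score() for a in assessments)

# Example: two of the ten key areas for a fictional country
example_country = [
    DatasetAssessment("government_spending", {
        "exists": True, "machine_readable": True, "bulk_download": True,
        "openly_licensed": True, "up_to_date": True, "publicly_available": True,
    }),
    DatasetAssessment("election_results", {
        "exists": True, "machine_readable": True, "bulk_download": False,
        "openly_licensed": False, "up_to_date": True, "publicly_available": True,
    }),
]

print(country_score(example_country))  # higher totals rank higher in the index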

Making government simpler is complicated


Mike Konczal in The Washington Post: “Here’s something a politician would never say: “I’m in favor of complex regulations.” But what would the opposite mean? What would it mean to have “simple” regulations?

There are two definitions of “simple” that have come to dominate liberal conversations about government. One is the idea that we should make use of “nudges” in regulation. The other is the idea that we should avoid “kludges.” As it turns out, however, these two definitions conflict with each other—and the battle between them will dominate conversations about the state in the years ahead.

The case for “nudges”

The first definition of a “simple” regulation is one emphasized in Cass Sunstein’s recent book titled Simpler: The Future of Government (also see here). A simple policy is one that simply “nudges” people into one choice or another using a variety of default rules, disclosure requirements, and other market structures. Think, for instance, of rules that require fast-food restaurants to post calories on their menus, or a mortgage that has certain terms clearly marked in disclosures.

These sorts of regulations are deemed “choice preserving.” Consumers are still allowed to buy unhealthy fast-food meals or sign up for mortgages they can’t reasonably afford. The regulations are just there to inform people about their choices. These rules are designed to keep the market “free,” where all options are ultimately possible, although there are rules to encourage certain outcomes.
In his book, however, Sunstein adds that there’s another very different way to understand the term “simple.” What most people mean when they think of simple regulations is a rule that is “simple to follow.” Usually a rule is simple to follow because it outright excludes certain possibilities and thus ensures others. Which means, by definition, it limits certain choices.

The case against “kludges”
This second definition of simple plays a key role in political scientist Steve Teles’ excellent recent essay, “Kludgeocracy in America.” For Teles, a “kludge” is a “clumsy but temporarily effective” fix for a policy problem. (The term comes from computer science.) These kludges tend to pile up over time, making government cumbersome and inefficient overall.
Teles focuses on several ways that kludges are introduced into policy, with a particularly sharp focus on overlapping jurisdictions and the related mess of federal and state overlap in programs. But, without specifically invoking it, he also suggests that a reliance on “nudge” regulations can lead to more kludges.
After all, a non-kludge policy proposal is one that will be simple to follow and will clearly cause a certain outcome, with an obvious causality chain. This is in contrast to a web of “nudges” and incentives designed to try to guide certain outcomes.

Why “nudges” aren’t always simpler
The distinction between the two is clear if we take a specific example core to both definitions: retirement security.
For Teles, “one of the often overlooked benefits of the Social Security program… is that recipients automatically have taxes taken out of their paychecks, and then, without much effort on their part, checks begin to appear upon retirement. It’s simple and direct. By contrast, 401(k) retirement accounts… require enormous investments of time, effort, and stress to manage responsibly.”

Yet 401(k)s are the ultimate fantasy laboratory for nudge enthusiasts. A whole cottage industry has grown up around figuring out ways to default people into certain contributions, designing the architecture of investment choices, and trying to effortlessly and painlessly guide people into certain savings.
Each approach emphasizes different things. If you want to focus your energy on making people better consumers and market participants, expending our government’s resources and energy on 401(k)s is a good choice. If you want to focus on providing retirement security directly, expanding Social Security is a better choice.
The first is “simple” in that it doesn’t exclude any possibility but encourages market choices. The second is “simple” in that it is easy to follow, and the result is simple as well: a certain amount of security in old age is provided directly. This second approach understands the government as playing a role in stopping certain outcomes, and providing for the opposite of those outcomes, directly….

Why it’s hard to create “simple” regulations
Like all supposed binaries, this is really a continuum. Taxes, for instance, sit somewhere in the middle of the two definitions of “simple.” They tend to preserve the market as it is but raise (or lower) the price of certain goods, influencing choices.
And reforms and regulations are often most effective when there’s a combination of these two types of “simple” rules.
Consider an important new paper, “Regulating Consumer Financial Products: Evidence from Credit Cards,” by Sumit Agarwal, Souphala Chomsisengphet, Neale Mahoney and Johannes Stroebel. The authors analyze the CARD Act of 2009, which regulated credit cards. They found that the nudge-type disclosure rules “increased the number of account holders making the 36-month payment value by 0.5 percentage points.” However, more direct regulations on fees had an even bigger effect, saving U.S. consumers $20.8 billion per year with no notable reduction in credit access…
The balance between these two approaches of making regulations simple will be front and center as liberals debate the future of government, whether they’re trying to pull back on the “submerged state” or consider the implications for privacy. The debate over the best way for government to be simple is still far from over.”

Google’s flu fail shows the problem with big data


Adam Kucharski in The Conversation: “When people talk about ‘big data’, there is an oft-quoted example: a proposed public health tool called Google Flu Trends. It has become something of a pin-up for the big data movement, but it might not be as effective as many claim.
The idea behind big data is that large amounts of information can help us do things which smaller volumes cannot. Google first outlined the Flu Trends approach in a 2008 paper in the journal Nature. Rather than relying on disease surveillance used by the US Centers for Disease Control and Prevention (CDC) – such as visits to doctors and lab tests – the authors suggested it would be possible to predict epidemics through Google searches. When suffering from flu, many Americans will search for information related to their condition….
Between 2003 and 2008, flu epidemics in the US had been strongly seasonal, appearing each winter. However, in 2009, the first cases (as reported by the CDC) started around Easter. Flu Trends had already made its predictions when the CDC data was published, but it turned out that the Google model didn’t match reality. It had substantially underestimated the size of the initial outbreak.
The problem was that Flu Trends could only measure what people search for; it didn’t analyse why they were searching for those words. By removing human input, and letting the raw data do the work, the model had to make its predictions using only search queries from the previous handful of years. Although the 45 search terms used in the model matched the regular seasonal outbreaks from 2003–8, they didn’t reflect the pandemic that appeared in 2009.
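
To make the mechanics concrete: the original Nature paper described fitting a simple linear relationship between the log-odds of the CDC influenza-like-illness (ILI) visit percentage and the log-odds of the flu-related query fraction. The sketch below reproduces that idea on synthetic data; the series, coefficients and single aggregate query variable are illustrative assumptions, and the point is only to show how a model fit purely to past seasonal patterns extrapolates, for better or worse, to new weeks.

import numpy as np

# Flu-Trends-style sketch: regress logit(ILI visit share) on logit(query share)
# using past seasons only, then predict a new week from its query share alone.
# All data here are synthetic stand-ins for CDC ILI and search-query series.

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
weeks = np.arange(104)  # two "past" seasons of weekly data
seasonal = 0.02 + 0.015 * np.maximum(0.0, np.sin(2 * np.pi * weeks / 52))
query_share = seasonal * (1 + 0.1 * rng.standard_normal(weeks.size))
ili_share = seasonal * (1 + 0.1 * rng.standard_normal(weeks.size))

# Least-squares fit of logit(ILI) = b0 + b1 * logit(query)
X = np.column_stack([np.ones(weeks.size), logit(query_share)])
beta, *_ = np.linalg.lstsq(X, logit(ili_share), rcond=None)

# Predict ILI for a new week from its observed query share alone. If search
# behaviour shifts (e.g. an out-of-season pandemic changes what people type),
# the relationship learned from past seasons can badly misestimate reality.
new_query_share = 0.05
predicted_ili = inv_logit(beta[0] + beta[1] * logit(new_query_share))
print(f"predicted ILI visit share: {predicted_ili:.3f}")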
Six months after the pandemic started, Google – who now had the benefit of hindsight – updated their model so that it matched the 2009 CDC data. Despite these changes, the updated version of Flu Trends ran into difficulties again last winter, when it overestimated the size of the influenza epidemic in New York State. The incidents in 2009 and 2012 raised the question of how good Flu Trends is at predicting future epidemics, as opposed to merely finding patterns in past data.
In a new analysis, published in the journal PLOS Computational Biology, US researchers report that there are “substantial errors in Google Flu Trends estimates of influenza timing and intensity”. This is based on a comparison of Google Flu Trends predictions and the actual epidemic data at the national, regional and local level between 2003 and 2013.
Even when search behaviour was correlated with influenza cases, the model sometimes misestimated important public health metrics such as peak outbreak size and cumulative cases. The predictions were particularly wide of the mark in 2009 and 2012:
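
For a sense of what misestimating peak outbreak size and cumulative cases means when model output is laid against CDC data, the comparison below computes those season-level metrics for two synthetic weekly series. The metrics and numbers are assumptions for the sketch, not the published study's own analysis.

import numpy as np

# Compare a model's weekly flu estimates against reference CDC data on the
# metrics mentioned above: peak size, cumulative total, timing, and mean error.
# Both series below are synthetic; the model overshoots the peak by design.

def compare_to_cdc(estimates: np.ndarray, cdc: np.ndarray) -> dict:
    return {
        "relative_peak_error": (estimates.max() - cdc.max()) / cdc.max(),
        "relative_cumulative_error": (estimates.sum() - cdc.sum()) / cdc.sum(),
        "peak_timing_offset_weeks": int(estimates.argmax()) - int(cdc.argmax()),
        "mean_absolute_error": float(np.mean(np.abs(estimates - cdc))),
    }

weeks = np.arange(30)
cdc_ili = 2.0 + 4.0 * np.exp(-0.5 * ((weeks - 15) / 4.0) ** 2)    # reference ILI %
model_ili = 2.0 + 6.0 * np.exp(-0.5 * ((weeks - 14) / 4.0) ** 2)  # model estimates

print(compare_to_cdc(model_ili, cdc_ili))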

Original and updated Google Flu Trends (GFT) model compared with CDC influenza-like illness (ILI) data. PLOS Computational Biology 9:10

Although they criticised certain aspects of the Flu Trends model, the researchers think that monitoring internet search queries might yet prove valuable, especially if it were linked with other surveillance and prediction methods.
Other researchers have also suggested that other sources of digital data – from Twitter feeds to mobile phone GPS – have the potential to be useful tools for studying epidemics. As well as helping to analyse outbreaks, such methods could allow researchers to analyse human movement and the spread of public health information (or misinformation).
Although much attention has been given to web-based tools, there is another type of big data that is already having a huge impact on disease research. Genome sequencing is enabling researchers to piece together how diseases transmit and where they might come from. Sequence data can even reveal the existence of a new disease variant: earlier this week, researchers announced a new type of dengue fever virus….”