A Different Idea of Our Declaration


Gordon S. Wood reviews Our Declaration: A Reading of the Declaration of Independence in Defense of Equality by Danielle Allen in the New York Review of Books: “If we read the Declaration of Independence slowly and carefully, Danielle Allen believes, then the document can become a basic primer for our democracy. It can be something that all of us—not just scholars and educated elites but common ordinary people—can participate in, and should participate in if we want to be good democratic citizens.
Allen, who is a professor of social science at the Institute for Advanced Study in Princeton, came to this extraordinary conclusion when she was teaching for a decade at the University of Chicago. But it was not the young bright-eyed undergraduates whom she taught by day who inspired her. Instead, it was the much older, life-tested adults whom she taught by night who created “the single most transformative experience” of her teaching career.
As she slowly worked her way through the 1,337 words of the Declaration of Independence with her night students, many of whom had no job or were working two jobs or were stuck in dead-end part-time jobs, Allen discovered that the document had meaning for them and that it was accessible to any reader or hearer of its words. By teaching the document to these adult students in the way that she did, she experienced “a personal metamorphosis.” For the first time in her life she came to realize that the Declaration makes a coherent philosophical argument about equality, an argument that could be made comprehensible to ordinary people who had no special training…”

'Big Data' Will Change How You Play, See the Doctor, Even Eat


We’re entering an age of personal big data, and its impact on our lives will surpass that of the Internet. Data will answer questions we could never before answer with certainty—everyday questions like whether that dress actually makes you look fat, or profound questions about precisely how long you will live.


Every 20 years or so, a powerful technology moves from the realm of backroom expertise and into the hands of the masses. In the late 1970s, computing made that transition—from mainframes in glass-enclosed rooms to personal computers on desks. In the late 1990s, the first web browsers made networks, which had been for science labs and the military, accessible to any of us, giving birth to the modern Internet.

Each transition touched off an explosion of innovation and reshaped work and leisure. In 1975, 50,000 PCs were in use worldwide. Twenty years later: 225 million. The number of Internet users in 1995 hit 16 million. Today it’s more than 3 billion. In much of the world, it’s hard to imagine life without constant access to both computing and networks.

The 2010s will be the coming-out party for data. Gathering, accessing and gleaning insights from vast and deep data has been a capability locked inside enterprises long enough. Cloud computing and mobile devices now make it possible to stand in a bathroom line at a baseball game while tapping into massive computing power and databases. On the other end, connected devices such as the Nest thermostat or Fitbit health monitor and apps on smartphones increasingly collect new kinds of information about everyday personal actions and habits, turning it into data about ourselves.

More than 80 percent of data today is unstructured: tangles of YouTube videos, news stories, academic papers, social network comments. Unstructured data has been almost impossible to search for, analyze and mix with other data. A new generation of computers—cognitive computing systems that learn from data—will read tweets or e-books or watch video, and comprehend their content. Somewhat like brains, these systems can link diverse bits of data to come up with real answers, not just search results.

Such systems can work in natural language. The progenitor is the IBM Watson computer that won on Jeopardy in 2011. Next-generation Watsons will work like a super-powered Google. (Google today is a data-searching wimp compared with what’s coming.)

Sports offers a glimpse into the data age. Last season the NBA installed in every arena technology that can “watch” a game and record, in 48 minutes of action, more than 4 million data points about every movement and shot. That alone could yield new insights for NBA coaches, such as which group of five players most efficiently passes the ball around….

Think again about life before personal computing and the Internet. Even if someone told you that you’d eventually carry a computer in your pocket that was always connected to global networks, you would’ve had a hard time imagining what that meant—imagining WhatsApp, Siri, Pandora, Uber, Evernote, Tinder.

As data about everything becomes ubiquitous and democratized, layered on top of computing and networks, it will touch off the most spectacular technology explosion yet. We can see the early stages now. “Big data” doesn’t even begin to describe the enormity of what’s coming next.

Chief Executive of Nesta on the Future of Government Innovation


Interview between Rahim Kanani and Geoff Mulgan, CEO of Nesta and member of the MacArthur Research Network on Opening Governance: “Our aspiration is to become a global center of expertise on all kinds of innovation, from how to back creative business start-ups and how to shape innovation tools such as challenge prizes, to helping governments act as catalysts for new solutions,” explained Geoff Mulgan, chief executive of Nesta, the UK’s innovation foundation. In an interview with Mulgan, we discussed their new report, published in partnership with Bloomberg Philanthropies, which highlights 20 of the world’s top innovation teams in government. Mulgan and I also discussed the founding and evolution of Nesta over the past few years, and leadership lessons from his time inside and outside government.
Rahim Kanani: When we talk about ‘innovations in government’, isn’t that an oxymoron?
Geoff Mulgan: Governments have always innovated. The Internet and World Wide Web both originated in public organizations, and governments are constantly developing new ideas, from public health systems to carbon trading schemes, online tax filing to high speed rail networks.  But they’re much less systematic at innovation than the best in business and science.  There are very few job roles, especially at senior levels, few budgets, and few teams or units.  So although there are plenty of creative individuals in the public sector, they succeed despite, not because of the systems around them. Risk-taking is punished not rewarded.   Over the last century, by contrast, the best businesses have learned how to run R&D departments, product development teams, open innovation processes and reasonably sophisticated ways of tracking investments and returns.
Kanani: This new report, published in partnership with Bloomberg Philanthropies, highlights 20 of the world’s most effective innovation teams in government working to address a range of issues, from reducing murder rates to promoting economic growth. Before I get to the results, how did this project come about, and why is it so important?
Mulgan: If you fail to generate new ideas, test them and scale the ones that work, it’s inevitable that productivity will stagnate and governments will fail to keep up with public expectations, particularly when waves of new technology—from smart phones and the cloud to big data—are opening up dramatic new possibilities.  Mayor Bloomberg has been a leading advocate for innovation in the public sector, and in New York he showed the virtues of energetic experiment, combined with rigorous measurement of results.  In the UK, organizations like Nesta have approached innovation in a very similar way, so it seemed timely to collaborate on a study of the state of the field, particularly since we were regularly being approached by governments wanting to set up new teams and asking for guidance.
Kanani: Where are some of the most effective innovation teams working on these issues, and how did you find them?
Mulgan: In our own work at Nesta, we’ve regularly sought out the best innovation teams that we could learn from, and this study made it possible to do that more systematically, focusing in particular on the teams within national and city governments. They vary greatly, but all the best ones are achieving impact with relatively slim resources. Some are based in central governments, like MindLab in Denmark, which has pioneered the use of design methods to reshape government services, from small business licensing to welfare. SITRA in Finland has been going for decades as a public technology agency, and more recently has switched its attention to innovation in public services, for example providing mobile tools to help patients manage their own healthcare. In the city of Seoul, the Mayor set up an innovation team to accelerate the adoption of ‘sharing’ tools, so that people could share things like cars, freeing money for other things. In South Australia the government set up an innovation agency that has been pioneering radical ways of helping troubled families, mobilizing families to help other families.
Kanani: What surprised you the most about the outcomes of this research?
Mulgan: Perhaps the biggest surprise has been the speed with which this idea is spreading. Since we started the research, we’ve come across new teams being created in dozens of countries, from Canada and New Zealand to Cambodia and Chile. China has set up a mobile technology lab for city governments. Mexico City and many others have set up labs focused on creative uses of open data. A batch of cities across the US supported by Bloomberg Philanthropies—from Memphis and New Orleans to Boston and Philadelphia—are now showing impressive results and persuading others to copy them.
 

Selected Readings on Sentiment Analysis


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of sentiment analysis was originally published in 2014.

Sentiment Analysis is a field of Computer Science that uses techniques from natural language processing, computational linguistics, and machine learning to predict subjective meaning from text. The term opinion mining is often used interchangeably with Sentiment Analysis, although it is technically a subfield focusing on the extraction of opinions (the umbrella under which sentiment, evaluation, appraisal, attitude, and emotion all lie).

The rise of Web 2.0 and increased information flow has led to an increase in interest towards Sentiment Analysis — especially as applied to social networks and media. Events causing large spikes in media — such as the 2012 Presidential Election Debates — are especially ripe for analysis. Such analyses raise a variety of implications for the future of crowd participation, elections, and governance.
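To make the basic idea concrete before the readings, here is a deliberately tiny, lexicon-based polarity scorer in Python. The word lists and example tweets are invented for illustration; production sentiment analysis relies on the machine-learning and NLP techniques surveyed in the works below.

    import re

    # Toy sentiment lexicons (illustrative only; real systems use large,
    # weighted lexicons or trained classifiers).
    POSITIVE = {"good", "great", "excellent", "support", "agree", "love", "win"}
    NEGATIVE = {"bad", "terrible", "poor", "oppose", "disagree", "hate", "lose"}

    def polarity(text: str) -> float:
        """Return a score in [-1, 1]: > 0 leans positive, < 0 leans negative."""
        tokens = re.findall(r"[a-z']+", text.lower())
        pos = sum(t in POSITIVE for t in tokens)
        neg = sum(t in NEGATIVE for t in tokens)
        return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

    # Hypothetical debate-night tweets
    for tweet in ("Great debate performance tonight, I agree with every point.",
                  "Terrible answers. I oppose everything he said."):
        print(f"{polarity(tweet):+.2f}  {tweet}")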

Annotated Selected Reading List (in alphabetical order)

Choi, Eunsol et al. “Hedge detection as a lens on framing in the GMO debates: a position paper.” Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics 13 Jul. 2012: 70-79. http://bit.ly/1wweftP

  • Understanding the ways in which participants in public discussions frame their arguments is important for understanding how public opinion is formed. This paper adopts the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, the authors propose the following question: In the controversy regarding the use of genetically-modified organisms (GMOs) in agriculture, do pro- and anti-GMO articles differ in whether they choose to adopt a more “scientific” tone?
  • Prior work on the rhetoric and sociology of science suggests that hedging may distinguish popular-science text from text written by professional scientists for their colleagues. The paper proposes a detailed approach to studying whether hedge detection can be used to understand scientific framing in the GMO debates, and provides corpora to facilitate this study. Some of the preliminary analyses suggest that hedges occur less frequently in scientific discourse than in popular text, a finding that contradicts prior assertions in the literature.
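As a minimal illustration of the hedge-detection idea in the Choi et al. entry above (not their actual approach, which learns hedge detection from annotated corpora), the following toy counts a hand-picked list of hedge cues and reports a rate per 100 tokens:

    import re

    # Hand-picked hedge cues, purely illustrative.
    HEDGE_CUES = {"may", "might", "could", "suggest", "suggests", "appear",
                  "appears", "possibly", "likely", "perhaps"}

    def hedge_rate(text: str) -> float:
        """Hedge cues per 100 tokens."""
        tokens = re.findall(r"[a-z']+", text.lower())
        hedges = sum(token in HEDGE_CUES for token in tokens)
        return 100.0 * hedges / max(len(tokens), 1)

    hedged = "The results suggest that the transgene may possibly alter expression."
    unhedged = "The transgene alters expression."
    print(hedge_rate(hedged), hedge_rate(unhedged))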

Michael, Christina, Francesca Toni, and Krysia Broda. “Sentiment analysis for debates.” (Unpublished MSc thesis). Department of Computing, Imperial College London (2013). http://bit.ly/Wi86Xv

  • This project aims to expand on existing solutions used for automatic sentiment analysis on text in order to capture support/opposition and agreement/disagreement in debates. In addition, it looks at visualizing the classification results for enhancing the ease of understanding the debates and for showing underlying trends. Finally, it evaluates proposed techniques on an existing debate system for social networking.

Murakami, Akiko, and Rudy Raymond. “Support or oppose?: classifying positions in online debates from reply activities and opinion expressions.” Proceedings of the 23rd International Conference on Computational Linguistics: Posters 23 Aug. 2010: 869-875. https://bit.ly/2Eicfnm

  • In this paper, the authors propose a method for the task of identifying the general positions of users in online debates, i.e., support or oppose the main topic of an online debate, by exploiting local information in their remarks within the debate. An online debate is a forum where each user posts an opinion on a particular topic while other users state their positions by posting their remarks within the debate. The supporting or opposing remarks are made by directly replying to the opinion, or indirectly to other remarks (to express local agreement or disagreement), which makes the task of identifying users’ general positions difficult.
  • A prior study has shown that a link-based method, which completely ignores the content of the remarks, can achieve higher accuracy for the identification task than methods based solely on the contents of the remarks. In this paper, it is shown that incorporating the textual content of the remarks into the link-based method can yield higher accuracy in the identification task.

Pang, Bo, and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and Trends in Information Retrieval 2.1-2 (2008): 1-135. http://bit.ly/UaCBwD

  • This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Its focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. It includes material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

Ranade, Sarvesh et al. “Online debate summarization using topic directed sentiment analysis.” Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining 11 Aug. 2013: 7. http://bit.ly/1nbKtLn

  • Social networking sites provide users with a virtual community interaction platform to share their thoughts, life experiences and opinions. An online debate forum is one such platform, where people can take a stance and argue in support of or in opposition to debate topics. An important feature of such forums is that they are dynamic and grow rapidly. In such situations, effective opinion summarization approaches are needed so that readers need not go through the entire debate.
  • This paper aims to summarize online debates by extracting highly topic-relevant and sentiment-rich sentences. The proposed approach takes into account topic-relevant, document-relevant and sentiment-based features to capture topic opinionated sentences. ROUGE (Recall-Oriented Understudy for Gisting Evaluation, a set of metrics and a software package for comparing an automatically produced summary or translation against human-produced ones) scores are used to evaluate the system. This system significantly outperforms several baseline systems and shows improvement over the state-of-the-art opinion summarization system. The results verify that topic-directed sentiment features are most important for generating effective debate summaries.
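For readers unfamiliar with ROUGE, the sketch below computes ROUGE-1 recall by hand: the fraction of the reference summary’s unigrams that also appear in the system summary. It is a simplified stand-in for the official ROUGE package, and the two example summaries are invented.

    from collections import Counter

    def rouge1_recall(system: str, reference: str) -> float:
        """Fraction of reference unigrams covered by the system summary (clipped counts)."""
        sys_counts = Counter(system.lower().split())
        ref_counts = Counter(reference.lower().split())
        overlap = sum(min(c, sys_counts[tok]) for tok, c in ref_counts.items())
        return overlap / max(sum(ref_counts.values()), 1)

    reference = "the ban on gmo crops should be lifted"
    system = "speakers argue the ban on gmo crops is unjustified"
    print(f"ROUGE-1 recall: {rouge1_recall(system, reference):.2f}")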

Schneider, Jodi. “Automated argumentation mining to the rescue? Envisioning argumentation and decision-making support for debates in open online collaboration communities.” http://bit.ly/1mi7ztx

  • Argumentation mining, a relatively new area of discourse analysis, involves automatically identifying and structuring arguments. Following a basic introduction to argumentation, the author describes a new possible domain for argumentation mining: debates in open online collaboration communities.
  • Based on the author’s experience with manual annotation of arguments in debates, the paper proposes argumentation mining as the basis for three kinds of support tools: for authoring more persuasive arguments, finding weaknesses in others’ arguments, and summarizing a debate’s overall conclusions.

What ‘urban physics’ could tell us about how cities work


Ruth Graham at Boston Globe: “What does a city look like? If you’re walking down the street, perhaps it looks like people and storefronts. Viewed from higher up, patterns begin to emerge: A three-dimensional grid of buildings divided by alleys, streets, and sidewalks, nearly flat in some places and scraping the sky in others. Pull back far enough, and the city starts to look like something else entirely: a cluster of molecules.

At least, that’s what it looks like to Franz-Josef Ulm, an engineering professor at the Massachusetts Institute of Technology. Ulm has built a career as an expert on the properties, patterns, and environmental potential of concrete. Taking a coffee break at MIT’s Stata Center late one afternoon, he and a colleague were looking at a large aerial photograph of a city when they had a “eureka” moment: “Hey, doesn’t that look like a molecular structure?”
With colleagues, Ulm began analyzing cities the way you’d analyze a material, looking at factors such as the arrangement of buildings, each building’s center of mass, and how they’re ordered around each other. They concluded that cities could be grouped into categories: Boston’s structure, for example, looks a lot like an “amorphous liquid.” Seattle is another liquid, and so is Los Angeles. Chicago, which was designed on a grid, looks like glass, he says; New York resembles a highly ordered crystal.
So far Ulm and his fellow researchers have presented their work at conferences, but it has not yet been published in a scientific journal. If the analogy does hold up, Ulm hopes it will give planners a new tool to understand a city’s structure, its energy use, and possibly even its resilience to climate change.
Ulm calls his new work “urban physics,” and it places him among a number of scientists now using the tools of physics to analyze the practically infinite amount of data that cities produce in the 21st century, from population density to the number of patents produced to energy bill charges. Physicist Marta González, Ulm’s colleague at MIT, recently used cellphone data to analyze traffic patterns in Boston with unprecedented complexity, for example. In 2012, a theoretical physicist was named founding director of New York University’s Center for Urban Science and Progress, whose research is devoted to “urban informatics”; one of its first projects is helping to create the country’s first “quantified community” on the West Side of Manhattan.
In Ulm’s case, he and his colleagues have used freely available data, including street layouts and building coordinates, to plot the structures of 12 cities and analogize them to existing complex materials. In physics, an “order parameter” is a number between 0 and 1 that describes how atoms are arranged in relationship to other atoms nearby; Ulm applies this idea to city layouts. Boston, he says, has an “order parameter” of .52, equivalent to that of a liquid like water. This means its structure is notably disordered, which may have something to do with how it developed. “Boston has grown organically,” he said. “The city, in the way its buildings are organized today, carries that information from its historical evolution.”…
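The article does not give Ulm’s exact formula, but a standard way to get such a 0-to-1 score in condensed-matter physics is a bond-orientational order parameter computed over each point’s nearest neighbours. The Python sketch below applies that generic idea (a four-fold parameter suited to grid-like layouts) to hypothetical building centroids, purely to illustrate the kind of number being described; it is not the MIT group’s published method.

    import numpy as np
    from scipy.spatial import cKDTree

    def order_parameter(points, m=4, k=4):
        """Mean |psi_m| over all sites, where psi_m is the average of exp(i*m*theta)
        over the k nearest neighbours. Close to 1 for a regular grid (m=4),
        lower for disordered, 'liquid-like' arrangements."""
        tree = cKDTree(points)
        _, idx = tree.query(points, k=k + 1)   # k+1 because each point's nearest neighbour is itself
        scores = []
        for i, neighbours in enumerate(idx):
            vecs = points[neighbours[1:]] - points[i]
            angles = np.arctan2(vecs[:, 1], vecs[:, 0])
            scores.append(abs(np.exp(1j * m * angles).mean()))
        return float(np.mean(scores))

    rng = np.random.default_rng(0)
    # Hypothetical building centroids (metres): a Chicago-like grid vs. an "organic" scatter
    grid_city = 100.0 * np.array([(x, y) for x in range(20) for y in range(20)], dtype=float)
    organic_city = rng.uniform(0, 2000, size=(400, 2))

    print("grid-like city:", round(order_parameter(grid_city), 2))
    print("organic city:  ", round(order_parameter(organic_city), 2))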

Generative Emergence: A New Discipline of Organizational, Entrepreneurial, and Social Innovation


New book by Benyamin Lichtenstein: “Culminating more than 30 years of research into evolution, complexity science, organizing and entrepreneurship, this book provides insights to scholars who are increasingly using emergence to explain social phenomena. In addition to providing the first comprehensive definition and framework for understanding emergence, it is the first publication of data from a year-long experimental study of emergence in high-potential ventures—a week-by-week longitudinal analysis of their processes based on over 750 interviews and 1,000 hours of on-site observation. These data, combined with reports from over a dozen other studies, confirm the dynamics of the five-phase model in multiple contexts…

Key insights from the book include:

  • Findings which show a major difference between an aspiration that generates a purposive drive for generative emergence, versus a performance-driven crisis that sparks organizational change and transformation.  This difference has important implications for studies of entrepreneurship, innovation, and social change.
  • A definition of emergence based on 100+ years of work in philosophy and philosophy of science, evolutionary studies, sociology, and organization science.
  • The most inclusive review of complexity science published, to help reinvigorate and legitimize those methods in the social sciences.
  • The Dynamic States Model—a new approach for understanding the non-linear growth and development of new ventures.
  • In-depth examinations of more than twenty well-known emergence studies, to reveal their shared dynamics and underlying drivers.
  • Proposals for applying the five-phase model—as a logic of emergence—to social innovation, organizational leadership, and entrepreneurial development.”

Crowd-Sourced Augmented Realities: Social Media and the Power of Digital Representation


Pre-publication version of a chapter by Matthew Zook, Mark Graham and Andrew Boulton in S. Mains, J. Cupples, and C. Lukinbeal, Mediated Geographies/Geographies of Media, Springer Science International Handbooks in Human Geography (forthcoming): “A key and distinguishing feature of society today is that it is increasingly documented by crowd-sourced social media discourse about public experiences. Much of this social media content is geo-referenced and exists in layers of information draped over the physical world, invisible to the naked eye but accessible to a range of digital (and often mobile) devices. When we access these information layers, they mediate the mundane practices of everyday life (e.g., What or who is nearby? How do I move from point A to B?) through the creation of augmented realities, i.e., unstable, context-dependent representations of places brought temporarily into being by combining the space of material and virtual experience.
These augmented realities, as particular representations of locations, places and events, are vigorously promoted or contested and thus become important spots in which power is exercised, much in the same way that maps have long had power to reinforce or challenge the status quo. However, because many of the processes and practices behind the creation of augmented realities are unseen, their power is often overlooked in the process of representation or place-making. This paper highlights the points at which power acts and demonstrates that all representations of place – including augmented realities derived from social media – are products of, and productive of, social relationships and associated power relations.”
Building upon a case study of Abbottabad, Pakistan, after the raid on Osama bin Laden’s compound, we construct a four-part typology of the power relations emerging from social practices that enact augmented realities. These include: Distributed power, the complex and socially/spatially distributed authorship of user-generated geospatial content; Communication power, the ways in which particular representations gain prominence (language is a particularly key variable); Code power, the autonomy of software code to regulate actions, mediate content, or order representations in particular ways; and Timeless power, the ways in which digital representations of place reconfigure temporal relationships, particularly sequence and duration, between people and events.

Business Models That Take Advantage of Open Data Opportunities


Mark Boyd at ProgrammableWeb: “At last week’s OKFestival in Berlin, Kat Borlongan and Chloé Bonnet from Parisian open data startup Five By Five moderated an interactive speed-geek session to examine how startups are building viability using open data and open data APIs. The picture that emerged revealed a variety of composite approaches being used, with all those presenting having just one thing in common: a commitment to fostering ecosystems that will allow other startups to build alongside them.
The OKFestival—hosted by the Open Knowledge Foundation—brought together more than 1,000 participants from around the globe working on various aspects of the open data agenda: the use of corporate data, open science research, government open data and crowdsourced data projects.
In a session held on the first day of the event, Borlongan facilitated an interactive workshop to help would-be entrepreneurs understand how startups are building business models that take advantage of open data opportunities to create sustainable, employment-generating businesses.
Citing research from the McKinsey Institute that calculates the value of open data to be worth $3 trillion globally, Borlongan said: “So the understanding of the open data process is usually: We throw open data over the wall, then we hold a hackathon, and then people will start making products off it, and then we make the $3 trillion.”
Borlongan argued that it is actually a “blurry identity to be an open data startup” and encouraged participants to unpack, with each of the startups presenting, exactly how income can be generated and a viable business built in this space.
Jeni Tennison, from the U.K.’s Open Data Institute (which supports 15 businesses in its Startup Programme) categorizes two types of business models:

  1. Businesses that publish (but do not sell) open data.
  2. Businesses built on top of using open data.

Businesses That Publish but Do Not Sell Open Data

At the Open Data Institute, Tennison is investigating the possibility of an open address database that would provide street address data for every property in the U.K. She describes three types of business models that could be created by projects that generated and published such data:
Freemium: In this model, the bulk data of open addresses could be made available freely, “but if you want an API service, then you would pay for it.” Tennison pointed to lots of opportunities also to degrade the freemium-level data—for example, having it available in bulk but not at a particularly granular level (unless you pay for it), or by provisioning reuse on a share-only basis, but you would pay if you wanted the data for corporate use cases (similar to how OpenCorporates sells access to its data).
Cross-subsidy: In this approach, the data would be available, and the opportunities to generate income would come from providing extra services, like consultancy or white labeling data services alongside publishing the open data.
Network: In this business model, value is created by generating a network effect around the core business interest, which may not be the open data itself. As an example, Tennison suggested that if a post office or delivery company were to create the open address database, it might be interested in encouraging private citizens to collaboratively maintain or crowdsource the quality of the data. The revenue generated by this open data would then come from reductions in the cost of delivery services as the data’s accuracy improved.

Businesses Built on Top of Open Data

Six startups working in unique ways to make use of available open data also presented their business models to OKFestival attendees: Development Seed, Mapbox, OpenDataSoft, Enigma.io, Open Bank Project, and Snips.

Startup: Development Seed
What it does: Builds solutions for development, public health and citizen democracy challenges by creating open source tools and utilizing open data.
Open data API focus: Regularly uses open data APIs in its projects. For example, it worked with the World Bank to create a data visualization website built on top of the World Bank API.
Type of business model: Consultancy, but it has also created new businesses out of the products developed as part of its work, most notably Mapbox (see below).

Startup: Enigma.io
What it does: Open data platform with advanced discovery and search functions.
Open data API focus: Provides the Enigma API to allow programmatic access to all data sets and some analytics from the Enigma platform.
Type of business model: SaaS including a freemium plan with no degradation of data and with access to API calls; some venture funding; some contracting services to particular enterprises; creating new products in Enigma Labs for potential later sale.

Startup: Mapbox
What it does: Enables users to design and publish maps based on crowdsourced OpenStreetMap data.
Open data API focus: Uses OpenStreetMap APIs to draw data into its map-creation interface; provides the Mapbox API to allow programmatic creation of maps using Mapbox web services.
Type of business model: SaaS including freemium plan; some tailored contracts for big map users such as Foursquare and Evernote.

Startup: Open Bank Project
What it does: Creates an open source API for use by banks.
Open data API focus: Its core product is to build an API so that banks can use a standard, open source API tool when creating applications and web services for their clients.
Type of business model: Contract license with tiered SLAs depending on the number of applications built using the API; IT consultancy projects.

Startup: OpenDataSoft
What it does: Provides an open data publishing platform so that cities, governments, utilities and companies can publish their own data portal for internal and public use.
Open data API focus: It’s able to route data sources into the portal from a publisher’s APIs; provides automatic API-creation tools so that any data set uploaded to the portal is then available as an API.
Type of business model: SaaS model with freemium plan, pricing by number of data sets published and number of API calls made against the data, with free access for academic and civic initiatives.

Startup: Snips
What it does: Predictive modeling for smart cities.
Open data API focus: Channels some open and client proprietary data into its modeling algorithm calculations via API; provides a predictive modeling API for clients’ use to programmatically generate solutions based on their data.
Type of business model: Creating one B2C app product for sale as a revenue-generation product; individual contracts with cities and companies to solve particular pain points, such as using predictive modeling to help a post office company better manage staff rosters (matched to sales needs) and a consultancy project to create a visualization mapping tool that can predict the risk of car accidents for a city….”

The Quiet Movement to Make Government Fail Less Often


In The New York Times: “If you wanted to bestow the grandiose title of “most successful organization in modern history,” you would struggle to find a more obviously worthy nominee than the federal government of the United States.

In its earliest stirrings, it established a lasting and influential democracy. Since then, it has helped defeat totalitarianism (more than once), established the world’s currency of choice, sent men to the moon, built the Internet, nurtured the world’s largest economy, financed medical research that saved millions of lives and welcomed eager immigrants from around the world.

Of course, most Americans don’t think of their government as particularly successful. Only 19 percent say they trust the government to do the right thing most of the time, according to Gallup. Some of this mistrust reflects a healthy skepticism that Americans have always had toward centralized authority. And the disappointing economic growth of recent decades has made Americans less enamored of nearly every national institution.

But much of the mistrust really does reflect the federal government’s frequent failures – and progressives in particular will need to grapple with these failures if they want to persuade Americans to support an active government.

When the federal government is good, it’s very, very good. When it’s bad (or at least deeply inefficient), it’s the norm.

The evidence is abundant. Of the 11 large programs for low- and moderate-income people that have been subject to rigorous, randomized evaluation, only one or two show strong evidence of improving most beneficiaries’ lives. “Less than 1 percent of government spending is backed by even the most basic evidence of cost-effectiveness,” writes Peter Schuck, a Yale law professor, in his new book, “Why Government Fails So Often,” a sweeping history of policy disappointments.

As Mr. Schuck puts it, “the government has largely ignored the ‘moneyball’ revolution in which private-sector decisions are increasingly based on hard data.”

And yet there is some good news in this area, too. The explosion of available data has made evaluating success – in the government and the private sector – easier and less expensive than it used to be. At the same time, a generation of data-savvy policy makers and researchers has entered government and begun pushing it to do better. They have built on earlier efforts by the Bush and Clinton administrations.

The result is a flowering of experiments to figure out what works and what doesn’t.

New York City, Salt Lake City, New York State and Massachusetts have all begun programs to link funding for programs to their success: The more effective they are, the more money they and their backers receive. The programs span child care, job training and juvenile recidivism.

The approach is known as “pay for success,” and it’s likely to spread to Cleveland, Denver and California soon. David Cameron’s conservative government in Britain is also using it. The Obama administration likes the idea, and two House members – Todd Young, an Indiana Republican, and John Delaney, a Maryland Democrat – have introduced a modest bill to pay for a version known as “social impact bonds.”

The White House is also pushing for an expansion of randomized controlled trials to evaluate government programs. Such trials, Mr. Schuck notes, are “the gold standard” for any kind of evaluation. Using science as a model, researchers randomly select some people to enroll in a government program and others not to enroll. The researchers then study the outcomes of the two groups….”
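As a stylized illustration of the randomized-trial logic described above (all numbers are invented), the sketch below randomly assigns people to a hypothetical program and estimates its effect as a simple difference in group means:

    import numpy as np

    rng = np.random.default_rng(42)
    n = 1_000

    # Hypothetical outcome (e.g., annual earnings) in the absence of any program
    baseline = rng.normal(30_000, 8_000, size=n)

    # Random assignment: exactly half enrolled, half not
    treated = rng.permutation(n) < n // 2

    # Pretend, for illustration, the program raises outcomes by $1,500 on average
    outcomes = baseline + np.where(treated, 1_500, 0) + rng.normal(0, 2_000, size=n)

    # Estimated effect: difference in group means, with a two-sample standard error
    effect = outcomes[treated].mean() - outcomes[~treated].mean()
    se = np.sqrt(outcomes[treated].var(ddof=1) / treated.sum()
                 + outcomes[~treated].var(ddof=1) / (~treated).sum())
    print(f"estimated program effect: {effect:,.0f} +/- {1.96 * se:,.0f} (95% CI)")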

No silver bullet: De-identification still doesn’t work


Arvind Narayanan and Edward W. Felten: “Paul Ohm’s 2009 article Broken Promises of Privacy spurred a debate in legal and policy circles on the appropriate response to computer science research on re-identification techniques. In this debate, the empirical research has often been misunderstood or misrepresented. A new report by Ann Cavoukian and Daniel Castro is full of such inaccuracies, despite its claims of “setting the record straight.” In a response to this piece, Ed Felten and I point out eight of our most serious points of disagreement with Cavoukian and Castro. The thrust of our arguments is that (i) there is no evidence that de-identification works either in theory or in practice and (ii) attempts to quantify its efficacy are unscientific and promote a false sense of security by assuming unrealistic, artificially constrained models of what an adversary might do. Specifically, we argue that:

  1. There is no known effective method to anonymize location data, and no evidence that it’s meaningfully achievable.
  2. Computing re-identification probabilities based on proof-of-concept demonstrations is silly.
  3. Cavoukian and Castro ignore many realistic threats by focusing narrowly on a particular model of re-identification.
  4. Cavoukian and Castro concede that de-identification is inadequate for high-dimensional data. But nowadays most interesting datasets are high-dimensional.
  5. Penetrate-and-patch is not an option.
  6. Computer science knowledge is relevant and highly available.
  7. Cavoukian and Castro apply different standards to big data and re-identification techniques.
  8. Quantification of re-identification probabilities, which permeates Cavoukian and Castro’s arguments, is a fundamentally meaningless exercise.

Data privacy is a hard problem. Data custodians face a choice between roughly three alternatives: sticking with the old habit of de-identification and hoping for the best; turning to emerging technologies like differential privacy that involve some trade-offs in utility and convenience; and using legal agreements to limit the flow and use of sensitive data. These solutions aren’t fully satisfactory, either individually or in combination, nor is any one approach the best in all circumstances. Change is difficult. When faced with the challenge of fostering data science while preventing privacy risks, the urge to preserve the status quo is understandable. However, this is incompatible with the reality of re-identification science. If a “best of both worlds” solution exists, de-identification is certainly not that solution. Instead of looking for a silver bullet, policy makers must confront hard choices.”
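One of the alternatives the authors mention, differential privacy, trades some accuracy for a provable privacy guarantee. As a minimal sketch (not drawn from the report), the Laplace mechanism below releases a noisy count; the smaller the privacy parameter epsilon, the noisier the answer, which is exactly the utility trade-off referred to above. The query and count are hypothetical.

    import numpy as np

    def laplace_count(true_count: int, epsilon: float, rng) -> float:
        """Epsilon-differentially-private release of a count. A counting query has
        sensitivity 1 (one person changes it by at most 1), so the Laplace
        mechanism adds noise with scale 1/epsilon."""
        return true_count + rng.laplace(scale=1.0 / epsilon)

    rng = np.random.default_rng(7)
    true_count = 1_284   # hypothetical: records matching a sensitive query
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps:>4}: released count ~ {laplace_count(true_count, eps, rng):,.0f}")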