The Quiet Movement to Make Government Fail Less Often


From The New York Times: “If you wanted to bestow the grandiose title of ‘most successful organization in modern history,’ you would struggle to find a more obviously worthy nominee than the federal government of the United States.

In its earliest stirrings, it established a lasting and influential democracy. Since then, it has helped defeat totalitarianism (more than once), established the world’s currency of choice, sent men to the moon, built the Internet, nurtured the world’s largest economy, financed medical research that saved millions of lives and welcomed eager immigrants from around the world.

Of course, most Americans don’t think of their government as particularly successful. Only 19 percent say they trust the government to do the right thing most of the time, according to Gallup. Some of this mistrust reflects a healthy skepticism that Americans have always had toward centralized authority. And the disappointing economic growth of recent decades has made Americans less enamored of nearly every national institution.

But much of the mistrust really does reflect the federal government’s frequent failures – and progressives in particular will need to grapple with these failures if they want to persuade Americans to support an active government.

When the federal government is good, it’s very, very good. When it’s bad (or at least deeply inefficient), it’s the norm.

The evidence is abundant. Of the 11 large programs for low- and moderate-income people that have been subject to rigorous, randomized evaluation, only one or two show strong evidence of improving most beneficiaries’ lives. “Less than 1 percent of government spending is backed by even the most basic evidence of cost-effectiveness,” writes Peter Schuck, a Yale law professor, in his new book, “Why Government Fails So Often,” a sweeping history of policy disappointments.

As Mr. Schuck puts it, “the government has largely ignored the ‘moneyball’ revolution in which private-sector decisions are increasingly based on hard data.”

And yet there is some good news in this area, too. The explosion of available data has made evaluating success – in the government and the private sector – easier and less expensive than it used to be. At the same time, a generation of data-savvy policy makers and researchers has entered government and begun pushing it to do better. They have built on earlier efforts by the Bush and Clinton administrations.

The result is a flowering of experiments to figure out what works and what doesn’t.

New York City, Salt Lake City, New York State and Massachusetts have all begun initiatives that link funding for programs to their success: The more effective the programs are, the more money they and their backers receive. These efforts span child care, job training and juvenile recidivism.

The approach is known as “pay for success,” and it’s likely to spread to Cleveland, Denver and California soon. David Cameron’s conservative government in Britain is also using it. The Obama administration likes the idea, and two House members – Todd Young, an Indiana Republican, and John Delaney, a Maryland Democrat – have introduced a modest bill to pay for a version known as “social impact bonds.”

The White House is also pushing for an expansion of randomized controlled trials to evaluate government programs. Such trials, Mr. Schuck notes, are “the gold standard” for any kind of evaluation. Using science as a model, researchers randomly select some people to enroll in a government program and others not to enroll. The researchers then study the outcomes of the two groups….”
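The random-assignment design described in the excerpt above is simple enough to sketch in a few lines of code. The following is a minimal illustration, not any agency’s actual evaluation pipeline: the applicant list, the outcome function, and the effect size are all invented for the example.

```python
import random
import statistics

def run_rct(applicants, measure_outcome, seed=42):
    """Randomly assign applicants to treatment or control and compare mean outcomes.

    `applicants` is a list of IDs; `measure_outcome(applicant, treated)` returns the
    outcome observed for that person (in a real trial this comes from follow-up data).
    """
    rng = random.Random(seed)
    shuffled = list(applicants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]

    treated_outcomes = [measure_outcome(a, True) for a in treatment]
    control_outcomes = [measure_outcome(a, False) for a in control]

    # The difference in group means estimates the program's average effect.
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)

# Toy example: a hypothetical program that raises a 10-point outcome score by ~2 points.
fake_outcome = lambda applicant, treated: random.gauss(10 + (2 if treated else 0), 3)
print("Estimated treatment effect:", run_rct(range(1000), fake_outcome))
```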

France: Report of the Commission on Open Data in Health


“The ‘open data in health’ Commission (« open data en santé »), which met from November 2013 to May 2014, was tasked with debating, in a pluralist framework bringing together the stakeholders, the issues and proposals concerning access to health data.
This report, delivered on 9 July 2014 to Marisol Touraine, Minister of Social Affairs and Health, retraces the Commission’s work and discussions:

  • An overview of the current landscape (part 1): definitions of the key concepts, the state of the law, a presentation of the governance arrangements, a presentation of access to the SNIIRAM and PMSI data, a mapping of health data, and lessons drawn from foreign experiences;
  • The challenges for the future (part 2);
  • The actions to be taken (part 3): data to be released as open data, guidelines concerning re-identifying data, and data on health professionals and institutions.

This report was adopted by consensus by all members of the Commission, who share strong, common expectations.”
Rapport final commission open data (pdf – 1 Mo) – [09/07/2014] – [Updated: 09/07/2014]

Facebook Nation


Essay by Leonidas Donskis in the Hedgehog Review: “A German anthropologist who lived in England and was studying the society and culture of Kosovo Albanians once said something that lodged firmly in my memory: Whenever she wanted to make an interesting thought or position available to Albanian students or intellectuals, she simply published it on her Facebook page. “You don’t have to add a comment or explain anything,” she said with a smile. “The posting of it on Facebook is a sure sign that the message is a good one.”
She promised to perform this operation with a few thoughts of mine she had heard and liked in one of my lectures. When she noticed my surprise, she cheerfully explained that the Albanians consider themselves such a particularly dispersed lot, such a diaspora nation par excellence, that they are inclined to view themselves as a coherent collectivity only on Facebook. Without it, their friends, relatives, and family members living in Albania and elsewhere would have no tangible ties to one another.
This made me think that today’s version of Ahasver, the Wandering Jew of legend, is a Facebook user, and that Facebook—not the physical world—is where that displaced person wanders.
The Facebook Nation is host to an ongoing referendum in which its denizens cast their ballots daily, hourly, even minute by minute. Let’s make no mistake about what it means to be in this peculiar digital republic. For a lover, as Milan Kundera put it, to be is to live in the eye of one’s beloved. And what is it to be for people who don’t know where they are or where they want to be or even if they exist at all? Quite simply, it is to be liked on Facebook.
Facebook is where everyone becomes his or her own journalist, a situation that has forced real journalists to become Facebooking and tweeting simulacra of their former selves, or else to abandon journalism for other pursuits, whether literary or academic or something altogether different. Évariste Gamelin, the protagonist of Anatole France’s 1912 novel Les dieux ont soif (The Gods Are Thirsty), is a painter, a passionate disciple of Jacques Louis David, and a young fanatic who doesn’t know whether he is fatally in love with his charming girlfriend or with his mother, the Revolution. He believes that after the Revolution every citizen will become a judge of himself and of the Republic. But modernity plays a joke on him: It does not fulfill this promise. It had no intention of doing so. Instead of turning into judges, we all became journalists…”

Networks and Hierarchies


Essay in The American Interest on whether political hierarchy in the form of the state has met its match in today’s networked world: “…To all the world’s states, democratic and undemocratic alike, the new informational, commercial, and social networks of the internet age pose a profound challenge, the scale of which is only gradually becoming apparent. First, email achieved a dramatic improvement in the ability of ordinary citizens to communicate with one another. Then the internet came to have an even greater impact on the ability of citizens to access information. The emergence of search engines marked a quantum leap in this process. The advent of laptops, smartphones, and other portable devices then emancipated electronic communication from the desktop. With the explosive growth of social networks came another great leap, this time in the ability of citizens to share information and ideas.
It was not immediately obvious how big a challenge all this posed to the established state. There was a great deal of cheerful talk about the ways in which the information technology revolution would promote “smart” or “joined-up” government, enhancing the state’s ability to interact with citizens. However, the efforts of Anonymous, WikiLeaks, and Edward Snowden to disrupt the system of official secrecy, directed mainly against the U.S. government, have changed everything. In particular, Snowden’s revelations have exposed the extent to which Washington was seeking to establish a parasitical relationship with the key firms that operate the various electronic networks, acquiring not only metadata but sometimes also the actual content of vast numbers of phone calls and messages. Techniques of big-data mining, developed initially for commercial purposes, have been adapted to the needs of the National Security Agency.
The most recent, and perhaps most important, network challenge to hierarchy comes with the advent of virtual currencies and payment systems like Bitcoin. Since ancient times, states have reaped considerable benefits from monopolizing or at least regulating the money created within their borders. It remains to be seen how big a challenge Bitcoin poses to the system of national fiat currencies that has evolved since the 1970s and, in particular, how big a challenge it poses to the “exorbitant privilege” enjoyed by the United States as the issuer of the world’s dominant reserve (and transaction) currency. But it would be unwise to assume, as some do, that it poses no challenge at all….”

No silver bullet: De-identification still doesn’t work


Arvind Narayanan and Edward W. Felten: “Paul Ohm’s 2009 article Broken Promises of Privacy spurred a debate in legal and policy circles on the appropriate response to computer science research on re-identification techniques. In this debate, the empirical research has often been misunderstood or misrepresented. A new report by Ann Cavoukian and Daniel Castro is full of such inaccuracies, despite its claims of “setting the record straight.” In a response to this piece, Ed Felten and I point out eight of our most serious points of disagreement with Cavoukian and Castro. The thrust of our arguments is that (i) there is no evidence that de-identification works either in theory or in practice and (ii) attempts to quantify its efficacy are unscientific and promote a false sense of security by assuming unrealistic, artificially constrained models of what an adversary might do. Specifically, we argue that:

  1. There is no known effective method to anonymize location data, and no evidence that it’s meaningfully achievable.
  2. Computing re-identification probabilities based on proof-of-concept demonstrations is silly.
  3. Cavoukian and Castro ignore many realistic threats by focusing narrowly on a particular model of re-identification.
  4. Cavoukian and Castro concede that de-identification is inadequate for high-dimensional data. But nowadays most interesting datasets are high-dimensional.
  5. Penetrate-and-patch is not an option.
  6. Computer science knowledge is relevant and highly available.
  7. Cavoukian and Castro apply different standards to big data and re-identification techniques.
  8. Quantification of re-identification probabilities, which permeates Cavoukian and Castro’s arguments, is a fundamentally meaningless exercise.

Data privacy is a hard problem. Data custodians face a choice between roughly three alternatives: sticking with the old habit of de-identification and hoping for the best; turning to emerging technologies like differential privacy that involve some trade-offs in utility and convenience; and using legal agreements to limit the flow and use of sensitive data. These solutions aren’t fully satisfactory, either individually or in combination, nor is any one approach the best in all circumstances. Change is difficult. When faced with the challenge of fostering data science while preventing privacy risks, the urge to preserve the status quo is understandable. However, this is incompatible with the reality of re-identification science. If a “best of both worlds” solution exists, de-identification is certainly not that solution. Instead of looking for a silver bullet, policy makers must confront hard choices.”
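To make the utility trade-off mentioned above concrete, here is a minimal sketch of the Laplace mechanism, one standard building block of differential privacy, applied to a simple counting query. The toy records, the predicate, and the epsilon values are invented for illustration; this is not a recommendation for any particular data release.

```python
import numpy as np

def dp_count(records, predicate, epsilon, rng):
    """Return a differentially private count of records matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one record changes the
    count by at most 1), so Laplace noise with scale 1/epsilon yields
    epsilon-differential privacy. Smaller epsilon means stronger privacy but a
    noisier, less useful answer -- the utility trade-off in a nutshell.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical records: count patients over 60 (the true answer is 4).
patients = [{"age": a} for a in (34, 67, 71, 45, 62, 58, 80)]
rng = np.random.default_rng(0)
for eps in (0.1, 1.0, 10.0):
    noisy = dp_count(patients, lambda r: r["age"] > 60, eps, rng)
    print(f"epsilon={eps}: noisy count = {noisy:.1f}")
```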

Introduction to Open Geospatial Consortium (OGC) Standards


Joseph McGenn; Dominic Taylor; Gail Millin-Chalabi (Editor); Kamie Kitmitto (Editor) at Jorum: “The onset of the Information Age and Digital Revolution has created a knowledge-based society where the internet acts as a global platform for the sharing of information. In a geospatial context, this has resulted in an advancement of techniques for how we acquire, study and share geographic information, and with the development of Geographic Information Systems (GIS), locational services, and online mapping, spatial data has never been more abundant. The transformation to this digital era has not been without its drawbacks, and a forty-year lack of common policies on data sharing has resulted in compatibility issues and great diversity in how software and data are delivered.
Essential to the sharing of spatial information is interoperability, where different programmes can exchange and open data from various sources seamlessly. Applying universal standards across a sector provides interoperable solutions. The Open Geospatial Consortium (OGC) facilitates interoperability by providing open standard specifications which organisations can use to develop geospatial software. This means that two separate pieces of software or platforms, if developed using open standard specifications, can exchange data without compatibility issues. By defining these specifications and standards the OGC plays a crucial role in how geospatial information is shared on a global scale. Standard specifications are the invisible glue that holds information systems together; without them, data sharing would generally be an arduous task. On some level they keep the world spinning, and this course will instil some appreciation for them from a geospatial perspective.
This course introduces users to the OGC and all the common standards in the context of geoportals and mapping solutions. These standards are defined and explored using a number of platforms, and interoperability is demonstrated in a practical sense. Finally, users will implement these standards to develop their own platforms for sharing geospatial information.”
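As a concrete taste of what an OGC standard looks like in practice, the sketch below issues a Web Map Service (WMS) GetCapabilities request using the standard key-value parameters that any compliant client can send to any compliant server. The endpoint URL is a placeholder, not a service referenced in the course.

```python
import xml.etree.ElementTree as ET
import requests

# Placeholder endpoint; any OGC-compliant WMS accepts the same parameters.
WMS_URL = "https://example.org/geoserver/ows"

params = {
    "service": "WMS",              # which OGC service is being addressed
    "version": "1.3.0",            # WMS specification version
    "request": "GetCapabilities",  # ask the server to describe its layers and formats
}

response = requests.get(WMS_URL, params=params, timeout=30)
response.raise_for_status()

# The reply is a capabilities XML document whose structure is fixed by the WMS
# standard, so every compliant client can parse it the same way.
root = ET.fromstring(response.content)
ns = {"wms": "http://www.opengis.net/wms"}
for name in root.findall(".//wms:Layer/wms:Name", ns):
    print("Available layer:", name.text)
```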

Brief survey of crowdsourcing for data mining


Paper by Guo Xintong, Wang Hongzhi, Yangqiu Song, and Gao Hong in Expert Systems with Applications: “Crowdsourcing allows large-scale and flexible invocation of human input for data gathering and analysis, which introduces a new paradigm of data mining process. Traditional data mining methods often require the experts in analytic domains to annotate the data. However, it is expensive and usually takes a long time. Crowdsourcing enables the use of heterogeneous background knowledge from volunteers and distributes the annotation process to small portions of efforts from different contributors. This paper reviews the state of the art of crowdsourcing for data mining in recent years. We first review the challenges and opportunities of data mining tasks using crowdsourcing, and summarize their framework. Then we highlight several exemplar works in each component of the framework, including question designing, data mining and quality control. Finally, we conclude with the limitations of crowdsourcing for data mining and suggest related areas for future research.”
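Quality control, one component of the framework the authors describe, is commonly handled by collecting redundant answers and aggregating them. The sketch below shows the simplest such baseline, majority voting over repeated labels; the task IDs, workers, and answers are invented for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Aggregate redundant crowd labels per task by majority vote.

    `labels` is an iterable of (task_id, worker_id, answer) tuples.
    Returns {task_id: (winning_answer, agreement_ratio)}.
    """
    per_task = defaultdict(list)
    for task_id, _worker_id, answer in labels:
        per_task[task_id].append(answer)

    results = {}
    for task_id, answers in per_task.items():
        winner, count = Counter(answers).most_common(1)[0]
        results[task_id] = (winner, count / len(answers))
    return results

# Toy example: three workers label two images as "cat" or "dog".
crowd_labels = [
    ("img1", "w1", "cat"), ("img1", "w2", "cat"), ("img1", "w3", "dog"),
    ("img2", "w1", "dog"), ("img2", "w2", "dog"), ("img2", "w3", "dog"),
]
print(majority_vote(crowd_labels))
# img1 -> ('cat', ~0.67 agreement); img2 -> ('dog', 1.0 agreement)
```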

New Commerce Department report explores huge benefits, low cost of government data


Mark Doms, Under Secretary for Economic Affairs, in a blog post: “Today we are pleased to roll out an important new Commerce Department report on government data. “Fostering Innovation, Creating Jobs, Driving Better Decisions: The Value of Government Data” arrives as our society increasingly focuses on how the intelligent use of data can make our businesses more competitive, our governments smarter, and our citizens better informed.

And when it comes to data, as the Under Secretary for Economic Affairs, I have a special appreciation for the Commerce Department’s two preeminent statistical agencies, the Census Bureau and the Bureau of Economic Analysis. These agencies inform us on how our $17 trillion economy is evolving and how our population (318 million and counting) is changing, data critical to our country. Although “Big Data” is all the rage these days, the government has been in this business for a long time: the first Decennial Census was in 1790, gathering information on close to four million people, a huge dataset for its day, and not too shabby by today’s standards as well.

Just how valuable is the data we provide? Our report seeks to answer this question by exploring the range of federal statistics and how they are applied in decision-making. Examples of our data include gross domestic product, employment, consumer prices, corporate profits, retail sales, agricultural supply and demand, population, international trade and much more.

Clearly, as shown in the report, the value of this information to our society far exceeds its cost – and not just because the price tag is shockingly low: three cents, per person, per day. Federal statistics guide trillions of dollars in annual investments at an average annual cost of $3.7 billion: just 0.02 percent of our $17 trillion economy covers the massive amount of data collection, processing and dissemination. With a statistical system that is comprehensive, consistent, confidential, relevant and accessible, the federal government is uniquely positioned to provide a wide range of statistics that complement the vast and growing sources of private sector data.

Our federally collected information is frequently “invisible,” because attribution is not required. But it flows daily into myriad commercial products and services. Today’s report identifies the industries that intensively use our data and provides a rough estimate of the size of this sector. The lower-bound estimate suggests government statistics help private firms generate revenues of at least $24 billion annually – more than six times what we spend for the data. The upper-bound estimate suggests annual revenues of $221 billion!
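The headline figures quoted above are easy to verify with back-of-the-envelope arithmetic; the short calculation below simply reproduces the cost-per-person-per-day figure and the revenue multiples from the numbers given in the post.

```python
# Back-of-the-envelope check of the figures quoted above.
annual_cost = 3.7e9            # federal statistical system, dollars per year
population = 318e6             # U.S. population cited in the post
gdp = 17e12                    # size of the U.S. economy, dollars
lower_bound_revenue = 24e9     # lower-bound estimate of revenues relying on the data
upper_bound_revenue = 221e9    # upper-bound estimate

print(f"Cost per person per day: ${annual_cost / population / 365:.3f}")          # ~$0.032, i.e. about 3 cents
print(f"Share of the economy: {annual_cost / gdp:.2%}")                           # ~0.02%
print(f"Lower-bound revenue multiple: {lower_bound_revenue / annual_cost:.1f}x")  # ~6.5x, 'more than six times'
print(f"Upper-bound revenue multiple: {upper_bound_revenue / annual_cost:.0f}x")  # ~60x
```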

This report takes a first crack at putting an actual dollars-and-cents value on government data. We’ve learned a lot from this initial study, and look forward to homing in even further on that figure in our next report.”

Forget The Wisdom of Crowds; Neurobiologists Reveal The Wisdom Of The Confident


Emerging Technology From the arXiv: “Way back in 1906, the English polymath Francis Galton visited a country fair in which 800 people took part in a contest to guess the weight of a slaughtered ox. After the fair, he collected the guesses and calculated their average which turned out to be 1208 pounds. To Galton’s surprise, this was within 1 per cent of the true weight of 1198 pounds.
This is one of the earliest examples of a phenomenon that has come to be known as the wisdom of the crowd. The idea is that the collective opinion of a group of individuals can be better than a single expert opinion.
This phenomenon is commonplace today on websites such as Reddit in which users vote on the importance of particular stories and the most popular are given greater prominence.
However, anyone familiar with Reddit will know that the collective opinion isn’t always wise. In recent years, researchers have spent a significant amount of time and effort teasing apart the factors that make crowds stupid. One important factor turns out to be the way members of a crowd influence each other.
It turns out that if a crowd offers a wide range of independent estimates, then it is more likely to be wise. But if members of the crowd are influenced in the same way, for example by each other or by some external factor, then they tend to converge on a biased estimate. In this case, the crowd is likely to be stupid.
Today, Gabriel Madirolas and Gonzalo De Polavieja at the Cajal Institute in Madrid, Spain, say they found a way to analyse the answers from a crowd which allows them to remove this kind of bias and so settle on a wiser answer.
The theory behind their work is straightforward. Their idea is that some people are more strongly influenced by additional information than others who are confident in their own opinion. So identifying these more strongly influenced people and separating them from the independent thinkers creates two different groups. The group of independent thinkers is then more likely to give a wise estimate. Or put another way, ignore the wisdom of the crowd in favour of the wisdom of the confident.
So how to identify the confident thinkers? Madirolas and De Polavieja began by studying the data from an earlier set of experiments in which groups of people were given tasks such as estimating the length of the border between Switzerland and Italy, the correct answer being 734 kilometres.
After one task, some groups were shown the combined estimates of other groups before beginning their second task. These experiments clearly showed how this information biased the answers from these groups in their second tasks.
Madirolas and De Polavieja then set about creating a mathematical model of how individuals incorporate this extra information. They assume that each person comes to a final estimate based on two pieces of information: first, their own independent estimate of the length of the border and second, the earlier combined estimate revealed to the group. Each individual decides on a final estimate depending on the weighting they give to each piece of information.
Those people who are heavily biased give a strong weighting to the additional information whereas people who are confident in their own estimate give a small or zero weighting to the additional information.
Madirolas and De Polavieja then take each person’s behaviour and fit it to this model to reveal how independent their thinking has been.
That allows them to divide the groups into independent thinkers and biased thinkers. Taking the collective opinion of the independent thinkers then gives a much more accurate estimate of the length of the border.
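A stripped-down version of this procedure follows directly from that description. The sketch below assumes the simplest linear form of the model, final = (1 − w) × own + w × social, infers each person’s weight w from their two estimates, and averages only the weakly influenced respondents; the threshold, the toy data, and the use of a plain mean are illustrative assumptions rather than the authors’ exact fitting method.

```python
import statistics

def influence_weight(own, social, final):
    """Infer w in: final = (1 - w) * own + w * social.

    w near 0 means the person kept their independent estimate (confident);
    w near 1 means they moved all the way to the social information.
    """
    if social == own:                 # the shown value carries no information here
        return 0.0
    w = (final - own) / (social - own)
    return min(max(w, 0.0), 1.0)      # clip to [0, 1]

def wisdom_of_the_confident(records, shown, threshold=0.2):
    """Average only the final estimates of weakly influenced respondents.

    `records` is a list of (own_estimate, final_estimate) pairs and `shown` is
    the combined estimate revealed to the group; the threshold is an assumption.
    """
    confident = [final for own, final in records
                 if influence_weight(own, shown, final) < threshold]
    return statistics.mean(confident) if confident else None

# Toy data in kilometres; the true Switzerland-Italy border length is 734 km.
shown = 500                           # a biased combined estimate shown to the group
answers = [(720, 715), (780, 760), (600, 520), (900, 560), (740, 735)]
print(wisdom_of_the_confident(answers, shown))   # ~736.7, close to the true value
```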
“Our results show that, while a simple operation like the mean, median or geometric mean of a group may not allow groups to make good estimations, a more complex operation taking into account individuality in the social dynamics can lead to a better collective intelligence,” they say.

Ref: arxiv.org/abs/1406.7578 : Wisdom of the Confident: Using Social Interactions to Eliminate the Bias in Wisdom of the Crowds”

Incentivizing Peer Review


From Wired, on “The Last Obstacle for Open Access Science”: “The Galapagos Islands’ Charles Darwin Foundation runs on an annual operating budget of about $3.5 million. With this money, the center conducts conservation research, enacts species-saving interventions, and provides educational resources about the fragile island ecosystems. As a science-based enterprise whose work would benefit greatly from the latest research findings on ecological management, evolution, and invasive species, there’s one glaring hole in the Foundation’s budget: the $800,000 it would cost per year for subscriptions to leading academic journals.
According to Richard Price, founder and CEO of Academia.edu, this episode is symptomatic of a larger problem. “A lot of research centers” – NGOs, academic institutions in the developing world – “are just out in the cold as far as access to top journals is concerned,” says Price. “Research is being commoditized, and it’s just another aspect of the digital divide between the haves and have-nots.”
Academia.edu is a key player in the movement toward open access scientific publishing, with over 11 million participants who have uploaded nearly 3 million scientific papers to the site. It’s easy to understand Price’s frustration with the current model, in which academics donate their time to review articles, pay for the right to publish articles, and pay for access to articles. According to Price, journals charge an average of $4000 per article: $1500 for production costs (reformatting, designing), $1500 to orchestrate peer review (labor costs for hiring editors, administrators), and $1000 of profit.
“If there were no legacy in the scientific publishing industry, and we were looking at the best way to disseminate and view scientific results,” proposes Price, “things would look very different. Our vision is to build a complete replacement for scientific publishing,” one that would allow budget-constrained organizations like the CDF full access to information that directly impacts their work.
But getting to a sustainable new world order requires a thorough overhaul of the academic publishing industry. The alternative vision – of “open science” – has two key properties: the uninhibited sharing of research findings, and a new peer review system that incorporates the best of the scientific community’s feedback. Several groups have made progress on the former, but the latter has proven particularly difficult given the current incentive structure. The currency of scientific research is the number of papers you’ve published and their citation counts – the number of times other researchers have referred to your work in their own publications. The emphasis is on the creation of new knowledge – a worthy goal, to be sure – but substantial contributions to the quality, packaging, and contextualization of that knowledge in the form of peer review go largely unrecognized. As a result, researchers view their role as reviewers as a chore, a time-consuming task required to sustain the ecosystem of research dissemination.
“Several experiments in this space have tried to incorporate online comment systems,” explains Price, “and the result is that putting a comment box online and expecting high quality comments to flood in is just unrealistic. My preference is to come up with a system where you’re just as motivated to share your feedback on a paper as you are to share your own findings.” In order to make this lofty aim a reality, reviewers’ contributions would need to be recognized. “You need something more nuanced, and more qualitative,” says Price. “For example, maybe you gather reputation points from your community online.” Translating such metrics into tangible benefits up the food chain – hirings, tenure decisions, awards – is a broader community shift that will no doubt take time.
A more iterative peer review process could allow the community to better police faulty methods by crowdsourcing their evaluation. “90% of scientific studies are not reproducible,” claims Price, a problem that is exacerbated by the strong bias toward positive results. Journals may be unlikely to publish methodological refutations, but a flurry of well-supported comments attached to a paper online could convince the researchers to marshal more convincing evidence. Typically, this sort of feedback cycle takes years….”