Here’s how the Recovery Act became a test case for open data

Andrea Peterson in the Washington Post: “Making sure that government money is spent efficiently and without fraud can be difficult. You need to collect the right data, get the information to the right people, and deal with the sheer volume of projects that need tracking. Open data make it easier to draw comparisons across programs and agencies. And when data are released to the public, everyone can help be a government watchdog.
When President Obama was first elected in 2008, he promised transparency. Almost immediately after he was sworn into office, he had an opportunity to test that promise with the implementation of the Recovery Act. And it worked…. used geospatial technology to “allow Americans to drill down to their zip codes exactly where government money was being spent in their neighborhood.” It’s this micro-level of attention that increased accountability, according to Devaney.
“The degree of transparency forced them to get it right because they didn’t want to be embarrassed by their neighbors who they knew were going to these Web sites and could see what they were doing with the money.”
As to the second question of what data to collect: “I finally put my foot down and said no more than 100 pieces of data,” Devaney recalls, “So naturally, we came up to 99.” Of course, even with limiting themselves to that number of data points, transparency and fraud prevention were a daunting task, with some 300,000 grantees to keep tabs on.
But having those data points in an open format was what allowed investigators to use “sophisticated cyber-technology and software to review and analyze Recovery-related data and information for any possible concerns or issues.” And they were remarkably successful on that end. A status report in October 2010 showed “less than 0.2 percent of all reported awards currently have active fraud investigations.” Indeed, for Devaney’s tenure leading the board, he says, the level of fraud hovered somewhere below half of one percent of all awards.”


Thomas Levine: “There are loads of open data portals. There’s even a portal about data portals. And each of these portals has loads of datasets.
OpenPrism is my most recent attempt at understanding what is going on in all of these portals. Read on if you want to see why I made it, or just go to the site and start playing with it.

People don’t know much about open data

Nobody seems to know what is in the data portals. Many people know about datasets that are relevant to their work, municipality, &c., but nobody seems to know about the availability of data on broader topics, and nobody seems to have a good way of finding out what is available.
If someone does know any of this, he probably works for an open data portal. Still, he probably doesn’t know much about what is going on in other portals.

Naive search method

One difficulty in discovering open data is the search paradigm.
Open data portals approach searching data as if data were normal prose; your search terms are some keywords, a category, &c., and your results are dataset titles and descriptions.
There are other approaches. For example, AppGen searches for datasets with the same variables as each other, and the results are automatically generated app prototypes.

Siloed open data portals

Another issue is that people tend to use data from only one portal; they use their local government’s portals or their organizations’ portals.
Let me give you a couple of examples of why this should maybe be different. Perhaps I’m considering making an app to help people find parking, and I want to see what parking lot data are available before I put much work into the app. Or maybe I want to find all of the data about sewer overflows so that I can expand my initiative to reduce water pollution.
OpenPrism is one small attempt at making it easier to search. Rather than going to all of the different portals and making a separate search for each portal, you type your search in one search bar, and you get results from a bunch of different Socrata, CKAN and Junar portals.”
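The single-search-bar idea can be sketched as a fan-out: one query is sent to every portal and the results are merged. This is a hypothetical illustration, not OpenPrism's actual code. The portal names, the `SearchFn` type, `make_fake_portal`, and the sample dataset titles are all assumptions standing in for real Socrata, CKAN and Junar API calls.

```python
from typing import Callable, Dict, List

# A "portal" here is a name plus a function that takes a query string and
# returns matching dataset titles. Real portals would need HTTP calls;
# these in-memory stand-ins are hypothetical.
SearchFn = Callable[[str], List[str]]


def make_fake_portal(titles: List[str]) -> SearchFn:
    """Build a stand-in search function over a fixed list of dataset titles."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [t for t in titles if q in t.lower()]
    return search


def federated_search(query: str, portals: Dict[str, SearchFn]) -> List[dict]:
    """Run one query against every portal and merge the results."""
    results = []
    for portal_name, search_fn in portals.items():
        for title in search_fn(query):
            results.append({"portal": portal_name, "title": title})
    return results


# Hypothetical portals and dataset titles, for illustration only.
portals = {
    "city-a.example": make_fake_portal(["Parking Lots", "Sewer Overflows 2012"]),
    "city-b.example": make_fake_portal(["Public Parking Garages", "Tree Census"]),
}

hits = federated_search("parking", portals)
for hit in hits:
    print(hit["portal"], "-", hit["title"])
```

Each result keeps the name of the portal it came from, so a reader can still tell where a dataset lives after the results are merged.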

From public innovation to social innovation in the public sector

A review of the literature by Victor Bekkers, Lars Tummers & William Voorberg: “Innovation is a recurring issue in public administration. In doing so it can be considered as a ‘magic concept’ that has been used to frame the necessary transformation of the public sector in order to improve not only its effectiveness and efficiency but also its legitimacy. Innovation is a concept that inspires people and policy makers because it offers the promise of radical change. As such the desire to innovate the public sector has a long history which is sometimes linked to reform programs in order to meet budget cutbacks, to meet the introduction of new management and governance ideologies (like New Public Management or Open Government) or to meet the introduction of new information and communication technologies (like e-government)…
Our starting point for studying social innovations in the public sector is that social innovations take place in a specific environment in which different actors can be distinguished. These actors collaborate with each other in terms of sharing relevant resources in order to develop and implement new ideas, new ways of working or new ways of organizing. This implies that characteristics of the environment can be seen as relevant drivers and barriers. These characteristics can either function as a trigger for innovation while at the same time they can also function as relevant constraints. Based on an analysis of the literature, we have found that the following aspects of the environment could function as important drivers and barriers of innovation:
  • The social and political complexity of the environment in which public organizations operate which leads to specific demands that function as an external ‘trigger’ for innovation
  • The characteristics and degree of the legal culture in a country or policy sector
  • The type of governance and state tradition in the country or policy sector
  • The allocation of resources, resource dependency and the quality of relationships within the networks among the involved stakeholders”

Vint Cerf: Freedom and the Social Contract

Vinton G. Cerf in the Communications of the ACM: “The last several weeks (as of this writing) have been filled with disclosures of intelligence practices in the U.S. and elsewhere. Edward Snowden’s unauthorized release of highly classified information has stirred a great deal of debate about national security and the means used to preserve it.
In the midst of all this, I looked to Jean-Jacques Rousseau’s well-known 18th-century writings on the Social Contract (Du Contrat Social, Ou Principes du Droit Politique) for insight. Distilled and interpreted through my perspective, I took away several notions. One is that in a society, to achieve a degree of safety and stability, we as individuals give up some absolute freedom of action to what Rousseau called the sovereign will of the people. He did not equate this to government, which he argued was distinct and derived its power from the sovereign people.
I think it may be fair to say that most of us would not want to live in a society that had no limits to individual behavior. In such a society, there would be no limit to the potential harm an individual could visit upon others. In exchange for some measure of stability and safety, we voluntarily give up absolute freedom in exchange for the rule of law. In Rousseau’s terms, however, the laws must come from the sovereign people, not from the government. We approximate this in most modern societies by creating representative government using public elections to populate the key parts of the government.”

(Appropriate) Big Data for Climate Resilience?

Amy Luers at the Stanford Social Innovation Review: “The answer to whether big data can help communities build resilience to climate change is yes—there are huge opportunities, but there are also risks.


  • Feedback: Strong negative feedback is core to resilience. A simple example is our body’s response to heat stress—sweating, which is a natural feedback to cool down our body. In social systems, feedbacks are also critical for maintaining functions under stress. For example, communication by affected communities after a hurricane provides feedback for how and where organizations and individuals can provide help. While this kind of feedback used to rely completely on traditional communication channels, now crowdsourcing and data mining projects, such as Ushahidi and Twitter Earthquake detector, enable faster and more-targeted relief.
  • Diversity: Big data is enhancing diversity in a number of ways. Consider public health systems. Health officials are increasingly relying on digital detection methods, such as Google Flu Trends or Flu Near You, to augment and diversify traditional disease surveillance.
  • Self-Organization: A central characteristic of resilient communities is the ability to self-organize. This characteristic must exist within a community (see the National Research Council Resilience Report); it is not something you can impose from outside. However, social media and related data-mining tools (InfoAmazonia, Healthmap) can enhance situational awareness and facilitate collective action by helping people identify others with common interests, communicate with them, and coordinate efforts.


  • Eroding trust: Trust is well established as a core feature of community resilience. Yet the NSA PRISM escapade made it clear that big data projects are raising privacy concerns and possibly eroding trust. And it is not just an issue in government. For example, Target analyzes shopping patterns and can fairly accurately guess if someone in your family is pregnant (which is awkward if they know your daughter is pregnant before you do). When our trust in government, business, and communities weakens, it can decrease a society’s resilience to climate stress.
  • Mistaking correlation for causation: Data mining seeks meaning in patterns that are completely independent of theory (suggesting to some that theory is dead). This approach can lead to erroneous conclusions when correlation is mistakenly taken for causation. For example, one study demonstrated that data mining techniques could show a strong (however spurious) correlation between the changes in the S&P 500 stock index and butter production in Bangladesh. While interesting, a decision support system based on this correlation would likely prove misleading.
  • Failing to see the big picture: One of the biggest challenges with big data mining for building climate resilience is its overemphasis on the hyper-local and hyper-now. While this hyper-local, hyper-now information may be critical for business decisions, without a broader understanding of the longer-term and more-systemic dynamism of social and biophysical systems, big data provides no ability to understand future trends or anticipate vulnerabilities. We must not let our obsession with the here and now divert us from slower-changing variables such as declining groundwater, loss of biodiversity, and melting ice caps, all of which may silently define our future. A related challenge is the fact that big data mining tends to overlook the most vulnerable populations. We must not let the lure of the big data microscope on the “well-to-do” populations of the world make us blind to the less well-off populations within cities and communities that have more limited access to smart phones and the Internet.”
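The correlation-versus-causation risk described in the list above is easy to reproduce. The sketch below uses two made-up series (not the S&P 500 or butter production data from the study mentioned) that share nothing except an upward trend, and computes their Pearson correlation. The series formulas and the `pearson` helper are illustrative assumptions.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


years = range(20)
# Two hypothetical, causally unrelated series that both happen to grow over time.
series_a = [100 + 5 * t + 3 * math.sin(t) for t in years]
series_b = [20 + 2 * t + 2 * math.cos(3 * t) for t in years]

r = pearson(series_a, series_b)
print(f"correlation of the raw series: {r:.3f}")

# Year-over-year changes remove the shared trend from each series.
diff_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
diff_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
r_diff = pearson(diff_a, diff_b)
print(f"correlation of year-over-year changes: {r_diff:.3f}")
```

The raw series correlate strongly only because both trend upward. Once the trend is removed by differencing, the correlation is close to zero, which is one simple check before treating a mined pattern as meaningful.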

Transparent Predictions

New Paper by Tal Zarsky: “Can human behavior be predicted? A broad variety of governmental initiatives are using computerized processes to try. Vast datasets of personal information enhance the ability to engage in these ventures and the appetite to push them forward. Governments have a distinct interest in automated individualized predictions to foresee unlawful actions. Novel technological tools, especially data-mining applications, are making governmental predictions possible. The growing use of predictive practices is generating serious concerns regarding the lack of transparency. Although echoed across the policy, legal, and academic debate, the nature of transparency, in this context, is unclear. Transparency flows from different, even competing, rationales, as well as very different legal and philosophical backgrounds. This Article sets forth a unique and comprehensive conceptual framework for understanding the role transparency must play as a regulatory concept in the crucial and innovative realm of automated predictive modeling.”

San Francisco To Test Online Participatory Budgeting

“Taxpayers are sometimes the best people to decide how their money gets spent. It sounds obvious, but usually we don’t have a direct say beyond who we elect. That’s changing for San Francisco residents.
It intends to be the first major US city to allow citizens to directly vote on portions of the budget via the web. While details are still coming together, its plan is for each city district to vote on $100,000 in expenditures. Citizens will get to choose how the money is spent from a list of options, similar to the way they already vote from a list of ballot propositions. Topical experts will help San Francisco residents deliberate online.
So-called “participatory budgeting” first began in the festival city of Porto Alegre, Brazil, in 1989, and has slowly been expanding throughout the world. While major cities, such as Chicago and New York, have piloted participatory budgeting, they have not incorporated the modern features of digital voting and deliberation that are currently utilized in Brazil.
According to participatory budgeting expert and former White House technology fellow, Hollie Russon Gilman, San Francisco’s experiment will mark a “frontier” in American direct democracy.
This is significant because the Internet engenders a different type of democracy: not one of mere expression, but one of ideas. The net is good at surfacing the best ideas hidden within the wisdom of the crowds. Modern political scientists refer to this as “Epistemic Democracy,” derived from the Greek word for knowledge, epistēmē. Epistemic Democracy values citizens most for their expertise and builds tools to make policy making more informed.
For example, participatory budgeting has been found to reduce infant mortality rates in Brazil. It turns out that the mothers in Brazil had a better knowledge of why children were dying than health experts. Through participatory budgeting, they “channeled a larger fraction of their total budget to key investments in sanitation and health services,” writes Sonia Goncalves of King’s College London. “I also found that this change in the composition of municipal expenditures is associated with a pronounced reduction in the infant mortality rates for municipalities which adopted participatory budgeting.” [PDF]”

Three ways to think of the future…

Geoff Mulgan’s blog: “Here I suggest three complementary ways of thinking about the future which provide partial protection against the pitfalls.
The shape of the future
First, create your own composite future by engaging with the trends. There are many methods available for mapping the future – from Foresight to scenarios to the Delphi method.
Behind all are implicit views about the shapes of change. Indeed any quantitative exploration of the future uses a common language of patterns (shown in this table above) which summarises the fact that some things will go up, some go down, some change suddenly and some not at all.
All of us have implicit or explicit assumptions about these. But it’s rare to interrogate them systematically and test whether our assumptions about what fits in which category are right.
Let’s start with the J shaped curves. Many of the long-term trends around physical phenomena look J-curved: rising carbon emissions, water usage and energy consumption have been exponential in shape over the centuries. As we know, physical constraints mean that these simply can’t go on – the J curves have to become S shaped sooner or later, or else crash. That is the ecological challenge of the 21st century.
New revolutions
But there are other J curves, particularly the ones associated with digital technology. Moore’s Law and Metcalfe’s Law describe the dramatically expanding processing power of chips, and the growing connectedness of the world. Some hope that the sheer pace of technological progress will somehow solve the ecological challenges. That hope has more to do with culture than evidence. But these J curves are much faster than the physical ones – any factor that doubles every 18 months achieves stupendous rates of change over decades.
That’s why we can be pretty confident that digital technologies will continue to throw up new revolutions – whether around the Internet of Things, the quantified self, machine learning, robots, mass surveillance or new kinds of social movement. But what form these will take is much harder to predict, and most digital prediction has been unreliable – we have YouTube but not the Interactive TV many predicted (when did you last vote on how a drama should end?); relatively simple SMS and Twitter spread much more than ISDN or fibre to the home. And plausible ideas like the long tail theory turned out to be largely wrong.
If the J curves are dramatic but unusual, much more of the world is shaped by straight line trends – like ageing or the rising price of disease that some predict will take costs of healthcare up towards 40 or 50% of GDP by late in the century, or incremental advances in fuel efficiency, or the likely relative growth of the Chinese economy.
Also important are the flat straight lines – the things that probably won’t change in the next decade or two: the continued existence of nation states not unlike those of the 19th century? Air travel making use of fifty-year-old technologies?
Great imponderables
If the Js are the most challenging trends, the most interesting ones are the ‘U’s – the examples of trends bending: like crime, which went up for a century and then started going down, or world population, which has been going up but could start going down in the later part of this century, or divorce rates, which seem to have plateaued, or Chinese labour supply, which is forecast to turn down in the 2020s.
No one knows if the apparently remorseless upward trends of obesity and depression will turn downwards. No one knows if the next generation in the West will be poorer than their parents. And no one knows if democratic politics will reinvent itself and restore trust. In every case, much depends on what we do. None of these trends is a fact of nature or an act of God.
That’s one reason why it’s good to immerse yourself in these trends and interrogate what shape they really are. Out of that interrogation we can build a rough mental model and generate our own hypotheses – ones not based on the latest fashion or bestseller but hopefully on a sense of what the data shows and in particular what’s happening to the deltas – the current rates of change of different phenomena.”
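The remark above that any factor doubling every 18 months achieves stupendous rates of change over decades is simple compound growth. A small sketch of the arithmetic follows. The 18-month doubling period comes from the text; the function name and the time horizons are illustrative assumptions.

```python
def growth_factor(years: float, doubling_period_months: float = 18.0) -> float:
    """Multiplicative growth after `years` if a quantity doubles once per period."""
    doublings = years * 12.0 / doubling_period_months
    return 2.0 ** doublings


# Illustrative horizons: 2, 10 and 20 doublings respectively.
for years in (3, 15, 30):
    print(f"{years:>2} years -> x{growth_factor(years):,.0f}")
```

Thirty years at this pace is 20 doublings, a roughly million-fold increase, which is why these curves outrun the slower physical trends so quickly.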

Open Access

Reports by the UK’s House of Commons, Business, Innovation and Skills Committee: “Open access refers to the immediate, online availability of peer reviewed research articles, free at the point of access (i.e. without subscription charges or paywalls). Open access relates to scholarly articles and related outputs. Open data (which is a separate area of Government policy and outside the scope of this inquiry) refers to the availability of the underlying research data itself. At the heart of the open access movement is the principle that publicly funded research should be publicly accessible. Open access expanded rapidly in the late twentieth century with the growth of the internet and digitisation (the transcription of data into a digital form), as it became possible to disseminate research findings more widely, quickly and cheaply.
Whilst there is widespread agreement that the transition to open access is essential in order to improve access to knowledge, there is a lack of consensus about the best route to achieve it. To achieve open access at scale in the UK, there will need to be a shift away from the dominant subscription-based business model. Inevitably, this will involve a transitional period and considerable change within the scholarly publishing market.
For the UK to transition to open access, an effective, functioning and competitive market in scholarly communications will be vital. The evidence we saw over the course of this inquiry shows that this is currently far from the case, with journal subscription prices rising at rates that are unsustainable for UK universities and other subscribers. There is a significant risk that the Government’s current open access policy will inadvertently encourage and prolong the dysfunctional elements of the scholarly publishing market, which are a major barrier to access.”
See Volume I and Volume II

From Networked Publics to Issue Publics: Reconsidering the Public/Private Distinction in Web Science

New paper by Andreas Birkbak: “As an increasing part of everyday life becomes connected with the web in many areas of the globe, the question of how the web mediates political processes becomes still more urgent. Several scholars have started to address this question by thinking about the web in terms of a public space. In this paper, we aim to make a twofold contribution towards the development of the concept of publics in web science. First, we propose that although the notion of publics raises a variety of issues, two major concerns continue to be user privacy and democratic citizenship on the web. Well-known arguments hold that the complex connectivity of the web puts user privacy at risk and enables the enclosure of public debate in virtual echo chambers. Our first argument is that these concerns are united by a set of assumptions coming from liberal political philosophy that are rarely made explicit. As a second contribution, this paper points towards an alternative way to think about publics by proposing a pragmatist reorientation of the public/private distinction in web science, away from seeing two spheres that need to be kept separate, towards seeing the public and the private as something that is continuously connected. The theoretical argument is illustrated by reference to a recently published case study of Facebook groups, and future research agendas for the study of web-mediated publics are proposed.”