This algorithm can predict a revolution


Russell Brandom at The Verge: “For students of international conflict, 2013 provided plenty to examine. There was civil war in Syria, ethnic violence in China, and riots to the point of revolution in Ukraine. For those working at Duke University’s Ward Lab, all specialists in predicting conflict, the year looks like a betting sheet, full of predictions that worked and others that didn’t pan out.

When the lab put out their semiannual predictions in July, they gave Paraguay a 97 percent chance of insurgency, largely based on reports of Marxist rebels. The next month, guerrilla campaigns intensified, proving out the prediction. In the case of China’s armed clashes between Uighurs and Hans, the models showed a 33 percent chance of violence, even as the cause of each individual flare-up was concealed by the country’s state-run media. On the other hand, the unrest in Ukraine didn’t start raising alarms until the action had already started, so the country was left off the report entirely.

According to Ward Lab’s staff, the purpose of the project isn’t to make predictions but to test theories. If a certain theory of geopolitics can predict an uprising in Ukraine, then maybe that theory is onto something. And even if these specialists could predict every conflict, it would only be half the battle. “It’s a success only if it doesn’t come at the cost of predicting a lot of incidents that don’t occur,” says Michael D. Ward, the lab’s founder and chief investigator, who also runs the blog Predictive Heuristics. “But it suggests that we might be on the right track.”
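Ward’s caveat is the standard problem of forecast evaluation: a model that calls every country a powder keg will never miss an insurgency, but its alarms are worthless. A minimal sketch, using hypothetical forecasts and outcomes rather than Ward Lab’s data, of the two checks that capture this trade-off (the Brier score for probabilistic accuracy; precision and recall for the cost of false alarms):

```python
# A minimal sketch of how probabilistic conflict forecasts can be scored.
# The forecasts and outcomes below are hypothetical, not Ward Lab's data.

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def precision_recall(probs, outcomes, threshold=0.5):
    """Treat p >= threshold as an alarm; count hits vs. false alarms."""
    alarms = [p >= threshold for p in probs]
    true_pos = sum(a and o for a, o in zip(alarms, outcomes))
    false_pos = sum(a and not o for a, o in zip(alarms, outcomes))
    false_neg = sum((not a) and o for a, o in zip(alarms, outcomes))
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall

# Hypothetical six-month forecasts: P(insurgency) per country, and outcomes.
# The first two figures echo the article (Paraguay 0.97, China 0.33);
# the rest are illustrative.
probs = [0.97, 0.33, 0.05, 0.60]
outcomes = [1, 1, 0, 0]

print(brier_score(probs, outcomes))            # lower is better
print(precision_recall(probs, outcomes, 0.5))  # alarms that panned out vs. missed
```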

Forecasting the future of a country wasn’t always done this way. Traditionally, predicting revolution or war has been a secretive project, for the simple reason that any reliable prediction would be too valuable to share. But as predictions lean more on data, they’ve actually become harder to keep secret, ushering in a new generation of open-source prediction models that butt against the siloed status quo.

The story of automated conflict prediction starts at the Defense Advanced Research Projects Agency, known as the Pentagon’s R&D wing. In the 1990s, DARPA wanted to try out software-based approaches to anticipating which governments might collapse in the near future. The CIA was already on the case, with section chiefs from every region filing regular forecasts, but DARPA wanted to see if a computerized approach could do better. They looked at a simple question: will this country’s government face an acute existential threat in the next six months? When CIA analysts were put to the test, they averaged roughly 60 percent accuracy, so DARPA’s new system set the bar at 80 percent, looking at 29 different countries in Asia with populations over half a million. It was dubbed ICEWS, the Integrated Conflict Early Warning System, and it succeeded almost immediately, clearing 80 percent with algorithms built on simple regression analysis…
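The article says only that ICEWS cleared its bar with “algorithms built on simple regression analysis.” A minimal sketch of what a model of that kind could look like; the features, data, and coefficients are invented for illustration, not drawn from ICEWS:

```python
# A sketch of the kind of "simple regression" an ICEWS-style early-warning
# model could use: logistic regression on country-level indicators to predict
# whether a government faces an acute threat in the next six months.
# Feature names and data are illustrative assumptions, not the ICEWS spec.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical country-month features: protest counts, GDP growth,
# neighboring-conflict flag, years since last regime change.
X = rng.normal(size=(500, 4))
# Hypothetical labels: 1 = acute existential threat within six months.
y = (X @ np.array([1.5, -1.0, 0.8, -0.3]) + rng.normal(size=500) > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# DARPA's bar for ICEWS was 80 percent accuracy; CIA analysts averaged ~60.
print("accuracy:", model.score(X_test, y_test))
```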

On the data side, researchers at Georgetown University are cataloging every significant political event of the past century into a single database called GDELT, and leaving the whole thing open for public research. Already, projects have used it to map the Syrian civil war and diplomatic gestures between Japan and South Korea, looking at dynamics that had never been mapped before. And then, of course, there’s Ward Lab, releasing a new sheet of predictions every six months and tweaking its algorithms with every development. It’s a mirror of the same open-vs.-closed debate in software — only now, instead of fighting over source code and security audits, it’s a fight over who can see the future the best.”
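GDELT’s event tables are published as plain tab-separated files, one per day, so analyses like the Japan and South Korea study begin with a download and a filter. A sketch assuming the GDELT 1.0 column layout (the numeric column positions below follow the v1.0 codebook and should be verified against it):

```python
# A sketch of the kind of query GDELT enables: pull one day of events and
# summarize Japan-South Korea interactions. Column positions are per the
# GDELT 1.0 event codebook; verify them against the codebook before use.
import io
import urllib.request
import zipfile

import pandas as pd

url = "http://data.gdeltproject.org/events/20140101.export.CSV.zip"
with urllib.request.urlopen(url) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))
with archive.open(archive.namelist()[0]) as f:
    events = pd.read_csv(f, sep="\t", header=None, low_memory=False)

# Name only the columns we use (indices per the v1.0 codebook, assumed here).
events = events.rename(columns={7: "Actor1CountryCode",
                                17: "Actor2CountryCode",
                                28: "EventRootCode",
                                34: "AvgTone"})

# CAMEO root codes 01-05 are broadly cooperative; 14-20 are conflictual.
jpn_kor = events[(events["Actor1CountryCode"] == "JPN")
                 & (events["Actor2CountryCode"] == "KOR")]
print(jpn_kor.groupby("EventRootCode")["AvgTone"].agg(["size", "mean"]))
```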

Big Data, Big New Businesses


Nigel Shadbolt and Michael Chui: “Many people have long believed that if government and the private sector agreed to share their data more freely, and allow it to be processed using the right analytics, previously unimaginable solutions to countless social, economic, and commercial problems would emerge. They may have no idea how right they are.

Even the most vocal proponents of open data appear to have underestimated how many profitable ideas and businesses stand to be created. More than 40 governments worldwide have committed to opening up their electronic data – including weather records, crime statistics, transport information, and much more – to businesses, consumers, and the general public. The McKinsey Global Institute estimates that the annual value of open data in education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance could reach $3 trillion.

These benefits come in the form of new and better goods and services, as well as efficiency savings for businesses, consumers, and citizens. The range is vast. For example, drawing on data from various government agencies, the Climate Corporation (recently bought for $1 billion) has taken 30 years of weather data, 60 years of data on crop yields, and 14 terabytes of information on soil types to create customized insurance products.

Similarly, real-time traffic and transit information can be accessed on smartphone apps to inform users when the next bus is coming or how to avoid traffic congestion. And, by analyzing online comments about their products, manufacturers can identify which features consumers are most willing to pay for, and develop their business and investment strategies accordingly.

Opportunities are everywhere. A raft of open-data start-ups are now being incubated at the London-based Open Data Institute (ODI), which focuses on improving our understanding of corporate ownership, health-care delivery, energy, finance, transport, and many other areas of public interest.

Consumers are the main beneficiaries, especially in the household-goods market. Better-informed buying decisions across sectors could capture an estimated $1.1 trillion in value for consumers annually. Third-party data aggregators are already allowing customers to compare prices across online and brick-and-mortar shops. Many also permit customers to compare quality ratings, safety data (drawn, for example, from official injury reports), information about the provenance of food, and producers’ environmental and labor practices.

Consider the book industry. Bookstores once regarded their inventory as a trade secret. Customers, competitors, and even suppliers seldom knew what stock bookstores held. Nowadays, by contrast, bookstores not only report what stock they carry but also when customers’ orders will arrive. If they did not, they would be excluded from the product-aggregation sites that have come to determine so many buying decisions.

The health-care sector is a prime target for achieving new efficiencies. By sharing the treatment data of a large patient population, for example, care providers can better identify practices that could save $180 billion annually.

The Open Data Institute-backed start-up Mastodon C uses open data on doctors’ prescriptions to differentiate between expensive patent medicines and cheaper “off-patent” varieties; when applied to just one class of drug, that approach could save around $400 million in one year for the British National Health Service. Meanwhile, open data on acquired infections in British hospitals has led to the publication of hospital-performance tables, a major factor in the 85% drop in reported infections.
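The prescribing analysis reduces to a comparison that is easy to sketch: for each drug, how much would be saved if branded prescriptions were dispensed at the generic price? A toy version with invented figures and column names; the real Mastodon C work ran over national-scale NHS prescribing data:

```python
# A toy version of the prescribing analysis described above: estimate what
# could be saved if branded ("patent") prescriptions were switched to their
# generic equivalents. The table and column names are hypothetical.
import pandas as pd

rx = pd.DataFrame({
    "chemical":  ["statin_a", "statin_a", "statin_b", "statin_b"],
    "branded":   [True, False, True, False],
    "items":     [1000, 9000, 500, 4000],   # prescriptions dispensed
    "unit_cost": [25.0, 2.0, 30.0, 3.0],    # cost per item
})

# Cheapest generic price available for each chemical.
generic_price = rx[~rx["branded"]].groupby("chemical")["unit_cost"].min()

branded = rx[rx["branded"]].copy()
branded["generic_cost"] = branded["chemical"].map(generic_price)
branded["saving"] = (branded["unit_cost"] - branded["generic_cost"]) * branded["items"]

print(branded[["chemical", "saving"]])
print("total potential saving:", branded["saving"].sum())
```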

There are also opportunities to prevent lifestyle-related diseases and improve treatment by enabling patients to compare their own data with aggregated data on similar patients. This has been shown to motivate patients to improve their diet, exercise more often, and take their medicines regularly. Similarly, letting people compare their energy use with that of their peers could prompt them to save hundreds of billions of dollars in electricity costs each year, to say nothing of reducing carbon emissions.

Such benchmarking is even more valuable for businesses seeking to improve their operational efficiency. The oil and gas industry, for example, could save $450 billion annually by sharing anonymized and aggregated data on the management of upstream and downstream facilities.

Finally, the move toward open data serves a variety of socially desirable ends, ranging from the reuse of publicly funded research to support work on poverty, inclusion, or discrimination, to the disclosure by corporations such as Nike of their supply-chain data and environmental impact.

There are, of course, challenges arising from the proliferation and systematic use of open data. Companies fear for their intellectual property; ordinary citizens worry about how their private information might be used and abused. Last year, Telefónica, the world’s fifth-largest mobile-network provider, tried to allay such fears by launching a digital confidence program to reassure customers that innovations in transparency would be implemented responsibly and without compromising users’ personal information.

The sensitive handling of these issues will be essential if we are to reap the potential $3 trillion in value that usage of open data could deliver each year. Consumers, policymakers, and companies must work together, not just to agree on common standards of analysis, but also to set the ground rules for the protection of privacy and property.”

Disinformation Visualization: How to lie with datavis


Mushon Zer-Aviv at School of Data: “Seeing is believing. When working with raw data we’re often encouraged to present it differently, to give it a form, to map it or visualize it. But all maps lie. In fact, maps have to lie, otherwise they wouldn’t be useful. Some lies are transparent and obvious: a tree icon on a map, for example, often represents more than one tree. Others are white lies – rounding numbers and prioritising details to create a more legible representation. And then there’s the third type of lie, those lies that convey a bias, be it deliberately or subconsciously. A bias that misrepresents the data and skews it towards a certain reading.

It all sounds very sinister, and indeed sometimes it is. It’s hard to see through a lie unless you stare it right in the face, and what better way to do that than to get our minds dirty and look at some examples of creative and mischievous visual manipulation?
Over the past year I’ve had a few opportunities to run Disinformation Visualization workshops, encouraging activists, designers, statisticians, analysts, researchers, technologists and artists to visualize lies. During these sessions I have used the DIKW pyramid (Data > Information > Knowledge > Wisdom), a framework for thinking about how data gains context and meaning and becomes information. This information needs to be consumed and understood to become knowledge. And finally when knowledge influences our insights and our decision making about the future it becomes wisdom. Data visualization is one of the ways to push data up the pyramid towards wisdom in order to affect our actions and decisions. It would be wise then to look at visualizations suspiciously.
[Image: the DIKW pyramid (Data > Information > Knowledge > Wisdom)]
Centuries before big data, computer graphics and social media collided and gave us the datavis explosion, visualization was mostly a scientific tool for inquiry and documentation. This history gave the artform its authority as an integral part of the scientific process. Being a product of human brains and hands, a certain degree of bias was always there, no matter how scientific the process was. The effects of these early off-white lies are still felt today, as even our most celebrated interactive maps still echo the biases of the Mercator map projection, grounding Europe and North America at the top of the world, overemphasizing their size and perceived importance over the Global South. Our contemporary practice of programmatic, data-driven visualization hides both the human eyes and hands that produce it behind data sets, algorithms and computer graphics, but the same biases are still there, only they’re harder to decipher…”
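A quick illustration of the third type of lie Zer-Aviv describes, and the most common one in practice: the truncated axis. The sketch below plots the same two numbers twice, once with a clipped baseline that exaggerates the gap and once from zero:

```python
# A concrete instance of the "third type of lie": identical data, two charts.
# Truncating the y-axis makes a 2-point difference look like a landslide.
import matplotlib.pyplot as plt

labels = ["Option A", "Option B"]
values = [51, 49]   # illustrative poll numbers

fig, (biased, honest) = plt.subplots(1, 2, figsize=(8, 3))

biased.bar(labels, values)
biased.set_ylim(48, 52)   # truncated baseline: the gap looks enormous
biased.set_title("Truncated axis")

honest.bar(labels, values)
honest.set_ylim(0, 100)   # zero baseline: the gap looks marginal
honest.set_title("Zero baseline")

plt.tight_layout()
plt.show()
```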

Can We Balance Data Protection With Value Creation?


A “privacy perspective” by Sara Degli Esposti: “In the last few years there has been a dramatic change in the opportunities organizations have to generate value from the data they collect about customers or service users. Customers and users are rapidly becoming collections of “data points” and organizations can learn an awful lot from the analysis of this huge accumulation of data points, also known as “Big Data.”

Organizations are perhaps thrilled, dreaming about new potential applications of digital data but also a bit concerned about hidden risks and unintended consequences. Take, for example, the human rights protections placed on personal data by the EU.  Regulators are watching closely, intending to preserve the eight basic privacy principles without compromising the free flow of information.
Some may ask whether it’s even possible to balance the two.
Enter the Big Data Protection Project (BDPP): an Open University study on organizations’ ability to leverage Big Data while complying with EU data protection principles. The study represents a chance for you to contribute to, and learn about, the debate on the reform of the EU Data Protection Directive. It is open to staff with an interest in data management or use, from all types of organizations, both for-profit and nonprofit, with interests in Europe.
Join us by visiting the study’s page on the Open University website. Participants will receive a report with all the results. The BDPP is a scientific project—no commercial organization is involved—with implications relevant to both policy-makers and industry representatives.
What kind of legislation do we need to create that positive system of incentive for organizations to innovate in the privacy field?
There is no easy answer.
That’s why we need to undertake empirical research into actual information management practices to understand the effects of regulation on people and organizations. Legal instruments conceived with the best intentions can be ineffective or detrimental in practice. However, other factors can also intervene and motivate business players to develop procedures and solutions which go far beyond compliance. Good legislation should complement market forces in bringing values and welfare to both consumers and organizations.
Is European data protection law keeping its promise of protecting users’ information privacy while contributing to the flourishing of the digital economy or not? Will the proposed General Data Protection Regulation (GDPR) be able to achieve this goal? What would you suggest to do to motivate organizations to invest in information security and take information privacy seriously?
Let’s consider for a second some basic ideas such as the eight fundamental data protection principles: notice, consent, purpose specification and limitation, data quality, respect of data subjects’ rights, information security and accountability. Many of these ideas are present in the EU 1995 Data Protection Directive, the U.S. Fair Information Practice Principles (FIPPs) and the 1980 OECD Guidelines. The fundamental question now is, should all these ideas be brought into the future, as suggested in the proposed new GDPR, or should we reconsider our approach and revise some of them, as recommended in the 21st century version of the 1980 OECD Guidelines?
As you may know, notice and consent are often taken as examples of how very good intentions can be transformed into actions of limited importance. Rather than increase people’s awareness of the growing data economy, notice and consent have produced a tick-box tendency accompanied by long and unintelligible privacy policies. Besides, consent is rarely freely granted. Individuals give their consent in exchange for some product or service or as part of a job relationship. The imbalance between the two goods traded—think about how youngsters perceive not having access to some social media as a form of social exclusion—and the lack of feasible alternatives often make an instrument, such as the current use made of consent, meaningless.
On the other hand, a principle such as data quality, which has received very limited attention, could offer opportunities to policy-makers and businesses to reopen the debate on users’ control of their personal data. Having updated, accurate data is something very valuable for organizations. Data quality is also key to the success of many business models. New partnerships between users and organizations could be envisioned under this principle.
Finally, data collection limitation and purpose specification could be other examples of the divide between theory and practice: The tendency we see is that people and businesses want to share, merge and reuse data over time and to do new and unexpected things. Of course, we all want to avoid function creep and prevent any detrimental use of our personal data. We probably need new, stronger mechanisms to ensure data are used for good purposes.
Digital data have become economic assets these days. We need good legislation to stop the black market for personal data and open the debate on how each of us wants to contribute to, and benefit from, the data economy.”

Are bots taking over Wikipedia?


Kurzweil News: “As crowdsourced Wikipedia has grown too large — with more than 30 million articles in 287 languages — to be entirely edited and managed by volunteers, 12 Wikipedia bots have emerged to pick up the slack.

The bots use Wikidata — a free knowledge base that can be read and edited by both humans and bots — to exchange information between entries and between the 287 languages.

Which raises an interesting question: what portion of Wikipedia edits are generated by humans versus bots?

To find out (and keep track of other bot activity), Thomas Steiner of Google Germany has created an open-source application (and API): Wikipedia and Wikidata Realtime Edit Stats, described in an arXiv paper.
The percentages of bot vs. human edits shown in the application are constantly changing. A KurzweilAI snapshot on Feb. 20 at 5:19 AM EST showed an astonishing 42% of Wikipedia edits being made by bots. (The application lists the 12 bots.)


[Chart: anonymous vs. logged-in humans (credit: Thomas Steiner)]
The percentages also vary by language. Only 5% of English edits were by bots; but for Serbian pages, in which few Wikipedians apparently participate, 96% of edits were by bots.

The application also tracks what percentage of edits are by anonymous users. Globally, it was 25 percent in our snapshot and a surprising 34 percent for English — raising interesting questions about corporate and other interests covertly manipulating Wikipedia information.”
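Steiner’s application consumed Wikipedia’s real-time recent-changes feed. A comparable bot-versus-human tally can be made today against Wikimedia’s public EventStreams endpoint; a sketch (the 2014 app used the older IRC-based feed, and the shares you see will vary with when you run it):

```python
# A sketch of a bot-vs-human edit tally using Wikimedia's EventStreams feed
# (server-sent events). Runs until interrupted; shares fluctuate constantly.
import json

import requests

URL = "https://stream.wikimedia.org/v2/stream/recentchange"
counts = {"bot": 0, "human": 0}

with requests.get(URL, stream=True) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip SSE keep-alives and event-type lines
        event = json.loads(line[len(b"data: "):])
        if event.get("type") not in ("edit", "new"):
            continue  # ignore log entries, category changes, etc.
        counts["bot" if event.get("bot") else "human"] += 1
        total = counts["bot"] + counts["human"]
        if total % 100 == 0:
            print(f"bot share: {counts['bot'] / total:.0%} of {total} edits")
```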

Four Threats to American Democracy


Jared Diamond in Governance: “The U.S. government has spent the last two years wrestling with a series of crises over the federal budget and debt ceiling. I do not deny that our national debt and the prospect of a government shutdown pose real problems. But they are not our fundamental problems, although they are symptoms of them. Instead, our fundamental problems are four interconnected issues combining to threaten a breakdown of effective democratic government in the United States.
Why should we care? Let’s remind ourselves of the oft-forgotten reasons why democracy is a superior form of government (provided that it works), and hence why its deterioration is very worrisome. (Of course, I acknowledge that there are many countries in which democracy does not work, because of the lack of a national identity, of an informed electorate, or of both). The advantages of democracy include the following:

  • In a democracy, one can propose and discuss virtually any idea, even if it is initially unpalatable to the government. Debate may reveal the idea to be the best solution, whereas in a dictatorship the idea would not have gotten debated, and its virtues would not have been discovered.
  • In a democracy, citizens and their ideas get heard. Hence, without democracy, people are more likely to feel unheard and frustrated and to resort to violence.
  • Compromise is essential to a democracy. It enables us to avoid tyranny by the majority or (conversely) paralysis of government through vetoes exercised by a frustrated minority.
  • In modern democracies, all citizens can vote. Hence, government is motivated to invest in all citizens, who thereby receive the opportunity to become productive, rather than just a small dictatorial elite receiving that opportunity.

Why should we Americans keep reminding ourselves of those fundamental advantages of democracies? I would answer: not only in order to motivate ourselves to defend our democratic processes, but also because increasing numbers of Americans today are falling into the trap of envying the supposed efficiency of China’s dictatorship. Yes, it is true that dictatorships, by closing debate, can sometimes implement good policies faster than can the United States, as has China in quickly converting to lead-free gasoline and building a high-speed rail network. But dictatorships suffer from a fatal disadvantage. No one, in the 5,400 years of history of centralized government on all the continents, has figured out how to ensure that a dictatorship will embrace only good policies. Dictatorships also prevent the public debate that helps to avert catastrophic policies unparalleled in any large modern First World democracy—such as China’s quickly abolishing its educational system, sending its teachers out into the fields, and creating the world’s worst air pollution.
That is why democracy, given the prerequisites of an informed electorate and a basic sense of common interest, is the best form of government—at least, better than all the alternatives that have been tried, as Winston Churchill quipped. Our form of government is a big part of the explanation why the United States has become the richest and most powerful country in the world. Hence, an undermining of democratic processes in the United States means throwing away one of our biggest advantages. Unfortunately, that is what we are now doing, in four ways.
First, political compromise has been deteriorating in recent decades, and especially in the last five years. That deterioration can be measured as the increase in Senate rejections of presidential nominees whose approvals used to be routine, the increasing use of filibusters by the minority party, the majority party’s response of abolishing filibusters for certain types of votes, and the decline in number of laws passed by Congress to the lowest level of recent history. The reasons for this breakdown in political compromise, which seems to parallel increasing levels of nastiness in other areas of American life, remain debated. Explanations offered include the growth of television and then of the Internet, replacing face-to-face communication, and the growth of many narrowly partisan TV channels at the expense of a few broad-public channels. Even if these reasons hold a germ of truth, they leave open the question why these same trends operating in Canada and in Europe have not led to similar deterioration of political compromise in those countries as well.
Second, there are increasing restrictions on the right to vote, weighing disproportionately on voters for one party and implemented at the state level by the other party. Those obstacles include making registration to vote difficult and demanding that registered voters show documentation of citizenship when they present themselves at the polls. Of course, the United States has had a long history of denying voting rights to blacks, women, and other groups. But access to voting had been increasing in the last 50 years, so the recent proliferation of restrictions reverses that long positive trend. In addition to those obstacles preventing voter registration, the United States has by far the lowest election turnout among large First World democracies: under 60% of registered voters in most presidential elections, 40% for congressional elections, and 20% for the recent election for mayor of my city of Los Angeles. (A source of numbers for this and other comparisons that I shall cite is an excellent recent book by Howard Steven Friedman, The Measure of a Nation). And, while we are talking about elections, let’s not forget the astronomical recent increase in costs and durations of election campaigns, their funding by wealthy interests, and the shift in campaign pitches to sound bites. Those trends, unparalleled in other large First World democracies, undermine the democratic prerequisite of a well-informed electorate.
A third contributor to the growing breakdown of democracy is our growing gap between rich and poor. Among our most cherished core values is our belief that the United States is a “land of opportunity,” and that we uniquely offer to our citizens the potential for rising from “rags to riches”—provided that citizens have the necessary ability and work hard. This is a myth. Income and wealth disparity (as measured by the Gini index of equality/inequality, and in other ways) is much higher in the United States than in any other large First World democracy. So is hereditary socioeconomic immobility, that is, the probability that a son’s relative income will just mirror his father’s relative income, and that sons of poor fathers will not become wealthy. Part of the reason for those depressing facts is inequality of educational opportunities. Children of rich Americans tend to receive much better educations than children of poor Americans.
That is bad for our economy, because it means that we are failing to develop a large fraction of our intellectual capital. It is also bad for our political stability, because poor parents who correctly perceive that their children are not being given the opportunity to succeed may express their resulting frustration in violence. Twice during my 47 years of residence in Los Angeles, in 1965 and 1992, frustration in poor areas of Los Angeles erupted into violence, lootings, and killings. In the 1992 riots, when police feared that rioters would spill into the wealthy suburb of Beverly Hills, all that the outnumbered police could do to protect Beverly Hills was to string yellow plastic police tape across major streets. As it turned out, the rioters did not try to invade Beverly Hills in 1992. But if present trends causing frustration continue, there will be more riots in Los Angeles and other American cities, and yellow plastic police tape will not suffice to contain the rioters.
The remaining contributor to the decline of American democracy is the decline of government investment in public purposes, such as education, infrastructure, and nonmilitary research and development. Large segments of the American populace deride government investment as “socialism.” But it is not socialism. On the contrary, it is one of the longest established functions of government. Ever since the rise of the first governments 5,400 years ago, governments have served two main functions: to maintain internal peace by monopolizing force, settling disputes, and forbidding citizens to resort to violence in order to settle disputes themselves; and to redistribute individual wealth for investing in larger aims—in the worst cases, enriching the elite; in the best cases, promoting the good of society as a whole. Of course, some investment is private, by wealthy individuals and companies expecting to profit from their investments. But many potential payoffs cannot attract private investment, either because the payoff is so far off in the future (such as the payoff from universal primary school education), or because the payoff is diffused over all of society rather than concentrated in areas profitable to the private investor (such as diffused benefits of municipal fire departments, roads, and broad education). Even the most passionate American supporters of small government do not decry as socialism the funding of fire departments, interstate highways, and public schools.”

Developing an open government plan in the open


Tim Hughes at OGP: “New laws, standards, policies, processes and technologies are critical for opening up government, but arguably just as (if not more) important are new cultures, behaviours and ways of working within government and civil society.
The development of an OGP National Action Plan, therefore, presents a twofold opportunity for opening up government: On the one hand it should be used to deliver a set of robust and ambitious commitments to greater transparency, participation and accountability. But just as importantly, the process of developing a NAP should also be used to model new forms of open and collaborative working within government and civil society. These two purposes of a NAP should be mutually reinforcing. An open and collaborative process can – as was the case in the UK – help to deliver a more robust and ambitious action plan, which in turn can demonstrate the efficacy of working in the open.
You could even go one step further to say that the development of a National Action Plan should present an (almost) “ideal” vision of what open government in a country could look like. If governments aren’t being open as they’re developing an open government action plan, then there’s arguably little hope that they’ll be open elsewhere.
As coordinators of the UK OGP civil society network, this was on our mind at the beginning and throughout the development of the UK’s 2013-15 National Action Plan. Crucially, it was also on the minds of our counterparts in the UK Government. From the start, therefore, the process was developed with the intention that it should itself model the principles of open government. Members of the UK OGP civil society network met with policy officials from the UK Government on a regular basis to scope out and develop the action plan, and we published regular updates of our discussions and progress for others to follow and engage with. The process wasn’t without its challenges – and there’s still much more we can do to open it up further in the future – but it was successful in moving far beyond the typical model of government deciding, announcing and defending its intentions and in delivering an action plan with some strong and ambitious commitments.
One of the benefits of working in an open and collaborative way is that it enabled us to conduct and publish a full – warts and all – review of what went well and what didn’t. So, consider this an invitation to delve into our successes and failures, a challenge to do it better and a request to help us to do so too. Head over to the UK OGP civil society network blog to read about what we did, and tell us what you think: http://www.opengovernment.org.uk/national-action-plan/story-of-the-uk-national-action-plan-2013-15/”

Crowdsourcing and regulatory reviews: A new way of challenging red tape in British government?


New paper by Martin Lodge and Kai Wegrich in Regulation and Governance: “Much has been said about the appeal of digital government devices to enhance consultation on rulemaking. This paper explores the most ambitious attempt by the UK central government so far to draw on “crowdsourcing” to consult and act on regulatory reform, the “Red Tape Challenge.” We find that the results of this exercise do not represent any major change to traditional challenges to consultation processes. Instead, we suggest that the extensive institutional arrangements for crowdsourcing were hardly significant in informing actual policy responses: neither the tone of the crowdsourced comments, the direction of the majority views, nor specific comments were seen to matter. Instead, it was processes within the executive that shaped the overall governmental responses to this initiative. The findings, therefore, provoke wider debates about the use of social media in rulemaking and consultation exercises.”

The City as a Platform – Stripping out complexity and Making Things Happen


Emer Coleman: “The concept of data platforms has garnered a lot of coverage over the past few years and the City as a Platform is one that has wide traction in the “Smart City” space. It’s an idea that has been widely promulgated by service integrators and large consultancy firms. This idea has been adopted into the thinking of many cities in the UK, increasingly by local authorities who have been forced by central government diktat to open their data and who are also engaging with many of the large private companies that sell infrastructure and capabilities and with whom they may have existing contractual arrangements.
Standard interpretations of city as platform usually involve the idea that the city authority will create the platform into which it will release its data. It then seeks the integration of APIs (both external and internal) into the platform so that, theoretically, the user can access that data via a unified City API on which developers can then create products and services.
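In code, the unified City API pattern amounts to a façade: a single endpoint that routes requests to, and re-exposes, many underlying services. A minimal sketch using Flask, with placeholder upstream URLs rather than any real city’s endpoints:

```python
# A minimal sketch of the "unified City API" pattern described above: one
# facade endpoint that proxies several underlying data services. The upstream
# URLs are placeholders, not real city endpoints.
import requests
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical upstream sources registered behind the single city API.
SOURCES = {
    "transport": "https://transport.example-city.gov/api/v1/status",
    "air":       "https://environment.example-city.gov/api/v1/air-quality",
}

@app.route("/city/v1/<dataset>")
def city_api(dataset):
    upstream = SOURCES.get(dataset)
    if upstream is None:
        return jsonify(error="unknown dataset"), 404
    resp = requests.get(upstream, timeout=5)
    # Re-expose the upstream payload under the unified endpoint.
    return jsonify(source=upstream, data=resp.json())

if __name__ == "__main__":
    app.run()
```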
Some local authorities seek to monetise access to this API while others see it as a mechanism for encouraging the development of new products and services that are of value to the state but which have been developed without direct additional investment by the state thereby generating public good from the public task of collecting and storing data.
This concept of city as platform integrated by local authorities appears at first glance to be a logical, linear and achievable goal but in my view completely misunderstands a number of key factors:
1. The evolution of the open data/big data market
2. Commercial and Technical realities
3. Governance and bureaucracy
I’ll explore these below…”

Open Data (Updated and Expanded)


As part of an ongoing effort to build a knowledge base for the field of opening governance by organizing and disseminating its learnings, the GovLab Selected Readings series provides an annotated and curated collection of recommended works on key opening governance topics. We start our series with a focus on Open Data. To suggest additional readings on this or any other topic, please email biblio@thegovlab.org.

Data and its uses for Governance

Open data refers to data that is publicly available for anyone to use and which is licensed in a way that allows for its re-use. The common requirement that open data be machine-readable means not only that data is distributed via the Internet in digitized form, but also that it can be processed by computers through automation, ensuring both wide dissemination and ease of re-use. Much of the focus of the open data advocacy community is on government data and government-supported research data. For example, in May 2013, the US Open Data Policy defined open data as publicly available data structured in a way that enables the data to be fully discoverable and usable by end users, and consistent with a number of principles focused on availability, accessibility and reusability.

Selected Reading List (in alphabetical order)

Annotated Selected Reading List (in alphabetical order)
Fox, Mark S. “City Data: Big, Open and Linked.” Working Paper, Enterprise Integration Laboratory (2013). http://bit.ly/1bFr7oL.

  • This paper examines concepts that underlie Big City Data using data from multiple cities as examples. It begins by explaining the concepts of Open, Unified, Linked, and Grounded data, which are central to the Semantic Web. Fox then explores Big Data as an extension of Data Analytics, and provides case examples of good data analytics in cities.
  • Fox concludes that we can develop the tools that will enable anyone to analyze data, both big and small, by adopting the principles of the Semantic Web (a minimal code sketch follows this list):
    • Data being openly available over the internet,
    • Data being unifiable using common vocabularies,
    • Data being linkable using Internationalized Resource Identifiers (IRIs),
    • Data being accessible using a common data structure, namely triples,
    • Data being semantically grounded using Ontologies.
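Those five principles map directly onto the Semantic Web stack. A minimal sketch of city data expressed as linkable triples with Python’s rdflib; the IRIs and vocabulary here are illustrative, not drawn from Fox’s paper:

```python
# A minimal sketch of the Semantic Web principles Fox lists: city data
# expressed as (subject, predicate, object) triples, with IRIs for identity
# and a shared vocabulary. The example IRIs and vocabulary are assumptions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

CITY = Namespace("http://example.org/city/")
VOCAB = Namespace("http://example.org/vocab#")

g = Graph()
g.bind("city", CITY)
g.bind("v", VOCAB)

# Each statement is one triple; IRIs make the data linkable across datasets.
g.add((CITY["sensor/42"], RDF.type, VOCAB.AirQualitySensor))
g.add((CITY["sensor/42"], VOCAB.locatedIn, CITY["ward/3"]))
g.add((CITY["sensor/42"], VOCAB.pm25, Literal(12.5, datatype=XSD.double)))

# Turtle serialization is both human-readable and machine-processable.
print(g.serialize(format="turtle"))
```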

Foulonneau, Muriel, Sébastien Martin, and Slim Turki. “How Open Data Are Turned into Services?” In Exploring Services Science, edited by Mehdi Snene and Michel Leonard, 31–39. Lecture Notes in Business Information Processing 169. Springer International Publishing, 2014. http://bit.ly/1fltUmR.

  • In this chapter, the authors argue that, considering the important role the development of new services plays as a motivation for open data policies, the impact of new services created through open data should play a more central role in evaluating the success of open data initiatives.
  • Foulonneau, Martin and Turki argue that the following metrics should be considered when evaluating the success of open data initiatives: “the usage, audience, and uniqueness of the services, according to the changes it has entailed in the public institutions that have open their data…the business opportunity it has created, the citizen perception of the city…the modification to particular markets it has entailed…the sustainability of the services created, or even the new dialog created with citizens.”

Goldstein, Brett, and Lauren Dyson. Beyond Transparency: Open Data and the Future of Civic Innovation. 1st ed. (Code for America Press, 2013). http://bit.ly/15OAxgF.

  • This “cross-disciplinary survey of the open data landscape” features stories from practitioners in the open data space — including Michael Flowers, Brett Goldstein, Emer Coleman and many others — discussing what they’ve accomplished with open civic data. The book “seeks to move beyond the rhetoric of transparency for transparency’s sake and towards action and problem solving.”
  • The book’s editors seek to accomplish the following objectives:
    • Help local governments learn how to start an open data program
    • Spark discussion on where open data will go next
    • Help community members outside of government better engage with the process of governance
    • Lend a voice to many aspects of the open data community.
  • The book is broken into five sections: Opening Government Data, Building on Open Data, Understanding Open Data, Driving Decisions with Data and Looking Ahead.

Granickas, Karolis. “Understanding the Impact of Releasing and Re-using Open Government Data.” European Public Sector Information Platform, ePSIplatform Topic Report No. 2013/08, (2013). http://bit.ly/GU0Nx4.

  • This paper examines the impact of open government data by exploring the latest research in the field, with an eye toward creating an enabling environment for open data, as well as identifying the benefits of open government data and its political, social, and economic impacts.
  • Granickas concludes that to maximize the benefits of open government data: a) further research is required that structures and measures the potential benefits of open government data; b) “government should pay more attention to creating feedback mechanisms between policy implementers, data providers and data-re-users”; c) “finding a balance between demand and supply requires mechanisms of shaping demand from data re-users and also demonstration of data inventory that governments possess”; and lastly, d) “open data policies require regular monitoring.”

Gurin, Joel. Open Data Now: The Secret to Hot Startups, Smart Investing, Savvy Marketing, and Fast Innovation, (New York: McGraw-Hill, 2014). http://amzn.to/1flubWR.

  • In this book, GovLab Senior Advisor and Open Data 500 director Joel Gurin explores the broad realized and potential benefit of Open Data, and how, “unlike Big Data, Open Data is transparent, accessible, and reusable in ways that give it the power to transform business, government, and society.”
  • The book provides “an essential guide to understanding all kinds of open databases – business, government, science, technology, retail, social media, and more – and using those resources to your best advantage.”
  • In particular, Gurin discusses a number of applications of Open Data with very real potential benefits:
    • “Hot Startups: turn government data into profitable ventures;
    • Savvy Marketing: understanding how reputational data drives your brand;
    • Data-Driven Investing: apply new tools for business analysis;
    • Consumer Information: connect with your customers using smart disclosure;
    • Green Business: use data to bet on sustainable companies;
    • Fast R&D: turn the online world into your research lab;
    • New Opportunities: explore open fields for new businesses.”

Jetzek, Thorhildur, Michel Avital, and Niels Bjørn-Andersen. “Generating Value from Open Government Data.” Thirty Fourth International Conference on Information Systems, 5. General IS Topics 2013. http://bit.ly/1gCbQqL.

  • In this paper, the authors “developed a conceptual model portraying how data as a resource can be transformed to value.”
  • Jetzek, Avital and Bjørn-Andersen propose a conceptual model featuring four Enabling Factors (openness, resource governance, capabilities and technical connectivity) acting on four Value Generating Mechanisms (efficiency, innovation, transparency and participation) leading to the impacts of Economic and Social Value.
  • The authors argue that their research supports that “all four of the identified mechanisms positively influence value, reflected in the level of education, health and wellbeing, as well as the monetary value of GDP and environmental factors.”

Kassen, Maxat. “A promising phenomenon of open data: A case study of the Chicago open data project.” Government Information Quarterly (2013). http://bit.ly/1ewIZnk.

  • This paper uses the Chicago open data project to explore the “empowering potential of an open data phenomenon at the local level as a platform useful for promotion of civic engagement projects and provide a framework for future research and hypothesis testing.”
  • Kassen argues that “open data-driven projects offer a new platform for proactive civic engagement” wherein, by harnessing “the collective wisdom of the local communities, their knowledge and visions of the local challenges, governments could react and meet citizens’ needs in a more productive and cost-efficient manner.”
  • The paper highlights the need for independent IT developers to network in order for this trend to continue, as well as the importance of the private sector in “overall diffusion of the open data concept.”

Keen, Justin, Radu Calinescu, Richard Paige, John Rooksby. “Big data + politics = open data: The case of health care data in England.” Policy and Internet 5, no. 2 (2013): 228–243. http://bit.ly/1i231WS.

  • This paper examines the assumptions regarding open datasets, technological infrastructure and access, using healthcare systems as a case study.
  • The authors specifically address two assumptions surrounding enthusiasm about Big Data in healthcare: the assumption that healthcare datasets and technological infrastructure are up to task, and the assumption of access to this data from outside the healthcare system.
  • By using the National Health Service in England as an example, the authors identify data, technology, and information governance challenges. They argue that “public acceptability of third party access to detailed health care datasets is, at best, unclear,” and that the prospects of Open Data depend on Open Data policies, which are inherently political, and the government’s assertion of property rights over large datasets. Thus, they argue that the “success or failure of Open Data in the NHS may turn on the question of trust in institutions.”

Kulk, Stefan and Bastiaan Van Loenen. “Brave New Open Data World?” International Journal of Spatial Data Infrastructures Research, May 14, 2012. http://bit.ly/15OAUYR.

  • This paper examines the evolving tension between the open data movement and the European Union’s privacy regulations, especially the Data Protection Directive.
  • The authors argue, “Technological developments and the increasing amount of publicly available data are…blurring the lines between non-personal and personal data. Open data may not seem to be personal data on first glance especially when it is anonymised or aggregated. However, it may become personal by combining it with other publicly available data or when it is de-anonymised.”

Kundra, Vivek. “Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect.” Joan Shorenstein Center on the Press, Politics and Public Policy, Harvard College: Discussion Paper Series, January 2012, http://hvrd.me/1fIwsjR.

  • In this paper, Vivek Kundra, the first Chief Information Officer of the United States, explores the growing impact of open data, and argues that, “In the information economy, data is power and we face a choice between democratizing it and holding on to it for an asymmetrical advantage.”
  • Kundra offers four specific recommendations to maximize the impact of open data: Citizens and NGOs must demand open data in order to fight government corruption, improve accountability and government services; Governments must enact legislation to change the default setting of government to open, transparent and participatory; The press must harness the power of the network effect through strategic partnerships and crowdsourcing to cut costs and provide better insights; and Venture capitalists should invest in startups focused on building companies based on public sector data.

Noveck, Beth Simone and Daniel L. Goroff. “Information for Impact: Liberating Nonprofit Sector Data.” The Aspen Institute Philanthropy & Social Innovation Publication Number 13-004. 2013. http://bit.ly/WDxd7p.

  • This report is focused on “obtaining better, more usable data about the nonprofit sector,” which encompasses, as of 2010, “1.5 million tax-exempt organizations in the United States with $1.51 trillion in revenues.”
  • Toward that goal, the authors propose liberating data from the Form 990, an Internal Revenue Service form that “gathers and publishes a large amount of information about tax-exempt organizations,” including information related to “governance, investments, and other factors not directly related to an organization’s tax calculations or qualifications for tax exemption.”
  • The authors recommend a two-track strategy: “Pursuing the longer-term goal of legislation that would mandate electronic filing to create open 990 data, and pursuing a shorter-term strategy of developing a third party platform that can demonstrate benefits more immediately.”

Robinson, David G., Harlan Yu, William P. Zeller, and Edward W. Felten, “Government Data and the Invisible Hand.” Yale Journal of Law & Technology 11 (2009), http://bit.ly/1c2aDLr.

  • This paper proposes a new approach to online government data that “leverages both the American tradition of entrepreneurial self-reliance and the remarkable low-cost flexibility of contemporary digital technology.”
  • “In order for public data to benefit from the same innovation and dynamism that characterize private parties’ use of the Internet, the federal government must reimagine its role as an information provider. Rather than struggling, as it currently does, to design sites that meet each end-user need, it should focus on creating a simple, reliable and publicly accessible infrastructure that ‘exposes’ the underlying data.”

Ubaldi, Barbara. “Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives.” OECD Working Papers on Public Governance. Paris: Organisation for Economic Co-operation and Development, May 27, 2013. http://bit.ly/15OB6qP.

  • This working paper from the OECD seeks to provide an all-encompassing look at the principles, concepts and criteria framing open government data (OGD) initiatives.
  • Ubaldi also analyzes a variety of challenges to implementing OGD initiatives, including policy, technical, economic and financial, organizational, cultural and legal impediments.
  • The paper also proposes a methodological framework for evaluating OGD Initiatives in OECD countries, with the intention of eventually “developing a common set of metrics to consistently assess impact and value creation within and across countries.”

Worthy, Ben. “David Cameron’s Transparency Revolution? The Impact of Open Data in the UK.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, November 29, 2013. http://bit.ly/NIrN6y.

  • In this article, Worthy “examines the impact of the UK Government’s Transparency agenda, focusing on the publication of spending data at local government level. It measures the democratic impact in terms of creating transparency and accountability, public participation and everyday information.”
  • Worthy’s findings, based on surveys of local authorities, interviews and FOI requests, are disappointing. He finds that:
    • Open spending data has led to some government accountability, but largely from those already monitoring government, not regular citizens.
    • Open Data has not led to increased participation, “as it lacks the narrative or accountability instruments to fully bring such effects.”
    • It has also not “created a new stream of information to underpin citizen choice, though new innovations offer this possibility. The evidence points to third party innovations as the key.”
  • Despite these initial findings, “Interviewees pointed out that Open Data holds tremendous opportunities for policy-making. Joined up data could significantly alter how policy is made and resources targeted. From small scale issues e.g. saving money through prescriptions to targeting homelessness or health resources, it can have a transformative impact.”

Zuiderwijk, Anneke, Marijn Janssen, Sunil Choenni, Ronald Meijer and Roexsana Sheikh Alibaks. “Socio-technical Impediments of Open Data.” Electronic Journal of e-Government 10, no. 2 (2012). http://bit.ly/17yf4pM.

  • This paper seeks to identify the socio-technical impediments to open data impact based on a review of the open data literature, as well as workshops and interviews.
  • The authors discovered 118 impediments across ten categories: 1) availability and access; 2) find-ability; 3) usability; 4) understandability; 5) quality; 6) linking and combining data; 7) comparability and compatibility; 8) metadata; 9) interaction with the data provider; and 10) opening and uploading.

Zuiderwijk, Anneke and Marijn Janssen. “Open Data Policies, Their Implementation and Impact: A Framework for Comparison.” Government Information Quarterly 31, no. 1 (January 2014): 17–29. http://bit.ly/1bQVmYT.

  • In this article, Zuiderwijk and Janssen argue that “currently there is a multiplicity of open data policies at various levels of government, whereas very little systematic and structured research [is] done on the issues that are covered by open data policies, their intent and actual impact.”
  • With this evaluation deficit in mind, the authors propose a new framework for comparing open data policies at different government levels using the following elements for comparison:
    • Policy environment and context, such as level of government organization and policy objectives;
    • Policy content (input), such as types of data not publicized and technical standards;
    • Performance indicators (output), such as benefits and risks of publicized data; and
    • Public values (impact).

To stay current on recent writings and developments on Open Data, please subscribe to the GovLab Digest.
Did we miss anything? Please submit reading recommendations to biblio@thegovlab.org or in the comments below.