Fifteen open data insights

15 insights into open data supply, use and impacts

(1) There are many gaps to overcome before open data availability can lead to widespread effective use and impact. Open data can lead to change through a ‘domino effect’, or by creating ripples of change that gradually spread out. However, often many of the key ‘domino pieces’ are missing, and local political contexts limit the reach of ripples. Poor data quality, low connectivity, scarce technical skills, weak legal frameworks and political barriers may all prevent open data triggering sustainable change. Attentiveness to all the components of open data impact is needed when designing interventions.
(2) There is a frequent mismatch between open data supply and demand in developing countries. Counting datasets is a poor way of assessing the quality of an open data initiative. The datasets published on portals are often the datasets that are easiest to publish, not the datasets most in demand. Politically sensitive datasets are particularly unlikely to be published without civil society pressure. Sometimes the gap is on the demand side – as potential open data users often do not articulate demands for key datasets.
(3) Open data initiatives can create new spaces for civil society to pursue government accountability and effectiveness. The conversation around transparency and accountability that ideas of open data can support is as important as the datasets in some developing countries.
(4) Working on open data projects can change how government creates, prepares and uses its own data. The motivations behind an open data initiative shape how government uses the data itself. Civil society and entrepreneurs interacting with government through open data projects can help shape government data practices. This makes it important to consider which intermediaries gain insider roles shaping data supply.
(5) Intermediaries are vital to both the supply and the use of open data. Not all data needed for governance in developing countries comes from government. Intermediaries can create data, articulate demands for data, and help translate open data visions from political leaders into effective implementations. Traditional local intermediaries are an important source of information, in particular because they are trusted parties.
(6) Digital divides create data divides in both the supply and use of data. In some developing countries key data is not digitised, or a lack of technical staff has left data management patchy and inconsistent. Where Internet access is scarce, few citizens can have direct access to data or services built with it. Full access is needed for full empowerment, but offline intermediaries, including journalists and community radio stations, also play a vital role in bridging the gaps between data and citizens.
(7) Where information is already available and used, the shift to open data involves data evolution rather than data revolution. Many NGOs and intermediaries already access the information which is now becoming available as data. Capacity building should start from existing information and data practices in organisations, and should look for the step-by-step gains to be made from a data-driven approach.
(8) Officials’ fears about the integrity of data are a barrier to more machine-readable data being made available. The publication of data as PDF or in scanned copies is often down to a misunderstanding of how open data works. Only copies can be changed, and originals can be kept authoritative. Helping officials understand this may help increase the supply of data.
(9) Very few datasets are clearly openly licensed, and there is low understanding of what open licenses entail. There are mixed opinions on the importance of a focus on licensing in different contexts. Clear licenses are important to building a global commons of interoperable data, but may be less relevant to particular uses of data on the ground. In many countries wider conversations about licensing are yet to take place.
(10) Privacy issues are not on the radar of most developing country open data projects, although commercial confidentiality does arise as a reason preventing greater data transparency. Much state held data is collected either from citizens or from companies. Many countries in the ODDC study have weak or absent privacy laws and frameworks, yet participants in the studies raised few personal privacy considerations. By contrast, a lack of clarity, and officials’ concerns, about potential breaches of commercial confidentiality when sharing data gathered from firms was a barrier to opening data.
(11) There is more to open data than policies and portals. Whilst central open data portals act as a visible symbol of open data initiatives, a focus on portal building can distract attention from wider reforms. Open data elements can also be built on existing data sharing practices, and data made available through the locations where citizens, NGOs and businesses already go to access information.
(12) Open data advocacy should be aware of, and build upon, existing policy foundations in specific countries and sectors. Sectoral transparency policies for local government, budget and energy industry regulation, amongst others, could all have open data requirements and standards attached, drawing on existing mechanisms to secure sustainable supplies of relevant open data in developing countries. In addition, open data conversations could help make existing data collection and disclosure requirements fit better with the information and data demands of citizens.
(13) Open data is not just a central government issue: local government data, city data, and data from the judicial and legislative branches are all important. Many open data projects focus on the national level, and only on the executive branch. However, local government is closer to citizens, urban areas bring together many of the key ingredients for successful open data initiatives, and transparency in other branches of government is important to secure citizens’ democratic rights.
(14) Flexibility is needed in the application of definitions of open data to allow locally relevant and effective open data debates and advocacy to emerge. Open data is made up of various elements, including proactive publication, machine-readability and permissions to re-use. Countries at different stages of open data development may choose to focus on one or more of these, recognising that adopting all elements at once could hinder progress. It is important to find ways to both define open data clearly, and to avoid a reductive debate that does not recognise progressive steps towards greater openness.
(15) There are many different models for an open data initiative: including top-down, bottom-up and sector-specific. Initiatives may also be state-led, civil society-led and entrepreneur-led in their goals and how they are implemented – with consequences for the resources and models required to make them sustainable. There is no one-size-fits-all approach to open data. More experimentation, evaluation and shared learning on the components, partners and processes for putting open data ideas into practice must be a priority for all who want to see a world where open-by-default data drives real social, political and economic change.
Using the Wisdom of the Crowd to Democratize Markets

David Weidner at the Wall Street Journal: “For years investors have largely depended on three sources to distill the relentless onslaught of information about public companies: the companies themselves, Wall Street analysts and the media.
Each of these has their strengths, but they may have even bigger weaknesses. Companies spin. Analysts have conflicts of interest. The financial media is under deadline pressure and ill-equipped to act as a catch-all watchdog.
But in recent years, the tech whizzes out of Silicon Valley have been trying to democratize the markets. In 2010 I wrote about an effort called Moxy Vote, an online system for shareholders to cast ballots in proxy contests. Moxy Vote had some initial success but ran into regulatory trouble and failed to gain traction.
Some newer efforts are more promising, mostly because they depend on users, or some form of crowdsourcing, for their content. Crowdsourcing is when a need is turned over to a large group, usually an online community, rather than traditional paid employees or outside providers…. Estimize is one. It was founded in 2011 by former trader Leigh Drogan, but recently has undergone some significant expansion, adding a crowd-sourced prediction for mergers and acquisitions. Estimize also boasts a track record. It claims it beats Wall Street analysts 65.9% of the time during earnings season. Like SeekingAlpha, Estimize does, however, lean heavily on pros or semi-pros. Nearly 5,000 of its contributors are analysts.
Closer to the social networking world there’s a website and mobile app that aggregates what’s being said about individual stocks on social networks, blogs and other sources. It highlights trending stocks and links to chatter on social networks. (The site is owned by Cody Willard, a contributor to MarketWatch, which is owned by Dow Jones, the publisher of The Wall Street Journal.)
Perhaps the most intriguing startup is Two Margins. The site allows investors, analysts, average Joes (anyone, really) to annotate company releases. In that way, Two Margins potentially can tap the power of the crowd to provide a fourth source for the marketplace.
Two Margins, a startup funded by Bloomberg L.P.’s venture capital fund, borrows annotation technology that’s already in use on other sites. Participants can sign in with their Twitter or Facebook accounts and post to those networks from the site. (Dow Jones competes with Bloomberg in the provision of news and financial data.)
At this moment, Two Margins isn’t a game changer. Founders Gniewko Lubecki and Akash Kapur said the site is in a pre-beta phase, which is to say it’s sort of up and running and being constantly tweaked.
Right now there’s nothing close to the critical mass needed for an exhaustive look at company filings. There’s just a handful of users and less than a dozen company releases and filings available.
Still, in the first moments after Twitter Inc.’s earnings were released Tuesday, Two Margins’ most loyal users began to scour the release. “Looks like Twitter is getting significantly better at monetizing users,” wrote a user named “George” who had annotated the revenue line from the company’s financial statement. Another user, “Scott Paster,” noted Twitter’s stock option grants to executives were nearly as high as its reported loss.
“The sum is greater than its parts when you pull together a community of users,” Mr. Kapur said. “Widening access to these documents is one goal. The other goal is broadening the pool of knowledge that’s brought to bear on these documents.”
In the end, this new wave of tech-driven services may never capture enough users to make it into the investing mainstream. They all struggle with uninformed and inaccurate content especially if they gain critical mass. Vetting is a problem.
For that reason, it’s hard to predict whether these new entries will flourish or even survive. That’s not a bad thing. The march of technology will either improve on the idea or come up with a new one.
Ultimately, technology is making possible what hasn’t been. That is, free discussion, access and analysis of information. Some may see it as a threat to Wall Street, which has always charged for expert analysis. Really, though, these efforts are good for markets, which pride themselves on being fair and transparent.
It’s not just companies that should compete, but ideas too.”

Quantifying the Interoperability of Open Government Datasets

Paper by Pieter Colpaert, Mathias Van Compernolle, Laurens De Vocht, Anastasia Dimou, Miel Vander Sande, Peter Mechant, Ruben Verborgh, and Erik Mannens, to be published in Computer: “Open Governments use the Web as a global dataspace for datasets. It is in the interest of these governments to be interoperable with other governments worldwide, yet there is currently no way to identify relevant datasets to be interoperable with and there is no way to measure the interoperability itself. In this article we discuss the possibility of comparing identifiers used within various datasets as a way to measure semantic interoperability. We introduce three metrics to express the interoperability between two datasets: the identifier interoperability, the relevance and the number of conflicts. The metrics are calculated from a list of statements which indicate for each pair of identifiers in the system whether they identify the same concept or not. While a lot of effort is needed to collect these statements, the return is high: not only relevant datasets are identified, also machine-readable feedback is provided to the data maintainer.”
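
The abstract does not spell out how the three metrics are defined, so the following is only a minimal sketch under one assumed reading: each statement pairs an identifier from one dataset with an identifier from another and records whether they denote the same concept. Here relevance is taken to be the number of concept matches, identifier interoperability the share of those matches that already use the identical identifier, and conflicts the number of pairs that use the identical identifier for different concepts. The names Statement and interoperability_metrics, and the example identifiers, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Statement:
    """One human- or machine-made judgement about a pair of identifiers."""
    id_a: str           # identifier as it appears in dataset A
    id_b: str           # identifier as it appears in dataset B
    same_concept: bool  # True if both identify the same real-world concept


def interoperability_metrics(statements: Iterable[Statement]) -> Dict[str, float]:
    """Compute three metrics under the assumed definitions in the lead-in."""
    statements = list(statements)

    # Pairs judged to describe the same concept (assumed meaning of "relevance").
    matches = [s for s in statements if s.same_concept]
    relevance = len(matches)

    # Matches where both datasets already use the identical identifier.
    shared = sum(1 for s in matches if s.id_a == s.id_b)

    # Identical identifier used for different concepts (assumed "conflict").
    conflicts = sum(
        1 for s in statements if not s.same_concept and s.id_a == s.id_b
    )

    ratio = shared / relevance if relevance else 0.0
    return {
        "identifier_interoperability": ratio,
        "relevance": relevance,
        "conflicts": conflicts,
    }


if __name__ == "__main__":
    # Hypothetical statements about two city datasets.
    example = [
        Statement("http://example.org/id/gent", "http://example.org/id/gent", True),
        Statement("Ghent", "http://example.org/id/gent", True),
        Statement("http://example.org/id/1", "http://example.org/id/1", False),
        Statement("Antwerp", "Brussels", False),
    ]
    print(interoperability_metrics(example))
```

Under these assumptions a data maintainer could read a low ratio with high relevance as a sign that two datasets describe many of the same things but name them differently, which is the kind of machine-readable feedback the abstract mentions.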

The Responsive City: Engaging Communities Through Data-Smart Governance

New book by Stephen Goldsmith and Susan P. Crawford: “The Responsive City: Engaging Communities Through Data-Smart Governance. The Responsive City is a guide to civic engagement and governance in the digital age that will help leaders link important breakthroughs in technology and big data analytics with age-old lessons of small-group community input to create more agile, competitive, and economically resilient cities. Featuring vivid case-studies highlighting the work of individuals in New York, Boston, Rio de Janeiro, Stockholm, Indiana, and Chicago, the book provides a compelling model for the future of cities and states. The authors demonstrate how digital innovations will drive a virtuous cycle of responsiveness centered on “empowerment”: 1) empowering public employees with tools to both power their performance and to help them connect more personally to those they service, 2) empowering constituents to see and understand problems and opportunities faced by cities so that they can better engage in the life of their communities, and 3) empowering leaders to drive towards their missions and address the grand challenges confronting cities by harnessing the predictive power of cross-government Big Data. The book will help mayors, chief technology officers, city administrators, agency directors, civic groups and nonprofit leaders break out of current paradigms in order to collectively address civic problems. Co-authored by Stephen Goldsmith, former Mayor of Indianapolis and current Director of the Innovations in Government Program at the Harvard Kennedy School, and Susan Crawford, co-director of Harvard’s Berkman Center for Internet and Society.

The Responsive City highlights the ways in which leadership, empowered government employees, thoughtful citizens, and 21st century technology can combine to improve government operations and strengthen civic trust. It provides actionable advice while exploring topics like:

  • Visualizing service delivery and predicting improvement
  • Making the work of government employees more meaningful
  • Amplification and coordination of focused citizen engagement
  • Big Data in big cities – stories of surprising successes and enormous potential”


Policy bubbles: What factors drive their birth, maturity and death?

Moshe Maor at LSE Blog: “A policy bubble is a real or perceived policy overreaction that is reinforced by positive feedback over a relatively long period of time. This type of policy imposes objective and/or perceived social costs without producing offsetting objective and/or perceived benefits over a considerable length of time. A case in point is when government spending over a policy problem increases due to public demand for more policy while the severity of the problem decreases over an extended period of time. Another case is when governments raise ‘green’ or other standards due to public demand while the severity of the problem does not justify this move…
Drawing on insights from a variety of fields – including behavioural economics, psychology, sociology, political science and public policy – three phases of the life-cycle of a policy bubble may be identified: birth, maturity and death. A policy bubble may emerge when certain individuals perceive opportunities to gain from public policy or to exploit it by rallying support for the policy, promoting word-of-mouth enthusiasm and widespread endorsement of the policy, heightening expectations for further policy, and increasing demand for this policy….
How can one identify a policy bubble? A policy bubble may be identified by measuring parliamentary concerns, media concerns, public opinion regarding the policy at hand, and the extent of a policy problem, against the budget allocation to said policy over the same period, preferably over 50 years or more. Measuring the operation of different transmission mechanisms in emotional contagion and human herding, particularly the spread of social influence and feeling, can also work to identify a policy bubble.
Here, computer-aided content analysis of verbal and non-verbal communication in social networks, especially instant messaging, may capture emotional and social contagion. A further way to identify a policy bubble revolves around studying bubble expectations and individuals’ confidence over time by distributing a questionnaire to a random sample of the population, experts in the relevant policy sub-field, as well as decision makers, and comparing the results across time and nations.
To sum up, my interpretation of the process that leads to the emergence of policy bubbles allows for the possibility that different modes of policy overreaction lead to different types of human herding, thereby resulting in different types of policy bubbles. This interpretation has the added benefit of contributing to the explanation of economic, financial, technological and social bubbles as well.”

Using predictive analytics and rapid-cycle evaluation to improve program design and results

The Myth of Everybody

at Medium: “What is the difference between “with” and “for”? “With” implies togetherness, a network: a larger group, possibly, a messier group, but a group (meaning 2 people+) nonetheless. Acting “with” others implies certain degrees of collaboration, collective action, coordination, and even unity. You run a three-legged race with your partner (or you’re going to fall). When you use the word “with” it means that, however many people are involved, whatever their individual roles, they’re acting as one — or at least, towards a shared goal.

By contrast, when we use the word “for” we center on the experience of individuals in a relationship, with one unit acting on behalf of or doing something to another. (“For another.”) In the “for” universe, there’s usually a receiver and a giver. There can be many people involved or few, but there are almost always actors and those acted upon. In a democracy like ours, where we have government of, by, and for the people, we understand that when we vote for an elected representative, they are then empowered to speak and act for us. To govern for us….but with our consent.

Representative democracy in action.

At least, that’s the way it’s described in textbooks. In reality, however, governance is awash with intermediaries: companies, contractors, public/private partnerships, lobbyists, NGOs, think tanks — organizations of people, formal and informal, that support, distribute, and sometimes do the work of our government for our government and for us. This (very simplified overview of our) system of proxies isn’t necessarily good or bad; it’s just the way we’ve structured things to work in the US.

Why? Well, because we govern in a “for” system. Because there are so many of us and our lives are interconnected. Because we balance majority rule with minority rights. Because of all the reasons you learned in social studies class (if you went to a US public high school) and because this is the way most of us believe society has to work.

But there are other ways.

— Take your hand off the “COMMUNIST” alarm. I’m talking about the “civic” revolution.

In the last 6 or so years, as the buzz around “Gov2.0” waned, obsession with “civic”-ness waxed. What “civic” means exactly, well, we’re all still figuring that out. Sure, there are official definitions that relate “civic” to all things local…and overlapping understandings of “civics” that lend the air of government involvement…but with increasing interest from folks in the tech and innovation sectors (and funders), the word has taken on new shape. Today, “civic” is the center of a Venn Diagram encircling notions commonly associated with “society,” “community,” “governance,” and public commons (or goods). The sheen of social impact, social responsibility, and “community-ness” — that’s what terms of art like “civic innovation,” “civic engagement,” “civic decisions,” “civic participation”, and “civic tech” are all trying to describe.

To be clear, it’s not that this intersection of societal something hasn’t been outlined before: language like “social” (see “social innovation”) and civil (see “civil society”) has been used to describe similar concepts for decades. “Civic” is just the newest coat of paint, its popularity driven in part by interest from NGOs, start-ups, digital strategists, and governing bodies attempting to bring new flavor and energy to long-standing questions, like

How can we make democracy work? What can we do to make the systems in place work better? And what do we need to change to make systems work better for everybody?…”

The Quiet Revolution: Open Data Is Transforming Citizen-Government Interaction

Maury Blackman at Wired: “The public’s trust in government is at an all-time low. This is not breaking news.
But what if I told you that just this past May, President Obama signed into law a bill that passed Congress with unanimous support. A bill that could fundamentally transform the way citizens interact with their government. This legislation could also create an entirely new, trillion-dollar industry right here in the U.S. It could even save lives.
On May 9th, the Digital Accountability and Transparency Act of 2014 (DATA Act) became law. There were very few headlines, no Rose Garden press conference.
I imagine most of you have never heard of the DATA Act. The bill with the nerdy name has the potential to revolutionize government. It requires federal agencies to make their spending data available in standardized, publicly accessible formats. Supporters of the legislation included Tea Partiers and the most liberal Democrats. But the bill only scratches the surface of what’s possible.
So What’s the Big Deal?
On his first day in office, President Obama signed a memorandum calling for a more open and transparent government. The President wrote, “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” This was followed by the creation of Data.gov, a one-stop shop for all government data. The site does not just include financial data, but also a wealth of other information related to education, public safety, climate and much more, all available in open and machine-readable format. This has helped fuel an international movement.
Tech-minded citizens are building civic apps to bring government into the digital age; reporters can now connect the dots more easily, not to mention the billions of taxpayer dollars saved. And last year the President took us a step further. He signed an Executive Order making open government data the default option.
Cities and states have followed Washington’s lead with similar open data efforts on the local level. In San Francisco, the city’s Human Services Agency has partnered with Promptly, a text message notification service that alerts food stamp recipients (CalFresh) when they are at risk of being disenrolled from the program. This service is incredibly beneficial, because most do not realize any change in status until they are in the grocery store checkout line, trying to buy food for their family.
Other products and services created using open data do more than just provide an added convenience—they actually have the potential to save lives. The PulsePoint mobile app sends text messages to citizens trained in CPR when someone in walking distance is experiencing a medical emergency that may require CPR. The app is currently available in almost 600 cities in 18 states, which is great. But shouldn’t a product this valuable be available to every city and state in the country?…”

This Exercise App Tracks Trends on How We Move In Different Cities

Mark Byrnes at CityLab: “An app designed to encourage exercise can also tell us a lot about the way different cities get from point A to B.
The app, called Human, runs in the background of your iPhone, automatically detecting activities like walking, cycling, running, and motorized transport. The point is to encourage you to exercise for at least 30 minutes a day.
Almost a year since Human launched (last August), its developers have released a stunning visualization of all that movement: 7.5 million miles traveled by their app users so far.
On their site, you can look into the mobility data inside 30 different cities. Once you click on one, you’ll be greeted with a pie chart that shows the distribution of activity within that city lined up against a pie chart that shows the international average.
In the case of Amsterdam, its transportation clichés are verified. App users in the bike-loving city use two wheels way more than they use four. And they walk about as much as anywhere else.

Human then shows the paths traveled by their users. When it comes to Amsterdam, the results look almost exactly like the city’s entire street grid, no matter what physical activity is being shown.

Powerful new patent service shows every US invention, and a new view of R&D relationships

at GigaOm: “The website for the U.S. Patent Office is famously clunky: searching and sorting patents can feel like playing an old Atari game, rather than watching innovation at work. But now a young inventor has come along with a tool to build a better patent office.
The service is called Trea, and was launched by Max Yuan, an engineer who received a patent of his own for a bike motor in 2007. After writing a tool to download patents related to his own invention, he expanded the process to slurp every patent and image in the USPTO database, and compile the information in a user-friendly interface.
Trea has been in beta for a while, but will formally launch on Wednesday. The tool not only provides an easy way to see what inventions a company or inventor is patenting, but also shows the fields in which they are most active. Here is a screenshot from Trea that shows what Apple has been up to in the last 12 months:
[Image: Trea screenshot of Apple inventions]
Such information could be valuable to investors or to companies that want to use the filings as a way to track what might be in their competitors’ product pipelines. The Trea database also probes the USPTO for new filings, and can send alerts to subscribers. Yuan has also created a Twitter account just for new Apple filings.
Trea also draws on the patent database to display what Yuan calls a “unified knowledge graph” of relationships between inventors. Pictures, like the one below for IBM, show clusters of inventors and, at a broader level, the viral transmission of human ideas within a company:
[Image: Trea IBM screenshot]
This type of information, gleaned from patent filings, could be valuable to corporate strategists, or to journalists, scholars or business historians. And making government websites more user-friendly, as is being attempted with Securities and Exchange Commission filings, can certainly help people understand what their regulators are doing….”