Defining Open Data


Open Knowledge Foundation Blog: “Open data is data that can be freely used, shared and built on by anyone, anywhere, for any purpose. This is the summary of the full Open Definition, which the Open Knowledge Foundation created in 2005 to provide both a succinct explanation and a detailed definition of open data.
As the open data movement grows, and ever more governments and organisations sign up to open data, it becomes ever more important that there is a clear and agreed definition of what “open data” means if we are to realise the full benefits of openness, and avoid the risks of creating incompatibility between projects and splintering the community.

Open can apply to information from any source and about any topic. Anyone can release their data under an open licence for free use by and benefit to the public. Although we may think mostly about government and public sector bodies releasing public information such as budgets or maps, or researchers sharing their results data and publications, any organisation can open information (corporations, universities, NGOs, startups, charities, community groups and individuals).

Read more about different kinds of data in our one-page introduction to open data.
There is open information in transport, science, products, education, sustainability, maps, legislation, libraries, economics, culture, development, business, design, finance …. So the explanation of what open means applies to all of these information sources and types. Open may also apply both to data – big data and small data – and to content, like images, text and music!
So here we set out clearly what open means, and why this agreed definition is vital for us to collaborate, share and scale as open data and open content grow and reach new communities.

What is Open?

The full Open Definition provides a precise definition of what open data is. There are two important elements to openness:

  • Legal openness: you must be allowed to get the data legally, to build on it, and to share it. Legal openness is usually provided by applying an appropriate (open) license which allows for free access to and reuse of the data, or by placing data into the public domain.
  • Technical openness: there should be no technical barriers to using that data. For example, providing data as printouts on paper (or as tables in PDF documents) makes the information extremely difficult to work with. So the Open Definition has various requirements for “technical openness,” such as requiring that data be machine readable and available in bulk.”…
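To make “machine readable” concrete, here is a minimal sketch (in Python, over an invented budget table) of what an open, bulk-downloadable CSV release allows that a paper printout or PDF table does not: any program can parse and compute over the rows directly.

```python
import csv
import io

# A machine-readable release: plain CSV that any program can parse directly.
# The budget figures below are invented, for illustration only.
budget_csv = io.StringIO(
    "department,year,amount\n"
    "Transport,2013,1200000\n"
    "Education,2013,3400000\n"
)

total = 0
for row in csv.DictReader(budget_csv):
    total += int(row["amount"])
    print(row["department"], row["amount"])
print("total:", total)

# The same table published as a scanned PDF would need OCR and manual
# cleanup before any of this could run - that is the technical barrier
# the Open Definition rules out.
```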

Imagining Data Without Division


Thomas Lin in Quanta Magazine: “As science dives into an ocean of data, the demands of large-scale interdisciplinary collaborations are growing increasingly acute…Seven years ago, when David Schimel was asked to design an ambitious data project called the National Ecological Observatory Network, it was little more than a National Science Foundation grant. There was no formal organization, no employees, no detailed science plan. Emboldened by advances in remote sensing, data storage and computing power, NEON sought answers to the biggest question in ecology: How do global climate change, land use and biodiversity influence natural and managed ecosystems and the biosphere as a whole?…
For projects like NEON, interpreting the data is a complicated business. Early on, the team realized that its data, while mid-size compared with the largest physics and biology projects, would be big in complexity. “NEON’s contribution to big data is not in its volume,” said Steve Berukoff, the project’s assistant director for data products. “It’s in the heterogeneity and spatial and temporal distribution of data.”
Unlike the roughly 20 critical measurements in climate science or the vast but relatively structured data in particle physics, NEON will have more than 500 quantities to keep track of, from temperature, soil and water measurements to insect, bird, mammal and microbial samples to remote sensing and aerial imaging. Much of the data is highly unstructured and difficult to parse — for example, taxonomic names and behavioral observations, which are sometimes subject to debate and revision.
And, as daunting as the looming data crush appears from a technical perspective, some of the greatest challenges are wholly nontechnical. Many researchers say the big science projects and analytical tools of the future can succeed only with the right mix of science, statistics, computer science, pure mathematics and deft leadership. In the big data age of distributed computing — in which enormously complex tasks are divided across a network of computers — the question remains: How should distributed science be conducted across a network of researchers?
Part of the adjustment involves embracing “open science” practices, including open-source platforms and data analysis tools, data sharing and open access to scientific publications, said Chris Mattmann, 32, who helped develop a precursor to Hadoop, a popular open-source data analysis framework that is used by tech giants like Yahoo, Amazon and Apple and that NEON is exploring. Without developing shared tools to analyze big, messy data sets, Mattmann said, each new project or lab will squander precious time and resources reinventing the same tools. Likewise, sharing data and published results will obviate redundant research.
To this end, international representatives from the newly formed Research Data Alliance met this month in Washington to map out their plans for a global open data infrastructure.”
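As a rough illustration of the shared-tooling pattern Mattmann describes, here is a word count written in the style Hadoop Streaming popularized: a mapper emits key-value pairs, the framework sorts them by key, and a reducer aggregates each group. This toy version simulates all three stages in plain Python; it is a sketch of the pattern, not NEON's or Mattmann's actual pipeline.

```python
from itertools import groupby

def mapper(lines):
    # Map stage: emit a (word, 1) pair for every word, as a Hadoop
    # Streaming mapper would write to stdout.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # Hadoop sorts mapper output by key before the reduce stage;
    # groupby over the sorted pairs then sums each word's counts.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    sample = ["open data open science", "messy heterogeneous data"]
    for word, total in reducer(mapper(sample)):
        print(word, total)  # data 2, heterogeneous 1, messy 1, open 2, science 1
```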

The transition towards transparency


Roland Harwood at the Open Data Institute Blog: “It’s a very exciting time for the field of open data, especially in the UK public sector, which is arguably leading the world in this emerging discipline right now, in no small part thanks to the efforts of the Open Data Institute. There is a strong push to release public data and to explore new innovations that can be created as a result.
For instance, the Ordnance Survey have been leading the way with opening up half of their data for others to use, complemented by their GeoVation programme which provides support and incentive for external innovators to develop new products and services.
More recently the Technology Strategy Board have been working with the likes of NERC, Met Office, Environment Agency and other public agencies to help solve business problems using environmental data.
It goes without saying that data won’t leap up and create any value by itself, any more than a pile of discarded parts outside a factory will assemble itself into a car. We’ve found that the secret of successful open data innovation is to work with people who are trying to solve a specific problem. Simply releasing the data is not enough. Below is a summary of our Do’s and Don’ts of opening up data.
Do…

  • Make sure data quality is high (ODI Certificates can help!)
  • Promote innovation using data sets. Transparency is only a means to an end
  • Enhance communication with external innovators
  • Make sure your co-creators are incentivised
  • Get organised, create a community around an issue
  • Pass on learnings to other similar organisations
  • Experiment – open data requires new mindsets and business models
  • Create safe spaces – Innovation Airlocks – to share and prototype with trusted partners
  • Be brave – people may do things with the data that you don’t like
  • Set out to create commercial or social value with data

Don’t…

  • Just release data and expect people to understand or create with it. Publication is not the same as communication
  • Wait for data requests – put the data out informally first
  • Avoid challenges to current income streams
  • Go straight for the finished article – use rapid prototyping
  • Be put off by the tensions between confidentiality, data protection and publishing
  • Wait for the big budget or formal process – start big things with small amounts now
  • Be technology-led – be business-led instead
  • Expect the community to entirely self-manage
  • Restrict open data to the IT-literate – create interdisciplinary partnerships
  • Get caught in the false dichotomy of commercial vs. social

In summary, we believe we need to assume openness as the default (for organisations, that is, not individuals) and secrecy as the exception – the exact opposite of how most commercial organisations currently operate. …”

Making All Voices Count


Launch of Making All Voices Count: “Making All Voices Count is a global initiative that supports innovation, scaling-up, and research to deepen existing innovations and help harness new technologies to enable citizen engagement and government responsiveness….Solvable problems need not remain unsolved. Democratic systems in the 21st century continue to be inhibited by 19th century timescales, with only occasional opportunities for citizens to express their views formally, such as during elections. In this century, many citizens have access to numerous tools that enable them to express their views – and measure government performance – in real time.
For example, online reporting platforms enable citizens to monitor the election process by reporting intimidation, vote buying, bias and misinformation; access to mobile technology allows citizens to update water suppliers on gaps in service delivery; and crisis information can be crowdsourced via eyewitness reports of violence sent by email and SMS.
The rise of mobile communication, the installation of broadband and the fast-growing availability of open data offer tremendous opportunities for data journalism and new media channels. They can inspire governments to develop new ways to fight corruption and respond to citizens efficiently, effectively and fairly. In short, developments in technology and innovation mean that government and citizens can interact like never before.
Making All Voices Count is about seizing this moment to strengthen our commitments to promote transparency, fight corruption, empower citizens, and harness the power of new technologies to make government more effective and accountable.
The programme specifically aims to address the following barriers that weaken the link between governments and citizens:

  • Citizens lack incentives: Citizens may not have the necessary incentives to express their feedback on government performance – due to a sense of powerlessness, distrust in the government, fear of retribution, or lack of reliable information
  • Governments lack incentives: At the same time, governments need incentives to respond to citizen input whenever possible and to leverage citizen participation. The government’s response to citizens should be reinforced by proactive, public communication.  This initiative will help create incentives for government to respond.  Where government responds effectively, citizens’ confidence in government performance and approval ratings are likely to increase
  • Governments lack the ability to translate citizen feedback into action: This could be due to anything from political constraints to a lack of skills and systems. Governments need better tools to effectively analyze and translate citizen input into information that will lead to solutions and shape resource allocation. Once captured, citizens’ feedback (on their experiences with government performance) must be communicated so as to engage both the government and the broader public in finding a solution.
  • Citizens lack meaningful opportunities: Citizens need greater access to better tools and know-how to easily engage with government in a way that results in government action and citizen empowerment”

Explore the world’s constitutions with a new online tool


Official Google Blog: “Constitutions are as unique as the people they govern, and have been around in one form or another for millennia. But did you know that every year approximately five new constitutions are written, and 20-30 are amended or revised? Or that Africa has the youngest set of constitutions, with 19 of the 39 constitutions written globally since 2000 coming from the region?
The process of redesigning and drafting a new constitution can play a critical role in uniting a country, especially following periods of conflict and instability. In the past, it’s been difficult to access and compare existing constitutional documents and language—which is critical to drafters—because the texts are locked up in libraries or on the hard drives of constitutional experts. Although the process of drafting constitutions has evolved from chisels and stone tablets to pens and modern computers, there has been little innovation in how their content is sourced and referenced.
With this in mind, Google Ideas supported the Comparative Constitutions Project to build Constitute, a new site that digitizes and makes searchable the world’s constitutions. Constitute enables people to browse and search constitutions via curated and tagged topics, as well as by country and year. The Comparative Constitutions Project cataloged and tagged nearly 350 themes, so people can easily find and compare specific constitutional material. This ranges from the fairly general, such as “Citizenship” and “Foreign Policy,” to the very specific, such as “Suffrage and turnouts” and “Judicial Autonomy and Power.”
Our aim is to arm drafters with a better tool for constitution design and writing. We also hope citizens will use Constitute to learn more about their own constitutions, and those of countries around the world.”
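As a sketch of how tagged-topic browsing like Constitute's can work under the hood, the snippet below builds a toy inverted index from documents to their curated tags and queries it. The tag names follow the examples quoted above; the document entries are illustrative, not drawn from Constitute's actual index.

```python
from collections import defaultdict

# Toy inverted index: map each curated tag to the constitutions carrying it.
# Entries are illustrative; Constitute's real index spans nearly 350 themes.
tagged = {
    "Kenya 2010": ["Citizenship", "Judicial Autonomy and Power"],
    "South Africa 1996": ["Citizenship", "Suffrage and turnouts"],
}

index = defaultdict(list)
for doc, tags in tagged.items():
    for tag in tags:
        index[tag].append(doc)

print(index["Citizenship"])
# ['Kenya 2010', 'South Africa 1996'] - every document tagged with the theme
```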

Open Data 500 gives voice to companies using government data


Fedscoop: “Federal agencies have been working toward a Nov. 1 deadline to unlock their data, as mandated by an executive order issued in May. But what has yet to be examined is how useful those data sets have been to companies and the economic value they have created.
Enter the Open Data 500 – a project that gives companies the opportunity to provide feedback to government about which data sets are most useful and where demand for data exists.
The initiative is part of a broader effort by New York University’s Governance Lab to research how government can work more effectively with its constituents, said Joel Gurin, GovLab’s senior adviser and director of Open Data 500.
“We hope this will be a research project that illuminates the way government open data sets are being used by the private sector and help people gauge the economic impact and also help to make open data more effective, more useful,” he said.

Companies participating in Open Data 500 submit their responses via a survey to give insight into which data has been easiest to use and which type of data they would like to see made available. The survey also ranks agencies’ data sets on how useful they are.
The project won’t score companies based on their use of federal data; instead, it gives them a chance to interact with government and express which data they want.”

Open Data’s Road to Better Transit


Stephen Goldsmith in GovTech: “Data is everywhere. It now costs less to capture, store and process data than ever before, thanks to better technology and economies of scale. And more than ever, the public expects government to use data to improve its services. Increasingly, government’s problem is not capturing the data, but having sufficient resources to clean and analyze the information in order to address issues, improve performance and make informed decisions.
In particular, public transit not only produces an immense volume of data, but it also stands to benefit from good analysis in the form of streamlined operations and a better rider experience. More than 200 transit agencies worldwide — from Buffalo to Budapest — are well on their way. They are publishing their schedules, fares and station locations to Google’s TransitDataFeed in a common format and for free. Such information is called open data, which is any data that’s publicly shared.
Open data allows anyone to download and use the information for his or her purposes, particularly software developers who can use it to create mobile and Web-based applications. Google, for example, incorporates the information into its Maps application to help riders plan trips and learn about service updates across bus, rail and bike systems. Other third parties have built successful apps on top of open transit data.
Innovations like these allow transit agencies to leverage external expertise and resources, and have also reduced customer service costs and increased ridership levels. In fact, some members of the American Public Transportation Association believe that open data initiatives have catalyzed more innovation throughout the industry than any other factor in the last three decades….
In Philadelphia, the City Planning Commission is using text message surveying to capture the opinions of transit riders across the demographic spectrum to determine the usefulness of a proposed rapid transit line into downtown. Philadelphia uses the transit information to inform its comprehensive city plan, but this digital citizen survey mechanism, created by a company called Textizen, is a platform that can be used by any government that wants to solicit feedback or begin a dialog with its citizens.
In 2012, Dubuque, Iowa, collaborated with IBM to run a Smarter Travel pilot study. The pilot used a mobile app and RFIDs to collect anonymous travel data from volunteer transit riders. The city has already used the data to open a new late-night bus line for third-shift workers and college students, and by next year will incorporate data into more route planning decisions.”
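The common format the transit agencies publish in is the General Transit Feed Specification (GTFS): a zip archive of plain CSV files such as stops.txt, routes.txt and stop_times.txt. Because the files are ordinary CSV, a third-party developer can start from a few lines of parsing. The sketch below reads an invented stops.txt and finds the stop nearest a rider, the first step of a trip-planner app.

```python
import csv
import io

# stops.txt from a GTFS feed: one CSV row per transit stop.
# stop_id, stop_name, stop_lat and stop_lon are standard GTFS fields;
# the sample rows are invented.
stops_txt = io.StringIO(
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Main St & 3rd Ave,42.8864,-78.8784\n"
    "S2,City Hall,42.8867,-78.8786\n"
)
stops = list(csv.DictReader(stops_txt))

# First step of a trip planner: find the stop nearest the rider
# (squared-degree distance is fine for comparing nearby points).
rider_lat, rider_lon = 42.8865, -78.8785
nearest = min(
    stops,
    key=lambda s: (float(s["stop_lat"]) - rider_lat) ** 2
    + (float(s["stop_lon"]) - rider_lon) ** 2,
)
print("nearest stop:", nearest["stop_name"])
```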

Introducing Socrata’s Open Data Magazine: Open Innovation


“Socrata is dedicated to telling the story of open data as it evolves, which is why we have launched a quarterly magazine, “Open Innovation.”
As innovators push the open data movement forward, they are transforming government and public engagement at every level. With thousands of innovators all over the world – each with their own successes, advice, and ideas – there is a tremendous amount of story for us to tell.
The new magazine features articles, advice, infographics, and more dedicated exclusively to the open data movement. The first issue, Fall 2013, will cover topics such as:

  • What is a Chief Data Officer?
  • Who should be on your open data team?
  • How do you publish your first open data set?

It will also include four Socrata case studies and opinion pieces from some of the industry’s leading innovators…
The magazine is currently free to download or read online through the Socrata website. It is optimized for viewing on tablets and smartphones, with plans in the works to make the magazine available through the Kindle Fire and iTunes magazine stores.
Check out the first issue of Open Innovation at www.socrata.com/magazine.”

GovLab Seeks Open Data Success Stories


Wyatt Kash in InformationWeek: “A team of open government advocates, led by former White House aide Beth Noveck, has launched a campaign to identify 500 examples of how freely available government data is being put to profitable use in the private sector. Open Data 500 is part of a broader effort by New York University’s Governance Lab (GovLab) to conduct the “first real, comprehensive study of the use of open government data in the private sector,” said Joel Gurin, founder of OpenDataNow.com and senior adviser at GovLab.
Noveck, who served in the White House as the first U.S. deputy CTO and led the White House Open Government Initiative from 2009 to 2011, founded GovLab while also teaching at the MIT Media Lab and NYU’s Robert F. Wagner Graduate School of Public Service.
In an interview with InformationWeek Government, Gurin explained that the goal of GovLab, and the Open Data 500 project, is to show how technology and new uses of data can make government more effective, and create more of a partnership between government and the public. “We’re also trying to draw on more public expertise to solve government problems,” he said….
Gurin said Open Data 500 will primarily look at U.S.-based, revenue-producing companies or organizations where government data is a key resource for their business. While the GovLab will focus initially on the use of federal data, it will also look at cases where entrepreneurs are making use of state or local data, but in scalable fashion.
“This goes one step further than the datapaloozas” championed by U.S. CTO Todd Park to showcase tools developed by the private sector using government data. “We’re trying to show how we can make data sets even more impactful and useful.”
Gurin said the GovLab team hopes to complete the study by the end of this year. The team has already identified 150 companies as candidates. To submit your company for consideration, visit thegovlab.org/submit-your-company; to submit another company, visit thegovlab.org/open500.”

Here’s how the Recovery Act became a test case for open data


Andrea Peterson in the Washington Post: “Making sure that government money is spent efficiently and without fraud can be difficult. You need to collect the right data, get the information to the right people, and deal with the sheer volume of projects that need tracking. Open data makes it easier to draw comparisons across programs and agencies. And when data are released to the public, everyone can help be a government watchdog.
When President Obama was first elected in 2008, he promised transparency. Almost immediately after he was sworn into office, he had an opportunity to test that promise with the implementation of the Recovery Act. And it worked….
Recovery.gov used geospatial technology to “allow Americans to drill down to their zip codes and see exactly where government money was being spent in their neighborhood.” It’s this micro-level of attention that increased accountability, according to Earl Devaney, who chaired the Recovery Accountability and Transparency Board.
“The degree of transparency forced them to get it right because they didn’t want to be embarrassed by their neighbors who they knew were going to these Web sites and could see what they were doing with the money.”
As to the second question of what data to collect: “I finally put my foot down and said no more than 100 pieces of data,” Devaney recalls. “So naturally, we came up to 99.” Of course, even with that limit, transparency and fraud prevention was a daunting task, with some 300,000 grantees to keep tabs on.
But having those data points in an open format was what allowed investigators to use “sophisticated cyber-technology and software to review and analyze Recovery-related data and information for any possible concerns or issues.” And they were remarkably successful on that end. A status report in October 2010 showed “less than 0.2 percent of all reported awards currently have active fraud investigations.” Indeed, Devaney says that throughout his tenure leading the board, the level of fraud hovered somewhere below half of one percent of all awards.”
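As an illustration of the kind of automated screen that open-format award data makes possible, here is a minimal sketch that flags recipients appearing under multiple award IDs with identical amounts, a simple duplicate-payment check. The fields and rows are invented; this is not the board's actual analytics.

```python
import csv
import io
from collections import Counter

# Invented Recovery-style award records, published in an open CSV format.
awards_csv = io.StringIO(
    "award_id,recipient,amount\n"
    "A1,Acme Paving,250000\n"
    "A2,Acme Paving,250000\n"
    "A3,Beta Builders,180000\n"
)
rows = list(csv.DictReader(awards_csv))

# Count identical (recipient, amount) pairs across distinct award IDs;
# repeats are candidates for a fraud or duplicate-payment review.
pairs = Counter((r["recipient"], r["amount"]) for r in rows)
for (recipient, amount), n in pairs.items():
    if n > 1:
        print(f"flag for review: {recipient}, {n} awards of ${amount}")
```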