9 models to scale open data – past, present and future


Open Knowledge Foundation Blog: “The possibilities of open data have been enthralling us for 10 years… But that excitement isn’t what matters in the end. What matters is scale – which organisational structures will make this movement explode? This post quickly and provocatively goes through some that haven’t worked (yet!) and some that have.
Ones that are working now
1) Form a community to enter new data. OpenStreetMap and MusicBrainz are two big examples. It works because the community is the originator of the data. That said, neither has dominated its industry as much as I thought they would have by now.
2) Sell tools to an upstream generator of open data. This is what CKAN does for central governments (and the new ScraperWiki CKAN tool helps with). It’s what mySociety does when selling FixMyStreet installs to local councils, thereby publishing their potholes as RSS feeds.
3) Use open data (quietly). Every organisation does this and never talks about it. It’s key to quite old data resellers like Bloomberg. It is what most of ScraperWiki’s professional services customers ask us to do. The value to society is enormous and invisible. The big flaw is that it doesn’t help scale supply of open data.
4) Sell tools to downstream users. This isn’t necessarily open data specific – existing software like spreadsheets and Business Intelligence can be used with open or closed data. Lots of open data is on the web, so tools like the new ScraperWiki which work well with web data are particularly suited to it.
Ones that haven’t worked
5) Collaborative curation. ScraperWiki started as an audacious attempt to create an open data curation community, based on editing scraping code in a wiki. In its original form (now called ScraperWiki Classic) this didn’t scale. …With a few exceptions, notably OpenCorporates, there aren’t yet many open data curation projects.
6) General purpose data marketplaces, particularly ones that are mainly reusing open data, haven’t taken off. They might do one day, but I think they need well-adopted higher level standards for data formatting and syncing first (perhaps something like dat, perhaps something based on CSV files).
Ones I expect more of in the future
These are quite exciting models which I expect to see a lot more of.
7) Give labour/money to upstream to help them create better data. This is quite new. The only, and most excellent, example of it is the UK’s National Archives curating the Statute Law Database. They do the work with the help of staff seconded from commercial legal publishers and other parts of Government.
It’s clever because it generates money for upstream, which people trust the most, and which has the most ability to improve data quality.
8) Viral open data licensing. MySQL made lots of money this way, offering proprietary dual licenses of GPL’d software to embedded systems makers. In data this could use OKFN’s Open Database License, and organisations would pay when they wanted to mix the open data with their own closed data. I don’t know anyone actively using it, although Chris Taggart from OpenCorporates mentioned this model to me years ago.
9) Corporations release data for strategic advantage. Companies are starting to release their own data for strategic gain. This is very new. Expect more of it.”

Digital Public Spaces


FutureEverything Publications: “This publication gathers a range of short explorations of the idea of the Digital Public Space. The central vision of the Digital Public Space is to give everyone everywhere unrestricted access to an open resource of culture and knowledge. This vision has emerged from ideas about building platforms for engagement around cultural archives to become something wider, which this publication is seeking to hone and explore.
This is the first publication to look at the emergence of the Digital Public Space. Contributors include some of the people who are working to make the Digital Public Space happen.
The Digital Public Spaces publication has been developed by FutureEverything working with Bill Thompson of the BBC and in association with The Creative Exchange.”

Understanding Smart Data Disclosure Policy Success: The Case of Green Button


New Paper by Djoko Sigit Sayogo and Theresa Pardo: “Open data policies are expected to promote innovations that stimulate social, political and economic change. In pursuit of innovation potential, open data has expanded to a wider environment involving government, business and citizens. The US government recently launched such a collaboration through a smart data policy supporting energy efficiency called Green Button. This paper explores the implementation of Green Button and identifies motivations and success factors facilitating successful collaboration between public and private organizations to support smart disclosure policy. Analyzing qualitative data from semi-structured interviews with experts involved in Green Button initiation and implementation, this paper presents some key findings. The success of Green Button can be attributed to the interaction between internal and external factors. The external factors consist of both market and non-market drivers: economic factors, technology related factors, regulatory contexts and policy incentives, and some factors that stimulate imitative behavior among the adopters. The external factors create the necessary institutional environment for the Green Button implementation. On the other hand, the acceptance and adoption of Green Button itself is influenced by the fit of Green Button capability to the strategic mission of energy and utility companies in providing energy efficiency programs. We also identify the different roles of government during the different stages of Green Button implementation.”
[Recipient of Best Management/Policy Paper Award, dgo2013]

Next.Data.gov


Nick Sinai at the White House Blog: “Today, we’re excited to share a sneak preview of a new design for Data.gov, called Next.Data.gov. The upgrade builds on the President’s May 2013 Open Data Executive Order that aims to fuse open-data practices into the Federal Government’s DNA. Next.Data.gov is far from complete (think of it as a very early beta), but we couldn’t wait to share our design approach and the technical details behind it – knowing that we need your help to make it even better. Here are some key features of the new design:
 


Leading with Data: The Data.gov team at the General Services Administration (GSA), a handful of Presidential Innovation Fellows, and OSTP staff designed Next.Data.gov to put data first. The team studied the usage patterns on Data.gov and found that visitors were hungry for examples of how data are used. The team also noticed many sources outside of Data.gov, such as tweets and articles, featuring Federal datasets in action. So Next.Data.gov includes a rich stream that enables each data community to communicate how its datasets are impacting companies and the public.


In this dynamic stream, you’ll find blog posts, tweets, quotes, and other features that more fully showcase the wide range of information assets that exist within the vaults of government.
Powerful Search: The backend of Next.Data.gov is CKAN and is powered by Solr—a powerful search engine that will make it even easier to find relevant datasets online. Suggested search terms have been added to help users find (and type) things faster. Next.Data.gov will start to index datasets from agencies that publish their catalogs publicly, in line with the President’s Open Data Executive Order. The early preview launching today features datasets from the Department of Health and Human Services—one of the first Federal agencies to publish a machine-readable version of its data catalog.
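
The post doesn’t show the search plumbing, but CKAN’s standard Action API exposes this Solr-backed search over HTTP, so a catalog like the one described can be queried directly. A minimal sketch in Python, assuming the public catalog.data.gov endpoint and the stock package_search action (both assumptions on my part, not details given in the post); the query string is just an example:

    import json
    import urllib.parse
    import urllib.request

    # Assumed CKAN instance; any CKAN site exposes the same Action API.
    CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"

    def search_datasets(query, rows=5):
        """Run a Solr-backed keyword search against a CKAN catalog."""
        url = CKAN_SEARCH + "?" + urllib.parse.urlencode({"q": query, "rows": rows})
        with urllib.request.urlopen(url) as resp:
            result = json.load(resp)["result"]
        # Each hit is a dataset record; the organization field may be absent.
        return [(d["title"], (d.get("organization") or {}).get("title", ""))
                for d in result["results"]]

    if __name__ == "__main__":
        for title, org in search_datasets("health"):
            print(org, "-", title)
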
Rotating Data Visualizations: Building on the theme of leading with data, even the masthead design for Next.Data.gov is an open-data-powered visualization—for now, it’s a cool U.S. Geological Survey earthquake plot showing the magnitudes of earthquakes recorded around the globe over the past week.


This particular visualization was built using D3.js. The visualization will be updated periodically to spotlight different ways open data is used and illustrated….
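
The post doesn’t publish the data-fetching code, but the masthead draws on a public USGS feed. A minimal Python sketch of pulling the same past-week earthquake magnitudes, assuming USGS’s public GeoJSON summary feed (the URL is taken from USGS’s published feed scheme, not from the post):

    import json
    import urllib.request

    # Rolling GeoJSON summary of earthquakes recorded over the past week.
    FEED = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.geojson"

    def weekly_quakes():
        """Return (magnitude, place) pairs for the past week of recorded quakes."""
        with urllib.request.urlopen(FEED) as resp:
            features = json.load(resp)["features"]
        return [(f["properties"]["mag"], f["properties"]["place"]) for f in features]

    if __name__ == "__main__":
        quakes = weekly_quakes()
        print(len(quakes), "earthquakes in the past week")
        # Magnitude can occasionally be missing; treat it as 0 for sorting.
        for mag, place in sorted(quakes, key=lambda q: q[0] or 0, reverse=True)[:5]:
            print("M", mag, "-", place)
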
We encourage you to collaborate in the design process by creating pull requests or providing feedback via Quora or Twitter.”

Open Data Tools: Turning Data into ‘Actionable Intelligence’


Shannon Bohle in SciLogs: “My previous two articles were on open access and open data. They conveyed major changes that are underway around the globe in the methods by which scientific and medical research findings and data sets are circulated among researchers and disseminated to the public. I showed how E-science and ‘big data’ fit into the philosophy of science through a paradigm shift as a trilogy of approaches: deductive, empirical, and computational, which, it was pointed out, provides a logical extension of Robert Boyle’s tradition of scientific inquiry involving “skepticism, transparency, and reproducibility for independent verification” to the computational age…
This third article on open access and open data evaluates new and suggested tools when it comes to making the most of the open access and open data OSTP mandates. According to an article published in The Harvard Business Review’s “HBR Blog Network,” this is because, as its title suggests, “open data has little value if people can’t use it.” Indeed, “the goal is for this data to become actionable intelligence: a launchpad for investigation, analysis, triangulation, and improved decision making at all levels.” Librarians and archivists have key roles to play in not only storing data, but packaging it for proper accessibility and use, including adding descriptive metadata and linking to existing tools or designing new ones for their users. Later, in a comment following the article, the author, Craig Hammer, remarks on the importance of archivists and international standards, “Certified archivists have always been important, but their skillset is crucially in demand now, as more and more data are becoming available. Accessibility—in the knowledge management sense—must be on par with digestibility / ‘data literacy’ as priorities for continuing open data ecosystem development. The good news is that several governments and multilaterals (in consultation with data scientists and – yep! – certified archivists) are having continuing ‘shared metadata’ conversations, toward the possible development of harmonized data standards… If these folks get this right, there’s a real shot of (eventual proliferation of) interoperability (i.e. a data platform from Country A can ‘talk to’ a data platform from Country B), which is the only way any of this will make sense at the macro level.”

Power of open data reveals global corporate networks


Open Data Institute: “The ODI today welcomed the move by OpenCorporates to release open data visualisations which show the global corporate networks of millions of businesses and the power of open data.
See the Maps
OpenCorporates, a company based at the ODI, has produced visuals using several sources, which it has published as open data for the first time:

  • Filings made by large domestic and foreign companies to the U.S. Securities and Exchange Commission
  • Banking data held by the National Information Center of the Federal Reserve System in the U.S.
  • Information about individual shareholders published by the official New Zealand corporate registry

Launched today, the visualisations are available through the main OpenCorporates website.”

Open Government is an Open Conversation


Lisa Ellman and Hollie Russon Gilman at the White House Blog: “President Obama launched the first U.S. Open Government National Action Plan in September 2011, as part of the Nation’s commitment to the principles of the global Open Government Partnership. The Plan laid out twenty-six concrete steps the United States would take to promote public participation in government, increase transparency in government, and manage public resources more effectively.
A year and a half later, we have fulfilled twenty-four of the Plan’s prescribed commitments—including launching the online We the People petition platform, which has been used by more than 9.6 million people, and unleashing thousands of government data resources as part of the Administration’s Open Data Initiatives.
We are proud of this progress, but recognize that there is always more work to be done to build a more efficient, effective, and transparent government. In that spirit, as part of our ongoing commitment to the international Open Government Partnership, the Obama Administration has committed to develop a second National Action Plan on Open Government.
To accomplish this task effectively, we’ll need all-hands-on-deck. That’s why we plan to solicit and incorporate your input as we develop the National Action Plan “2.0.”…
Over the next few months, we will continue to gather your thoughts. We will leverage online platforms such as Quora, Google+, and Twitter to communicate with the public and collect feedback. We will meet with members of open government civil society organizations and other experts, to ensure all voices are brought to the table. We will solicit input from Federal agencies on lessons learned from their unique experiences, and gather information about successful initiatives that could potentially be scaled across government. And finally, we will canvass the international community for their diverse insights and innovative ideas.”

City Data: Big, Open and Linked


Working Paper by Mark S. Fox (University of Toronto): “Cities are moving towards policymaking based on data. They are publishing data using Open Data standards, linking data from disparate sources, allowing the crowd to update their data with Smart Phone Apps that use Open APIs, and applying “Big Data” Techniques to discover relationships that lead to greater efficiencies.
One Big City Data example is from New York City (Mayer-Schönberger & Cukier, 2013). Building owners were illegally converting their buildings into rooming houses that contained 10 times the number of people they were designed for. These buildings posed a number of problems, including fire hazards, drugs, crime, disease and pest infestations. There are over 900,000 properties in New York City and only 200 inspectors, who received over 25,000 illegal conversion complaints per year. The challenge was to distinguish nuisance complaints from those worth investigating, at a time when current methods resulted in only 13% of inspections ending in vacate orders.
New York’s Analytics team created a dataset that combined data from 19 agencies including buildings, preservation, police, fire, tax, and building permits. By combining data analysis with expertise gleaned from inspectors (e.g., buildings that recently received a building permit were less likely to be a problem as they were being well maintained), the team was able to develop a rating system for complaints. Based on their analysis of this data, they were able to rate complaints such that inspectors issued vacate orders in 70% of their visits, a fivefold increase in efficiency…
This paper provides an introduction to the concepts that underlie Big City Data. It explains the concepts of Open, Unified, Linked and Grounded data that lie at the heart of the Semantic Web. It then builds on this by discussing Data Analytics, which includes Statistics, Pattern Recognition and Machine Learning. Finally we discuss Big Data as the extension of Data Analytics to the Cloud where massive amounts of computing power and storage are available for processing large data sets. We use city data to illustrate each.”

Capitol Words


About Capitol Words: “For every day Congress is in session, Capitol Words visualizes the most frequently used words in the Congressional Record, giving you an at-a-glance view of which issues lawmakers address on a daily, weekly, monthly and yearly basis. Capitol Words lets you see the most popular words spoken by lawmakers on the House and Senate floor.

Methodology

The contents of the Congressional Record are downloaded daily from the website of the Government Printing Office. The GPO distributes the Congressional Record in ZIP files containing the contents of the record in plain-text format.

Each text file is parsed and turned into an XML document, with things like the title and speaker marked up. The contents of each file are then split up into words and phrases — from one word to five.

The resulting data is saved to a search engine. Capitol Words has data from 1996 to the present.”
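
The methodology above doesn’t include code, but the phrase-splitting step is easy to sketch. A minimal illustration in Python of counting one- to five-word phrases in a chunk of Record text; this is not Capitol Words’ actual implementation, and the sample sentence is invented:

    import re
    from collections import Counter

    def ngrams(text, max_n=5):
        """Count word sequences of length 1 to max_n, as described above."""
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
        return counts

    # Toy usage: in the real pipeline the text comes from the parsed XML
    # documents, and the counts are indexed in a search engine rather than
    # held in memory.
    sample = "Mr. Speaker, I rise today to discuss the budget. The budget matters."
    for phrase, count in ngrams(sample).most_common(5):
        print(count, phrase)
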