9 models to scale open data – past, present and future


Open Knowledge Foundation Blog: “The possibilities of open data have been enthralling us for 10 years… But that excitement isn’t what matters in the end. What matters is scale – which organisational structures will make this movement explode? This post quickly and provocatively goes through some that haven’t worked (yet!) and some that have.
Ones that are working now
1) Form a community to enter new data. OpenStreetMap and MusicBrainz are two big examples. It works because the community is the originator of the data. That said, neither has dominated its industry as much as I thought they would have by now.
2) Sell tools to an upstream generator of open data. This is what CKAN does for central governments (and what the new ScraperWiki CKAN tool helps with). It’s what mySociety does when it sells FixMyStreet installs to local councils, thereby publishing their potholes as RSS feeds.
3) Use open data (quietly). Every organisation does this and never talks about it. It’s key to long-established data resellers like Bloomberg. It is what most of ScraperWiki’s professional services customers ask us to do. The value to society is enormous and invisible. The big flaw is that it doesn’t help scale the supply of open data.
4) Sell tools to downstream users. This isn’t necessarily open data specific – existing software like spreadsheets and Business Intelligence tools can be used with open or closed data. Lots of open data is on the web, so tools like the new ScraperWiki, which work well with web data, are particularly suited to it.
Ones that haven’t worked
5) Collaborative curation. ScraperWiki started as an audacious attempt to create an open data curation community, based on editing scraping code in a wiki. In its original form (now called ScraperWiki Classic) this didn’t scale. …With a few exceptions, notably OpenCorporates, there aren’t yet many open data curation projects.
6) General purpose data marketplaces, particularly ones that are mainly reusing open data, haven’t taken off. They might one day; however, I think they need well-adopted higher-level standards for data formatting and syncing first (perhaps something like dat, perhaps something based on CSV files).
Ones I expect more of in the future
These are quite exciting models which I expect to see a lot more of.
7) Give labour/money to upstream to help them create better data. This is quite new. The only, and most excellent, example of it is the UK’s National Archives curating the Statute Law Database. They do the work with the help of staff seconded from commercial legal publishers and other parts of Government.
It’s clever because it generates money for the upstream source, which people trust the most, and which is best placed to improve data quality.
8) Viral open data licensing. MySQL made lots of money this way, offering proprietary dual licenses of GPL’d software to embedded systems makers. In data this could use OKFN’s Open Database License, and organisations would pay when they wanted to mix the open data with their own closed data. I don’t know anyone actively using it, although Chris Taggart from OpenCorporates mentioned this model to me years ago.
9) Corporations release data for strategic advantage. Companies are starting to release their own data for strategic gain. This is very new. Expect more of it.”

Next.Data.gov


Nick Sinai at the White House Blog: “Today, we’re excited to share a sneak preview of a new design for Data.gov, called Next.Data.gov. The upgrade builds on the President’s May 2013 Open Data Executive Order that aims to fuse open-data practices into the Federal Government’s DNA. Next.Data.gov is far from complete (think of it as a very early beta), but we couldn’t wait to share our design approach and the technical details behind it – knowing that we need your help to make it even better. Here are some key features of the new design:

[Image: OSTP_nextdata_1]

Leading with Data: The Data.gov team at the General Services Administration (GSA), a handful of Presidential Innovation Fellows, and OSTP staff designed Next.Data.gov to put data first. The team studied the usage patterns on Data.gov and found that visitors were hungry for examples of how data are used. The team also noticed many sources outside of Data.gov, such as tweets and articles, featuring Federal datasets in action. So Next.Data.gov includes a rich stream that enables each data community to communicate how its datasets are impacting companies and the public.

[Image: OSTP_nextdata_2]

In this dynamic stream, you’ll find blog posts, tweets, quotes, and other features that more fully showcase the wide range of information assets that exist within the vaults of government.
Powerful Search: The backend of Next.Data.gov is CKAN and is powered by Solr—a powerful search engine that will make it even easier to find relevant datasets online. Suggested search terms have been added to help users find (and type) things faster. Next.Data.gov will start to index datasets from agencies that publish their catalogs publicly, in line with the President’s Open Data Executive Order. The early preview launching today features datasets from the Department of Health and Human Services—one of the first Federal agencies to publish a machine-readable version of its data catalog.
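The post names CKAN and Solr but gives no API details. On a stock CKAN install, Solr sits behind the standard Action API’s package_search call, so a search like the one described can be sketched in a few lines of Python. A minimal sketch, using the CKAN-powered catalog.data.gov as a stand-in host (an assumption, since Next.Data.gov was still an early beta):

```python
import json
import urllib.parse
import urllib.request

def search_datasets(site: str, query: str, rows: int = 5):
    """Query a CKAN catalog's Solr-backed dataset search."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    url = f"{site}/api/3/action/package_search?{params}"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    if not payload.get("success"):
        raise RuntimeError("CKAN search failed")
    result = payload["result"]
    return result["count"], [d["title"] for d in result["results"]]

# Example: find health datasets, like the HHS catalog mentioned above.
count, titles = search_datasets("https://catalog.data.gov", "hospital quality")
print(f"{count} matching datasets; first few:")
for title in titles:
    print(" -", title)
```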
Rotating Data Visualizations: Building on the theme of leading with data, even the masthead design for Next.Data.gov is an open-data-powered visualization—for now, it’s a cool U.S. Geological Survey plot showing the magnitudes of earthquakes measured around the globe over the past week.

[Image: OSTP_nextdata_3]

This particular visualization was built using D3.js. The visualization will be updated periodically to spotlight different ways open data is used and illustrated….
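The masthead itself runs D3.js in the browser, but the underlying feed is ordinary open data. A minimal Python sketch of the data half, assuming the USGS GeoJSON past-week summary feed (the post does not say which USGS endpoint the team used):

```python
import json
import urllib.request

# USGS rolling feed of all earthquakes recorded in the past week.
FEED = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.geojson"

with urllib.request.urlopen(FEED) as resp:
    quakes = json.load(resp)["features"]

# Largest events first; each feature carries magnitude, place, and
# [longitude, latitude, depth] coordinates.
quakes.sort(key=lambda f: f["properties"]["mag"] or 0, reverse=True)
for f in quakes[:5]:
    props = f["properties"]
    lon, lat, depth = f["geometry"]["coordinates"]
    print(f"M{props['mag']:.1f}  {props['place']}  ({lat:.1f}, {lon:.1f})")
```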
We encourage you to collaborate in the design process by creating pull requests or providing feedback via Quora or Twitter.”

Open Data Tools: Turning Data into ‘Actionable Intelligence’


Shannon Bohle in SciLogs: “My previous two articles were on open access and open data. They conveyed major changes that are underway around the globe in the methods by which scientific and medical research findings and data sets are circulated among researchers and disseminated to the public. I showed how E-science and ‘big data’ fit into the philosophy of science through a paradigm shift comprising a trilogy of approaches – deductive, empirical, and computational – which, as was pointed out, provides a logical extension of Robert Boyle’s tradition of scientific inquiry involving “skepticism, transparency, and reproducibility for independent verification” into the computational age…
This third article on open access and open data evaluates new and suggested tools when it comes to making the most of the open access and open data OSTP mandates. According to an article published in The Harvard Business Review’s “HBR Blog Network,” this is because, as its title suggests, “open data has little value if people can’t use it.” Indeed, “the goal is for this data to become actionable intelligence: a launchpad for investigation, analysis, triangulation, and improved decision making at all levels.” Librarians and archivists have key roles to play in not only storing data, but packaging it for proper accessibility and use, including adding descriptive metadata and linking to existing tools or designing new ones for their users. Later, in a comment following the article, the author, Craig Hammer, remarks on the importance of archivists and international standards: “Certified archivists have always been important, but their skillset is crucially in demand now, as more and more data are becoming available. Accessibility—in the knowledge management sense—must be on par with digestibility / ‘data literacy’ as priorities for continuing open data ecosystem development. The good news is that several governments and multilaterals (in consultation with data scientists and – yep! – certified archivists) are having continuing ‘shared metadata’ conversations, toward the possible development of harmonized data standards…If these folks get this right, there’s a real shot of (eventual proliferation of) interoperability (i.e. a data platform from Country A can ‘talk to’ a data platform from Country B), which is the only way any of this will make sense at the macro level.”

The Science of Familiar Strangers: Society’s Hidden Social Network


The Physics arXiv Blog: “We’ve all experienced the sense of being familiar with somebody without knowing their name or even having spoken to them. These so-called “familiar strangers” are the people we see every day on the bus on the way to work, in the sandwich shop at lunchtime, or in the local restaurant or supermarket in the evening.
These people are the bedrock of society and a rich source of social potential as neighbours, friends, or even lovers.
But while many researchers have studied the network of intentional links between individuals—using mobile-phone records, for example—little work has been done on these unintentional links, which form a kind of hidden social network.
Today, that changes thanks to the work of Lijun Sun at the Future Cities Laboratory in Singapore and a few pals who have analysed the passive interactions between 3 million residents on Singapore’s bus network (about 55 per cent of the city’s population). “This is the first time that such a large network of encounters has been identified and analyzed,” they say.
The results are a fascinating insight into this hidden network of familiar strangers and the effects it has on people….
Perhaps the most interesting result involves the way this hidden network knits society together. Sun and co say that the data hints that the connections between familiar strangers grow stronger over time. So seeing each other more often increases the chances that familiar strangers will become socially connected.
That’s a fascinating insight into the hidden social network in which we are all embedded. It’s important because it has implications for our understanding of the way things like epidemics can spread through cities.
Perhaps more interesting is the insight it gives into how links form within communities and how these can be strengthened. With the widespread adoption of smart cards on transport systems throughout the world, this kind of study can easily be repeated in many cities, which may help to tease apart some of the factors that make them so different.”
Ref: arxiv.org/abs/1301.5979: Understanding Metropolitan Patterns of Daily Encounters
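The core construction in the paper is an encounter network: passengers are nodes, and an edge gains weight each time two of them ride the same vehicle at the same time. A toy sketch of that idea in Python, with invented trip records standing in for the anonymised smart-card data the study actually used:

```python
from collections import Counter
from itertools import combinations

# Invented trip records: (passenger, bus, day, board_hour, alight_hour).
trips = [
    ("anna", "bus7", 1, 8.0, 8.5),
    ("ben",  "bus7", 1, 8.1, 8.6),
    ("cara", "bus7", 1, 9.0, 9.4),
    ("anna", "bus7", 2, 8.0, 8.4),  # the same commute, next day
    ("ben",  "bus7", 2, 8.1, 8.5),
]

def co_present(a, b):
    """Same bus, same day, and overlapping on-board intervals."""
    return a[1] == b[1] and a[2] == b[2] and a[3] < b[4] and b[3] < a[4]

# Edge weight = number of shared rides between each pair of passengers.
encounters = Counter()
for a, b in combinations(trips, 2):
    if a[0] != b[0] and co_present(a, b):
        encounters[tuple(sorted((a[0], b[0])))] += 1

# Pairs who keep turning up together are the "familiar strangers".
for pair, n in sorted(encounters.items()):
    print(pair, "shared", n, "rides")
```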

A Smarter, More Innovative Government for the American People


Steve VanRoekel and Todd Park at the White House Blog: “This morning, the President held a meeting with his Cabinet and senior officials to lay out his vision for building a better, smarter, faster government over the course of his second term. During the meeting, the President directed Cabinet members and key officials in his Administration to build on the progress made over the first term, and he challenged us to improve government even further….
This morning, the President stated, “We need the brightest minds to help solve our biggest challenges. In this democracy, we, the people, realize this government is ours. It’s up to each and every one of us to make it work better. And we all have a stake in our success.” Read the President’s full remarks here, and see all the graphics from his speech below.”

[Graphic: The Management Agenda for Government Innovation]

Open Government is an Open Conversation


Lisa Ellman and Hollie Russon Gilman at the White House Blog: “President Obama launched the first U.S. Open Government National Action Plan in September 2011, as part of the Nation’s commitment to the principles of the global Open Government Partnership. The Plan laid out twenty-six concrete steps the United States would take to promote public participation in government, increase transparency in government, and manage public resources more effectively.
A year and a half later, we have fulfilled twenty-four of the Plan’s prescribed commitments—including launching the online We the People petition platform, which has been used by more than 9.6 million people, and unleashing thousands of government data resources as part of the Administration’s Open Data Initiatives.
We are proud of this progress, but recognize that there is always more work to be done to build a more efficient, effective, and transparent government. In that spirit, as part of our ongoing commitment to the international Open Government Partnership, the Obama Administration has committed to develop a second National Action Plan on Open Government.
To accomplish this task effectively, we’ll need all hands on deck. That’s why we plan to solicit and incorporate your input as we develop the National Action Plan “2.0.”…
Over the next few months, we will continue to gather your thoughts. We will leverage online platforms such as Quora, Google+, and Twitter to communicate with the public and collect feedback. We will meet with members of open government civil society organizations and other experts to ensure all voices are brought to the table. We will solicit input from Federal agencies on lessons learned from their unique experiences, and gather information about successful initiatives that could potentially be scaled across government. And finally, we will canvass the international community for their diverse insights and innovative ideas.”

Citizen Science Profile: SeaSketch


Blog entry from the Commons Lab within the Science and Technology Innovation Program of the Woodrow Wilson International Center for Scholars: “As part of the Commons Lab’s ongoing initiative to highlight the intersection of emerging technologies and citizen science, we present a profile of SeaSketch, marine management software that makes complex spatial planning tools accessible to everyone. This was prepared with the gracious assistance of Will McClintock, director of the McClintock Lab.
The SeaSketch initiative highlights key components of successful citizen science projects. The end product is a result of an iterative process where the developers applied previous successes and learned from mistakes. The tool was designed to allow people without technical training to participate, expanding access to stakeholders. MarineMap had a quantifiable impact on California marine protected areas, increasing their size from 1 percent to 16 percent of the coastline. The subsequent version, SeaSketch, is uniquely suited to scale out worldwide, addressing coastal and land management challenges. By emphasizing iterative development, non-expert accessibility and scalability, SeaSketch offers a model of successful citizen science….
SeaSketch succeeded as a citizen science initiative by focusing on three project priorities:

  • Iterative Development: The current version of SeaSketch’s participatory GIS (PGIS) software is the result of seven years of trial and error. Doris and MarineMap helped the project team learn what worked and adjust accordingly. The final result would have been impossible without a sustained commitment to the project and regular product assessments.
  • Non-Expert Accessibility: GIS software is traditionally limited to those with technical expertise. SeaSketch was developed anticipating that stakeholders without GIS training would use the software. New features allow users to contribute spatial surveys, sharing their knowledge of the area to better inform planning. This ease of use means the project is outward facing: More people can participate, meaning the analyses better reflect community priorities.
  • Scalability: Although MarineMap was built specifically to guide California’s Marine Life Protection Act (MLPA) process, the concept is highly flexible. SeaSketch is being used to support oceanic management issues worldwide, including in areas of international jurisdiction. The software can support planning with legal implications as well as cooperative agreements. SeaSketch’s project team believes it can also be used for freshwater and terrestrial management issues.”

Xerox PARC Tackles Online Dating’s Biggest Conundrum


[Image: Certifeye]

The Physics arXiv Blog: “Online dating has changed the way people start relationships. In 2000, a few hundred thousand individuals were experimenting with online dating. Today, more than 40 million people have signed up to meet their dream man or woman online. That kind of success is reflected in the fact that this industry is currently worth some $1.9 billion in annual revenue.
Of course, nobody would claim that online dating is the perfect way to meet a mate. One problem in particular is whether to trust the information that a potential date has given. How do you know that this person isn’t being economical with the truth?…
The new approach is simple. The idea these guys have come up with is to use an app that connects to a person’s Facebook page (or other social network page) and then compare the information there with the information on the dating profile. If the data is the same, then it is certified. The beauty of this system is that the Facebook details are not open to external scrutiny—the app does not take, make public or display any information from the social network. It simply compares the information from the two sites.
Any discrepancy indicates that something, somewhere is wrong, and the ambiguous details are not then certified…. This process of certification gives users a greater sense of security because Facebook data is largely peer reviewed already.
Ref: arxiv.org/abs/1303.4155: Bootstrapping Trust in Online Dating: Social Verification of Online Dating Profiles”
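The comparison step is simple enough to sketch. A hypothetical Python version of the certify-only-matches idea, with invented field names and values; a real implementation would fetch the social-network record through its API and, as the post stresses, never display it:

```python
def certify(profile: dict, social_record: dict,
            fields=("age", "city", "relationship_status")) -> dict:
    """Mark each claimed field True (matches) or False, without ever
    revealing the social-network values themselves."""
    return {f: profile.get(f) == social_record.get(f)
            for f in fields if f in profile}

# Invented example data.
profile = {"age": 29, "city": "Palo Alto", "relationship_status": "single"}
facebook = {"age": 34, "city": "Palo Alto", "relationship_status": "single"}

print(certify(profile, facebook))
# {'age': False, 'city': True, 'relationship_status': True}
```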

How Open Data Can Fight Climate Change


New blog post by Joel Gurin, Founder and Editor, OpenDataNow.com: “When people point to the value of Open Data from government, they often cite the importance of weather data from NOAA, the National Oceanic and Atmospheric Administration. That data has given us the Weather Channel, more accurate forecasts, and a number of weather-based companies. But the most impressive – and one of the best advertisements for government Open Data – may well be The Climate Corporation, headquartered in San Francisco.
Founded in 2006 under the name WeatherBill, The Climate Corporation was started to sell a better kind of weather insurance. But it’s grown into a company that could help farmers around the world plan around climate change, increase their crop yields, and become part of a new green revolution.
The company’s work is especially relevant in light of President Obama’s speech yesterday on new plans to fight climate change. We know that whatever we do to reduce carbon emissions now, we’ll still need to deal with changes that are already irreversible. The Climate Corporation’s work can be part of that solution…
The company has developed a new service, Climate.com, that is free to policyholders and available to others for a fee….
Their work may become part of a global Green Revolution 2.0. The U.S. Government’s satellite data doesn’t stop at the border: it covers the entire planet. The Climate Corporation is now looking for ways to apply its work internationally, probably starting with Australia, which has relevant data of its own.
Start with insurance sales, end up by changing the world. The power of Open Data has never been clearer.”

FailureFest


Geoff Mulgan’s blog: “We’ve often discussed the role of failure in innovation – and have started running FailureFests and other devices to get practitioners talking honestly about what they learned from things that didn’t work. We all know how hard this is.
There’s a new book out by the guru of failure in engineering, Henry Petroski: To Forgive Design: Understanding Failure. He argues that the best way of achieving lasting success is by understanding failure, and that a single failure may show ‘weaknesses in reasoning, knowledge, and performance that all the successful designs may not even hint at’. For him the best examples are collapsing bridges. Here’s a very different, but helpful, example of trying to extract some useful lessons from a well-intentioned project that didn’t quite work, in a field very distant from bridges. It’s a reminder of why it’s so important that the new What Works centres are brave enough to set out clearly the ideas that they think have been tested and shown not to work – that may be just as useful as the recommendations on best or proven practice.
Of course it’s not enough to say we should celebrate failure. No organisation or system can do that. Instead there is an unavoidable ambiguity in the relationship between innovation and failure. On the one hand, if you’re not failing often, you’re probably not taking enough creative risks. On the other hand, if you fail too much, don’t expect to keep your job, or your funding.”