A Modern Approach to Open Data


From the Sunlight Foundation blog: “Last year, a group of us who work daily with open government data — Josh Tauberer of GovTrack.us, Derek Willis at The New York Times, and myself — decided to stop each building the same basic tools over and over, and start building a foundation we could share.
We set up a small home at github.com/unitedstates, and kicked it off with a couple of projects to gather data on the people and work of Congress. Using a mix of automation and curation, they gather basic information from all over the government — THOMAS.gov, the House and Senate, the Congressional Bioguide, GPO’s FDSys, and others — that everyone needs to report, analyze, or build nearly anything to do with Congress.
Once we centralized this work and started maintaining it publicly, we began getting contributions nearly immediately. People educated us on identifiers, fixed typos, and gathered new data. Chris Wilson built an impressive interactive visualization of the Senate’s budget amendments by extending our collector to find and link the text of amendments.
This is an unusual, and occasionally chaotic, model for an open data project. github.com/unitedstates is a neutral space; GitHub’s permissions system allows many of us to share the keys, so no one person or institution controls it. What this means is that while we all benefit from each other’s work, no one is dependent on or “downstream” from anyone else. It’s a shared commons in the public domain.
There are a few principles that have helped make the unitedstates project something that’s worth our time:…”
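To make the shared commons concrete: the data these projects maintain can be pulled straight from the repositories. Below is a minimal Python sketch that reads the current-legislator roster from the unitedstates/congress-legislators repository; the raw URL, branch name, and field names are assumptions to verify against that repo’s README.

```python
# A sketch of consuming the @unitedstates commons directly from GitHub.
# Assumes legislators-current.yaml still lives at this raw URL on the
# default branch (main, or master on older checkouts); verify first.
import urllib.request

import yaml  # pip install pyyaml

URL = ("https://raw.githubusercontent.com/unitedstates/"
       "congress-legislators/main/legislators-current.yaml")

with urllib.request.urlopen(URL) as resp:
    legislators = yaml.safe_load(resp.read())

# Each record carries names, identifiers (Bioguide, GovTrack, etc.),
# and a history of terms; list current senators as a quick check.
for person in legislators:
    term = person["terms"][-1]
    if term["type"] == "sen":
        name = person["name"].get("official_full", person["name"]["last"])
        print(name, "-", term["state"])
```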

White House Expands Guidance on Promoting Open Data


NextGov: “White House officials have announced expanded technical guidance to help agencies make more data accessible to the public in machine-readable formats.
Following up on President Obama’s May executive order linking the pursuit of open data to economic growth, innovation and government efficiency, two budget and science office spokesmen on Friday published a blog post highlighting new instructions and answers to frequently asked questions.
Nick Sinai, deputy chief technology officer at the Office of Science and Technology Policy, and Dominic Sale, supervisory policy analyst at the Office of Management and Budget, noted that the policy now in place means that all “newly generated government data will be required to be made available in open, machine-readable formats, greatly enhancing their accessibility and usefulness, while ensuring privacy and security.”
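Under the policy, “machine-readable” has a concrete shape: agencies list their datasets in a JSON catalog published at a predictable /data.json path (the Project Open Data convention). Here is a rough Python sketch of reading such a catalog; the exact HHS URL is an assumption based on that convention, and the schema handling is simplified.

```python
# A sketch of reading an agency's Project Open Data catalog (/data.json).
# The exact host is an assumption; the path follows the policy's convention.
import json
import urllib.request

CATALOG_URL = "https://www.hhs.gov/data.json"  # assumed agency endpoint

with urllib.request.urlopen(CATALOG_URL) as resp:
    catalog = json.load(resp)

# v1.1 catalogs wrap entries in a "dataset" key; early ones were bare lists.
datasets = catalog["dataset"] if isinstance(catalog, dict) else catalog
for entry in datasets[:10]:
    print(entry.get("title"), "-", entry.get("accessLevel"))
```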

What should we do about the naming deficit/surplus?


From the mySociety blog: “As I wrote in my last post, I am very concerned about the lack of comprehensible, consistent language to talk about the hugely diverse ways in which people are using the internet to bring about social and political change….My approach to finding an appropriate name was to look at the way that other internet industry sectors are named, so that I could choose a name that sits nicely next to very familiar sectoral labels….

Segmenting the Civic Power sector

Choosing a single sectoral name – Civic Power – is not really the point of this exercise. The real benefit would come from being able to segment the many projects within this sector so that they are easier to compare and contrast.

Here is my suggested four part segmentation of the Civic Power sector…:

  1. Decision influencing organisations try to directly shape or change particular decisions made by powerful individuals or organisations.
  2. Regime changing organisations try to replace decision makers, not persuade them.
  3. Citizen Empowering organisations try to give people the resources and the confidence required to exert power for whatever purpose those people see fit, both now and in the future.
  4. Digital Government organisations try to improve the ways in which governments acquire and use computers and networks. Strictly speaking this is just a sub-category of ‘decision influencing organisation’, on a par with an environmental group or a union, but more geeky.”

See also: Open Government – What’s in a Name?

Smart Government and Big, Open Data: The Trickle-Up Effect


Anthony Townsend at the Future Now Blog: “As we grow numb to the daily headlines decrying the unimaginable scope of data being collected from Internet companies by the National Security Agency’s Prism program, it’s worth remembering that governments themselves produce mountains of data too. Tabulations of the most recent U.S. census, conducted in 2010, involved billions of data points and trillions of calculations. Not surprisingly, it is probably safe to assume that the federal government is also the world’s largest spender on database software—its tab with just one company, market-leader Oracle, passed $700 million in 2012 alone. Government data isn’t just big in scope. It is deep in history—governments have been accumulating data for centuries. In 2006, the genealogical research site Ancestry.com imported 600 terabytes of data (about what Facebook collects in a single day!) from the first fifteen U.S. censuses (1790 to 1930).

But the vast majority of data collected by governments never sees the light of day. It sits squirreled away on servers, and is only rarely cross-referenced in the ways that private sector companies do all the time to gain insights into what’s actually going on across the country, and emerging problems and opportunities. Yet as governments all around the world have realized, if shared safely with due precautions to protect individual privacy, in the hands of citizens all of this data could be a national civic monument of tremendous economic and social value.”

When Hacking Is Actually a Good Thing: The Civic Hacking Movement


The founder and CEO of PublicStuff, writing in the Huffington Post: “Many people think of the word “hacking” in a pejorative sense, understanding it to mean malicious acts of breaking into secure systems and wreaking havoc with private information. Popular culture likes to propagate a particular image of the hacker: a fringe-type individual with highly specialized technical skills who does what he or she does out of malice and/or greed. And so to many of us the concept of “civic hacking” may seem like an oxymoron, for how can the word “civic,” defined by its associations with municipal government and citizen concerns, be linked to the activity of hacking? Here is where another definition of hacking comes in, one that is more commonly used by denizens of the information technology industries: basically, the process of fixing a problem. As Jake Levitas defined it on the Code for America blog, civic hacking is “people working together quickly and creatively to make their cities better for everyone.” Moreover, as Levitas points out, civic hacking does not necessarily involve computer expertise or specialized technical knowledge; rather, it is a collective effort made up of people who want to make things better for themselves and each other, whether they are ordinary citizens or programming prodigies. So how does it work?”

New Book: Untangling the Web


By Aleks Krotoski: “The World Wide Web is the most revolutionary innovation of our time. In the last decade, it has utterly transformed our lives. But what real effects is it having on our social world? What does it mean to be a modern family when dinner table conversations take place over smartphones? What happens to privacy when we readily share our personal lives with friends and corporations? Are our Facebook updates and Twitterings inspiring revolution or are they just a symptom of our global narcissism? What counts as celebrity, when everyone can have a following or be a paparazzo? And what happens to relationships when love, sex and hate can be mediated by a computer? Social psychologist Aleks Krotoski has spent a decade untangling the effects of the Web on how we work, live and play. In this groundbreaking book, she uncovers how much humanity has – and hasn’t – changed because of our increasingly co-dependent relationship with the computer. In Untangling the Web, she tells the story of how the network became woven in our lives, and what it means to be alive in the age of the Internet.” Blog: http://untanglingtheweb.tumblr.com/
 
 

Create a Crowd Competition That Works


Ahmad Ashkar in HBR Blog Network: “It’s no secret that people in business are turning to the crowd to solve their toughest challenges. Well-known sites like Kickstarter and Indiegogo allow people to raise money for new projects. Design platforms like Crowdspring and 99designs give people the tools needed to crowdsource graphic design ideas and feedback.
At the Hult Prize — a start-up accelerator that challenges Millennials to develop innovative social enterprises to solve our world’s most pressing issues (and rewards the top team with $1,000,000 in start-up capital) — we’ve learned that the crowd can also offer an unorthodox solution in developing innovative and disruptive ideas, particularly ones focused on tackling complex, large-scale social issues.
But to effectively harness the power of the crowd, you have to engage it carefully. Over the past four years, we’ve developed a well-defined set of principles that guide our annual “challenge” (lauded by Bill Clinton in TIME magazine as one of the top five initiatives changing the world for the better), which produces original and actionable ideas to solve social issues.
Companies like Netflix, General Electric, and Procter & Gamble have also started “challenging the crowd” and employing many of these principles to tackle their own business roadblocks. If you’re looking to spark disruptive and powerful ideas that benefit your company, follow these guidelines to launch an engaging competition:
1. Define the boundaries. …
2. Identify a specific and bold stretch target. …
3. Insist on low barriers to entry. …
4. Encourage teams and networks. …
5. Provide a toolkit. Once interested parties become participants in your challenge, provide tools to set them up for success. If you are working on a social problem, you can use IDEO’s human-centered design toolkit. If you have a private-sector challenge, consider posting it on an existing innovation platform. As an organizer, you don’t have to spend time recreating the wheel — use one of the many existing platforms and borrow materials from those willing to share.”

5 Big Data Projects That Could Impact Your Life


Mashable: “We reached out to a few organizations using information, both hand- and algorithm-collected, to create helpful tools for their communities. This is only a small sample of what’s out there — plenty more pop up each day, and as more information becomes public, the trend will only grow….
1. Transit Time NYC
Transit Time NYC, an interactive map developed by WNYC, lets New Yorkers click a spot in any of the city’s five boroughs for an estimate of subway or train travel times. To create it, WNYC lead developer Steve Melendez broke the city into 2,930 hexagons, then pulled data from open source itinerary platform OpenTripPlanner — the Wikipedia of mapping software — and coupled it with the MTA’s publicly downloadable subway schedule…. [A sketch of this kind of OpenTripPlanner query appears after this excerpt.]
2. Twitter’s ‘Topography of Tweets’
In a blog post, Twitter unveiled a new data visualization map that displays billions of geotagged tweets in a 3D landscape format. The purpose is to display, topographically, which parts of certain cities most people are tweeting from… [A sketch of the underlying density binning also follows the excerpt.]
3. Homicide Watch D.C.
Homicide Watch D.C. is a community-driven data site that aims to cover every murder in the District of Columbia. It’s sorted by “suspect” and “victim” profiles, where it breaks down each person’s name, age, gender and race, as well as original articles reported by Homicide Watch staff…
4. Falling Fruit
Can you find a hidden apple tree along your daily bike commute? Falling Fruit can.
The website highlights overlooked or hidden edibles in urban areas across the world. By collecting public information from the U.S. Department of Agriculture, municipal tree inventories, foraging maps and street tree databases, the site has created a network of 615 types of edibles in more than 570,000 locations. The purpose is to remind urban dwellers that agriculture does exist within city boundaries — it’s just more difficult to find….
5. AIDSVu
AIDSVu is an interactive map that illustrates the prevalence of HIV in the United States. The data is pulled from the U.S. Centers for Disease Control and Prevention’s national HIV surveillance reports, which are collected at both state and county levels each year…”
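As promised above, a sketch of the kind of query behind Transit Time NYC: stand up OpenTripPlanner over the MTA’s published schedule and ask its trip-planning endpoint for itineraries between two points. The host below is a placeholder, and the parameters follow OTP 1.x’s classic REST API, which may differ in newer releases.

```python
# A sketch of one travel-time lookup against an OpenTripPlanner server,
# roughly what Transit Time NYC precomputed for each of its hexagons.
# The host is a placeholder; params follow OTP 1.x's /plan endpoint.
import json
import urllib.parse
import urllib.request

OTP = "http://localhost:8080/otp/routers/default/plan"  # assumed local server

params = urllib.parse.urlencode({
    "fromPlace": "40.7527,-73.9772",  # Grand Central (lat,lon)
    "toPlace": "40.6892,-74.0445",    # near the Statue of Liberty ferry
    "mode": "TRANSIT,WALK",
})

with urllib.request.urlopen(OTP + "?" + params) as resp:
    plan = json.load(resp)

# Each itinerary reports a duration in seconds; keep the fastest.
fastest = min(it["duration"] for it in plan["plan"]["itineraries"])
print("fastest transit trip: %.1f minutes" % (fastest / 60))
```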
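And for item 2: Twitter has not published its rendering pipeline, but the “topography” idea reduces to binning geotagged points into a grid and treating counts as elevation. A minimal NumPy sketch of that step, using synthetic points in place of real tweets:

```python
# A sketch of the density-binning step behind a "topography of tweets".
# The coordinates below are synthetic stand-ins for geotagged tweets.
import numpy as np

rng = np.random.default_rng(0)
lons = rng.normal(-73.98, 0.05, 10_000)  # fake cluster around Manhattan
lats = rng.normal(40.75, 0.05, 10_000)

# Counts per grid cell become the "elevation" of the 3D landscape.
density, lon_edges, lat_edges = np.histogram2d(lons, lats, bins=100)
i, j = np.unravel_index(density.argmax(), density.shape)
print("busiest cell starts near lon=%.3f lat=%.3f (%d points)"
      % (lon_edges[i], lat_edges[j], density[i, j]))
```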

9 models to scale open data – past, present and future


Open Knowledge Foundation Blog: “The possibilities of open data have been enthralling us for 10 years…But that excitement isn’t what matters in the end. What matters is scale – which organisational structures will make this movement explode?  This post quickly and provocatively goes through some that haven’t worked (yet!) and some that have.
Ones that are working now
1) Form a community to enter in new data. Open Street Map and MusicBrainz are two big examples. It works as the community is the originator of the data. That said, neither has dominated its industry as much as I thought they would have by now.
2) Sell tools to an upstream generator of open data. This is what CKAN does for central governments (and the new ScraperWiki CKAN tool helps with). It’s what mySociety does when selling FixMyStreet installs to local councils, thereby publishing their potholes as RSS feeds. [A sketch of reading such a feed follows this excerpt.]
3) Use open data (quietly). Every organisation does this and never talks about it. It’s key to quite old data resellers like Bloomberg. It is what most of ScraperWiki’s professional services customers ask us to do. The value to society is enormous and invisible. The big flaw is that it doesn’t help scale supply of open data.
4) Sell tools to downstream users. This isn’t necessarily open data specific – existing software like spreadsheets and Business Intelligence can be used with open or closed data. Lots of open data is on the web, so tools like the new ScraperWiki which work well with web data are particularly suited to it.
Ones that haven’t worked
5) Collaborative curation. ScraperWiki started as an audacious attempt to create an open data curation community, based on editing scraping code in a wiki. In its original form (now called ScraperWiki Classic) this didn’t scale. …With a few exceptions, notably OpenCorporates, there aren’t yet open data curation projects.
6) General purpose data marketplaces, particularly ones that are mainly reusing open data, haven’t taken off. They might do one day; however, I think they need well-adopted higher-level standards for data formatting and syncing first (perhaps something like dat, perhaps something based on CSV files).
Ones I expect more of in the future
These are quite exciting models which I expect to see a lot more of.
7) Give labour/money to upstream to help them create better data. This is quite new. The only, and most excellent, example of it is the UK’s National Archives curating the Statute Law Database. They do the work with the help of staff seconded from commercial legal publishers and other parts of Government.
It’s clever because it generates money for upstream, which people trust the most, and which has the most ability to improve data quality.
8) Viral open data licensing. MySQL made lots of money this way, offering proprietary dual licenses of GPLd software to embedded systems makers. In data this could use OKFN’s Open Database License, and organisations would pay when they wanted to mix the open data with their own closed data. I don’t know anyone actively using it, although Chris Taggart from OpenCorporates mentioned this model to me years ago.
9) Corporations release data for strategic advantage. Companies are starting to release their own data for strategic gain. This is very new. Expect more of it.”
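On model 2: the FixMyStreet feeds mentioned there are plain RSS, so downstream reuse is a few lines. A sketch with the feedparser library; the feed URL is illustrative, since real instances expose per-council and per-area feeds:

```python
# A sketch of reading recent FixMyStreet reports from an RSS feed.
# The URL is illustrative; check a live instance for its real endpoints.
import feedparser  # pip install feedparser

FEED = "https://www.fixmystreet.com/rss/problems"  # assumed endpoint

feed = feedparser.parse(FEED)
for entry in feed.entries[:10]:
    print(entry.title, "-", entry.link)
```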

Next.Data.gov


Nick Sinai at the White House Blog: “Today, we’re excited to share a sneak preview of a new design for Data.gov, called Next.Data.gov. The upgrade builds on the President’s May 2013 Open Data Executive Order that aims to fuse open-data practices into the Federal Government’s DNA. Next.Data.gov is far from complete (think of it as a very early beta), but we couldn’t wait to share our design approach and the technical details behind it – knowing that we need your help to make it even better.  Here are some key features of the new design:
 


Leading with Data: The Data.gov team at the General Services Administration (GSA), a handful of Presidential Innovation Fellows, and OSTP staff designed Next.Data.gov to put data first. The team studied usage patterns on Data.gov and found that visitors were hungry for examples of how data are used. The team also noticed many sources outside of Data.gov, such as tweets and articles, featuring Federal datasets in action. So Next.Data.gov includes a rich stream that enables each data community to communicate how its datasets are impacting companies and the public.


In this dynamic stream, you’ll find blog posts, tweets, quotes, and other features that more fully showcase the wide range of information assets that exist within the vaults of government.
Powerful Search: The backend of Next.Data.gov is CKAN and is powered by Solr—a powerful search engine that will make it even easier to find relevant datasets online. Suggested search terms have been added to help users find (and type) things faster. Next.Data.gov will start to index datasets from agencies that publish their catalogs publicly, in line with the President’s Open Data Executive Order. The early preview launching today features datasets from the Department of Health and Human Services—one of the first Federal agencies to publish a machine-readable version of its data catalog. [A sketch of querying a CKAN catalog appears after this excerpt.]
Rotating Data Visualizations: Building on the theme of leading with data, even the masthead design for Next.Data.gov is an open-data-powered visualization—for now, it’s a cool U.S. Geological Survey earthquake plot showing the magnitude of earthquake measurements collected over the past week, around the globe.


This particular visualization was built using D3.js. The visualization will be updated periodically to spotlight different ways open data is used and illustrated….
We encourage you to collaborate in the design process by creating pull requests or providing feedback via Quora or Twitter.”
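A practical footnote on the CKAN backend mentioned above: CKAN exposes its search through a standard Action API, so the same index that powers the site can be queried programmatically. A sketch against catalog.data.gov, which runs CKAN; the host and response shape are worth verifying before relying on them:

```python
# A sketch of querying a CKAN catalog's standard package_search action.
# catalog.data.gov runs CKAN; verify the host and fields before relying on it.
import json
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({"q": "earthquake", "rows": 5})
url = "https://catalog.data.gov/api/3/action/package_search?" + query

with urllib.request.urlopen(url) as resp:
    body = json.load(resp)

print("matching datasets:", body["result"]["count"])
for pkg in body["result"]["results"]:
    print("-", pkg["title"])
```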
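The masthead visualization itself is rendered with D3.js, but the data behind it is simply USGS’s public GeoJSON feed of the past week’s earthquakes. A short Python sketch of pulling that feed (the URL follows USGS’s published feed scheme; this is the data step only, not the D3 rendering):

```python
# A sketch of fetching the USGS weekly earthquake feed that drives the
# masthead-style visualization; the rendering on Next.Data.gov is D3.js.
import json
import urllib.request

FEED = ("https://earthquake.usgs.gov/earthquakes/"
        "feed/v1.0/summary/all_week.geojson")

with urllib.request.urlopen(FEED) as resp:
    quakes = json.load(resp)["features"]

# Each feature has properties.mag and a [lon, lat, depth] geometry point.
strongest = max(quakes, key=lambda q: q["properties"]["mag"] or 0)
print("quakes this week:", len(quakes))
print("strongest: %s (M%.1f)"
      % (strongest["properties"]["place"], strongest["properties"]["mag"]))
```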