Stamen Design: “Last month, Stamen launched parks.stamen.com, a project we created in partnership with the Electric Roadrunner Lab, with the goal of revealing the diversity of social media activity that happens inside parks and other open spaces in California. If you haven’t already looked at the site, please go visit it now! Find your favorite park, or the parks that are nearest to you, or just stroll between random parks using the wander button. For more background about the goals of the project, read Eric’s blog post: A Conversation About California Parks.
In this post I’d like to describe some of the algorithms we use to collect the social media data that feeds the park pages. Currently we collect data from four social media platforms: Twitter, Foursquare, Flickr, and Instagram. We chose these because they all have public APIs (Application Programming Interfaces) that are easy to work with, and we expect they will provide a view into the different facets of each park, and the diverse communities who enjoy these parks. Each social media service creates its own unique geographies, and its own way of representing these parks. For example, the kinds of photos you upload to Instagram might be different from the photos you upload to Flickr. The way you describe experiences using Twitter might be different from the moments you document by checking into Foursquare. In the future we may add more feeds, but for now there’s a lot we can learn from these four.
Through the course of collecting data from these social network services, I also found that each service’s public API imposes certain constraints on our queries, producing their own intricate patterns. Thus, the quirks of how each API was written result in distinct and fascinating geometries. Also, since we are only interested in parks for this project, the process of culling non-park-related content further produces unusual and interesting patterns. Rural areas have large parks that cover huge areas, while cities have lots of (relatively) tiny parks, which creates its own challenges for how we query the APIs.
Broadly, we followed a similar approach for all the social media services. First, we grab the geocoded data from the APIs. This ignores any media that don’t have a latitude and longitude associated with them. In Foursquare, almost all checkins have a latitude and longitude, and for Flickr and Instagram most photos have a location associated with them. However, for Twitter, only around 1% of all tweets have geographic coordinates. But as we will see, even 1% still results in a whole lot of tweets!
After grabbing the social media data, we intersect it with the outlines of parks and open spaces in California, using polygons from the California Protected Areas Database maintained by GreenInfo Network. Everything that doesn’t intersect one of these parks, we throw away. The following maps represent the data as it looks before the filtering process.
But enough talking, let’s look at some maps!”
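The two steps described in the excerpt above (drop media without coordinates, then keep only what falls inside a park outline) can be sketched in a few lines of Python. This is a minimal illustration, not Stamen's actual pipeline: the field names `lon` and `lat`, the function names, and the square "Example Park" are assumptions made for the example, and the test is a plain planar ray-casting check with no map projection or spatial index, which a production system would likely need.

```python
from typing import Any, Dict, List, Sequence, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Polygon = Sequence[Point]     # ring of vertices, implicitly closed


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count how many polygon edges a ray in +x direction crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_to_parks(items: List[Dict[str, Any]],
                    parks: Dict[str, Polygon]) -> List[Dict[str, Any]]:
    """Keep geocoded items that fall inside a park, tagged with the park name."""
    kept = []
    for item in items:
        lon, lat = item.get("lon"), item.get("lat")
        if lon is None or lat is None:
            continue  # no coordinates: ignored, as in the first step
        for name, polygon in parks.items():
            if point_in_polygon((lon, lat), polygon):
                kept.append({**item, "park": name})
                break  # one matching park is enough
    return kept


# Hypothetical data: one square park and three pieces of media.
parks = {"Example Park": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]}
items = [
    {"id": 1, "lon": 2.0, "lat": 2.0},    # inside the park
    {"id": 2, "lon": 5.0, "lat": 2.0},    # geocoded, but outside
    {"id": 3, "lon": None, "lat": None},  # not geocoded
]
in_parks = filter_to_parks(items, parks)
```

Only the first item survives both steps; the other two are thrown away, one for lacking coordinates and one for falling outside every park outline.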
The Right Colors Make Data Easier To Read
Sharon Lin and Jeffrey Heer at HBR Blog: “What is the color of money? Of love? Of the ocean? In the United States, most people respond that money is green, love is red and the ocean is blue. Many concepts evoke related colors, whether due to physical appearance, common metaphors, or cultural conventions. When colors are paired with the concepts that evoke them, we call these “semantically resonant color choices.”
Artists and designers regularly use semantically resonant colors in their work. And in the research we conducted with Julie Fortuna, Chinmay Kulkarni, and Maureen Stone, we found they can be remarkably important to data visualization.
Consider these charts of (fictional) fruit sales:
The only difference between the charts is the color assignment. The left-hand chart uses colors from a default palette. The right-hand chart has been assigned semantically resonant colors. (In this case, the assignment was computed automatically using an algorithm that analyzes the colors in relevant images retrieved from Google Image Search using queries for each data category name.)
Now, try answering some questions about the data in each of these charts. Which fruit had higher sales: blueberries or tangerines? How about peaches versus apples? Which chart do you find easier to read?…
To make effective visualization color choices, you need to take a number of factors into consideration. To name just two: All the colors need to be suitably different from one another, for instance, so that readers can tell them apart – what’s called “discriminability.” You also need to consider what the colors look like to the color blind — roughly 8% of the U.S. male population! Could the colors be distinguished from one another if they were reprinted in black and white?
One easy way to assign semantically resonant colors is to use colors from an existing color palette that has been carefully designed for visualization applications (ColorBrewer offers some options) but assign the colors to data values in a way that best matches concept color associations. This is the basis of our own algorithm, which acquires images for each concept and then analyzes them to learn concept color associations. However, keep in mind that color associations may vary across cultures. For example, in the United States and many western cultures, luck is often associated with green (four-leaf clovers), while red can be considered a color of danger. However, in China, luck is traditionally symbolized with the color red.
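The assignment step described above (take a fixed, well-designed palette and hand its colors to data categories so that each category gets the color closest to what it evokes) can be sketched as a small search problem. This is a simplified stand-in, not the authors' algorithm: their concept colors are learned from images, whereas the `concept_colors` values below are made-up RGB triples, the palette values are illustrative, and squared RGB distance is a crude proxy for perceptual difference.

```python
from itertools import permutations
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def color_distance(a: RGB, b: RGB) -> int:
    """Squared Euclidean distance in RGB space (a rough similarity measure)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assign_palette(concept_colors: Dict[str, RGB],
                   palette: List[RGB]) -> Dict[str, RGB]:
    """Try every one-to-one assignment and keep the one with the lowest total distance."""
    concepts = list(concept_colors)
    best: Dict[str, RGB] = {}
    best_cost = float("inf")
    for perm in permutations(palette, len(concepts)):
        cost = sum(color_distance(concept_colors[c], col)
                   for c, col in zip(concepts, perm))
        if cost < best_cost:
            best, best_cost = dict(zip(concepts, perm)), cost
    return best


# Illustrative palette: a red, a blue, and an orange.
palette = [(228, 26, 28), (55, 126, 184), (255, 127, 0)]

# Hypothetical "typical" colors for each category (assumed, not measured).
concept_colors = {
    "blueberry": (70, 90, 200),
    "tangerine": (250, 140, 20),
    "apple": (200, 30, 40),
}

assignment = assign_palette(concept_colors, palette)
```

Because the palette itself is never altered, whatever discriminability it was designed for is preserved; only the pairing of colors with categories changes. Exhaustive search is fine for a handful of categories but grows factorially, so a larger chart would call for a proper assignment solver.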
…
Semantically resonant colors can reinforce perception of a wide range of data categories. We believe similar gains would likely be seen for other forms of visualizations like maps, scatterplots, and line charts. So when designing visualizations for presentation or analysis, consider color choice and ask yourself how well the colors resonate with the underlying data.”
The Open Data 500: Putting Research Into Action
TheGovLab Blog: “On April 8, the GovLab made two significant announcements. At an open data event in Washington, DC, I was pleased to announce the official launch of the Open Data 500, our study of 500 companies that use open government data as a key business resource. We also announced that the GovLab is now planning a series of Open Data Roundtables to bring together government agencies with the businesses that use their data – and that five federal agencies have agreed to participate. Video of the event, which was hosted by the Center for Data Innovation, is available here.
The Open Data 500, funded by the John S. and James L. Knight Foundation, is the first comprehensive study of U.S.-based companies that rely on open government data. Our website at OpenData500.com includes searchable, sortable information on 500 of these companies. Our data about them comes from responses to a survey we’ve sent to all the companies (190 have responded) and what we’ve been able to learn from research using public information. Anyone can now explore this website, read about specific companies or groups of companies, or download our data to analyze it. The website features an interactive tool on the home page, the Open Data Compass, that shows the connections between government agencies and different categories of companies visually.
We began work on the Open Data 500 study last fall with three goals. First, we wanted to collect information that will ultimately help calculate the economic value of open data – an important question for policymakers and others. Second, we wanted to present examples of open data companies to inspire others to use this important government resource in new ways. And third – and perhaps most important – we’ve hoped that our work will be a first step in creating a dialogue between the government agencies that provide open data and the companies that use it.
That dialogue is critically important to make government open data more accessible and useful. While open government data is a huge potential resource, and federal agencies are working to make it more available, it’s too often trapped in legacy systems that make the data difficult to find and to use. To solve this problem, we plan to connect agencies to their clients in the business community and help them work together to find and liberate the most valuable datasets.
We now plan to convene and facilitate a series of Open Data Roundtables – a new approach to bringing businesses and government agencies together. In these Roundtables, which will be informed by the Open Data 500 study, companies and the agencies that provide their data will come together in structured, results-oriented meetings that we will facilitate. We hope to help figure out what can be done to make the most valuable datasets more available and usable quickly.
We’ve been gratified by the immediate positive response to our plan from several federal agencies. The Department of Commerce has committed to help plan and participate in the first of our Roundtables, now being scheduled for May. By the time we announced our launch on April 8, the Departments of Labor, Transportation, and Treasury had also signed up. And at the end of the launch event, the Deputy Chief Information Officer of the USDA publicly committed her agency to participate as well…”
“Government Entrepreneur” is Not an Oxymoron
But imagine if the road that led to the Seattle City Council ridesharing hearings this month — with rulings that sharply curtail UberX, Lyft, and Sidecar’s operations there — had been a vastly different one. Imagine that public leaders had conceived and built a platform to provide this new, shared model of transit. Or at the very least, that instead of having a revolution of the current transit regime done to Seattle public leaders, it was done with them. Amidst the acrimony, it seems hard to imagine that public leaders could envision and operate such a platform, or that private innovators could work with them more collaboratively on it — but it’s not impossible. What would it take? Answer: more public entrepreneurs.
The idea of “public entrepreneurship” may sound to you like it belongs on a list of oxymorons right alongside “government intelligence.” But it doesn’t. Public entrepreneurs around the world are improving our lives, inventing entirely new ways to serve the public. They are using sensors to detect potholes; word pedometers to help students learn; harnessing behavioral economics to encourage organ donation; crowdsourcing patent review; and transforming Medellin, Colombia with cable cars. They are coding in civic hackathons and competing in the Bloomberg challenge. They are partnering with an Office of New Urban Mechanics in Boston or in Philadelphia, co-developing products in San Francisco’s Entrepreneurship-in-Residence program, or deploying some of the more than $430 million invested into civic-tech in the last two years.
There is, however, a big problem with public entrepreneurs: there just aren’t enough of them. Without more public entrepreneurship, it’s hard to imagine meeting our public challenges or making the most of private innovation. One might argue that bungled healthcare website roll-outs or internet spying are evidence of too much activity on the part of public leaders, but I would argue that what they really show is too little entrepreneurial skill and judgment.
The solution to creating more public entrepreneurs is straightforward: train them. But, by and large, we don’t. Consider Howard Stevenson’s definition of entrepreneurship: “the pursuit of opportunity without regard to resources currently controlled.” We could teach that approach to people heading towards the public sector. But now consider the following list of terms: “acknowledgement of multiple constituencies,” “risk reduction,” “formal planning,” “coordination,” “efficiency measures,” “clearly defined responsibility,” and “organizational culture.” It reads like a list of the kinds of concepts we would want a new public official to know; like it might be drawn from an interview evaluation form or graduate school syllabus. In fact, it’s from Stevenson’s list of pressures that pull managers away from entrepreneurship and towards administration. Of course, that’s not all bad. We must have more great public administrators. But with all our challenges and amidst all the dynamism, we are going to need more than analysts and strategists in the public sector, we need inventors and builders, too.
Public entrepreneurship is not simply innovation in the public sector (though it makes use of innovation), and it’s not just policy reform (though it can help drive reform). Public entrepreneurs build something from nothing with resources (be they financial capital or human talent or new rules) they didn’t command. In Boston, I worked with many amazing public managers and a handful of outstanding public entrepreneurs. Chris Osgood and Nigel Jacob brought the country’s first major-city mobile 311 app to life, and they are public entrepreneurs. They created Citizens Connect in 2009 by bringing together iPhones on loan, a local coder, and the most under-tapped resource in the public sector: the public. They transformed the way basic neighborhood issues are reported and responded to (20% of all constituent cases in Boston are reported over smartphones now), and their model is now accessible to 40 towns in Massachusetts and cities across the country. The Mayor’s team in Boston that started up the One Fund in the days after the Marathon bombings were public entrepreneurs. We built the organization from PayPal and a Post Office Box, and it went on to channel $61 million from donors to victims and survivors in just 75 days. It still operates today….
It’s worth noting that public entrepreneurship, perhaps newly buzzworthy, is not actually new. Elinor Ostrom (44 years before her Nobel Prize) observed public entrepreneurs inventing new models in the 1960s. Back when Ronald Reagan was president, Peter Drucker wrote that it was entrepreneurship that would keep public service “flexible and self-renewing.” And almost two decades have passed since David Osborne and Ted Gaebler’s “Reinventing Government” (the then handbook for public officials) carried the promising subtitle: “How the Entrepreneurial Spirit is Transforming the Public Sector”. Public entrepreneurship, though not nearly as widespread as its private complement, or perhaps as fashionable as its “social” counterpart (focussed on non-profits and their ecosystem), has been around for a while and so have those who practiced it.
But still today, we mostly train future public leaders to be public administrators. We school them in performance management and leave them too inclined to run from risk instead of managing it. And we communicate often, explicitly or not, to private entrepreneurs that government officials are failures and dinosaurs. It’s easy to see how that road led to Seattle this month, but hard to see how it empowers public officials to take on the enormous challenges that still lie ahead of us, or how it enables the public to help them.”
Climate Data Initiative Launches with Strong Public and Private Sector Commitments
John Podesta and Dr. John P. Holdren at the White House blog: “…today, delivering on a commitment in the President’s Climate Action Plan, we are launching the Climate Data Initiative, an ambitious new effort bringing together extensive open government data and design competitions with commitments from the private and philanthropic sectors to develop data-driven planning and resilience tools for local communities. This effort will help give communities across America the information and tools they need to plan for current and future climate impacts.
The Climate Data Initiative builds on the success of the Obama Administration’s ongoing efforts to unleash the power of open government data. Since data.gov, the central site to find U.S. government data resources, launched in 2009, the Federal government has released troves of valuable data that were previously hard to access in areas such as health, energy, education, public safety, and global development. Today these data are being used by entrepreneurs, researchers, tech innovators, and others to create countless new applications, tools, services, and businesses.
Data from NOAA, NASA, the U.S. Geological Survey, the Department of Defense, and other Federal agencies will be featured on climate.data.gov, a new section within data.gov that opens for business today. The first batch of climate data being made available will focus on coastal flooding and sea level rise. NOAA and NASA will also be announcing an innovation challenge calling on researchers and developers to create data-driven simulations to help plan for the future and to educate the public about the vulnerability of their own communities to sea level rise and flood events.
These and other Federal efforts will be amplified by a number of ambitious private commitments. For example, Esri, the company that produces the ArcGIS software used by thousands of city and regional planning experts, will be partnering with 12 cities across the country to create free and open “maps and apps” to help state and local governments plan for climate change impacts. Google will donate one petabyte—that’s 1,000 terabytes—of cloud storage for climate data, as well as 50 million hours of high-performance computing with the Google Earth Engine platform. The company is challenging the global innovation community to build a high-resolution global terrain model to help communities build resilience to anticipated climate impacts in decades to come. And the World Bank will release a new field guide for the Open Data for Resilience Initiative, which is working in more than 20 countries to map millions of buildings and urban infrastructure….”
“Open-washing”: The difference between opening your data and simply making them available
They then use Google Maps as an example, which firstly isn’t entirely correct, as also pointed out by the Neogeografen, a geodata blogger, who explains how Google Maps isn’t offering raw data, but merely an image of the data. You are not allowed to download and manipulate the data – or run it off your own server.
But secondly I don’t think it’s very appropriate to highlight Google and their Maps project as a golden example of a business that lets its data flow unhindered to the public. It’s true that they are offering some data, but only in a very limited way – and definitely not as open data – and thereby not as progressively as the article suggests.
Surely it’s hard to accuse Google of not being progressive in general. The article states how Google Maps’ data are used by over 800,000 apps and businesses across the globe. So yes, Google has opened its silo a little bit, but only in a very controlled and limited way, which leaves these 800,000 businesses dependent on the continual flow of data from Google and thereby not allowing them to control the very commodities they’re basing their business on. This particular way of releasing data brings me to the problem that we’re facing: Knowing the difference between making data available and making them open.
Open data is characterized by not only being available, but being both legally open (released under an open license that allows full and free reuse, conditioned at most on giving credit to its source and sharing under the same license) and technically available in bulk and in machine-readable formats – contrary to the case of Google Maps. It may be that their data are available, but they’re not open. This – among other reasons – is why the global community around the 100% open alternative OpenStreetMap is growing rapidly and an increasing number of businesses choose to base their services on this open initiative instead.
But why is it important that data are open and not just available? Open data strengthens society and builds a shared resource, where all users, citizens and businesses are enriched and empowered, not just the data collectors and publishers. “But why would businesses spend money on collecting data and then give them away?” you ask. Opening your data and making a profit are not mutually exclusive. Doing a quick Google search reveals many businesses that both offer open data and build a business on them – and I believe these are the ones that should be highlighted as particularly progressive in articles such as the one from Computerworld….
We are seeing a rising trend of what can be termed “open-washing” (inspired by “greenwashing”) – meaning data publishers that claim their data is open when it is in fact merely available under limiting terms. If we – at this critical time in the formative period of the data-driven society – aren’t critically aware of the difference, we’ll end up putting our vital data streams in siloed infrastructure built and owned by international corporations, and giving our praise and support to the wrong kind of unsustainable technological development.”
This algorithm can predict a revolution
Russell Brandom at the Verge: “For students of international conflict, 2013 provided plenty to examine. There was civil war in Syria, ethnic violence in China, and riots to the point of revolution in Ukraine. For those working at Duke University’s Ward Lab, all specialists in predicting conflict, the year looks like a betting sheet, full of predictions that worked and others that didn’t pan out.
When the lab put out their semiannual predictions in July, they gave Paraguay a 97 percent chance of insurgency, largely based on reports of Marxist rebels. The next month, guerrilla campaigns intensified, proving out the prediction. In the case of China’s armed clashes between Uighurs and Hans, the models showed a 33 percent chance of violence, even as the cause of each individual flare-up was concealed by the country’s state-run media. On the other hand, the unrest in Ukraine didn’t start raising alarms until the action had already started, so the country was left off the report entirely.
According to Ward Lab’s staff, the purpose of the project isn’t to make predictions but to test theories. If a certain theory of geopolitics can predict an uprising in Ukraine, then maybe that theory is onto something. And even if these specialists could predict every conflict, it would only be half the battle. “It’s a success only if it doesn’t come at the cost of predicting a lot of incidents that don’t occur,” says Michael D. Ward, the lab’s founder and chief investigator, who also runs the blog Predictive Heuristics. “But it suggests that we might be on the right track.”
Forecasting the future of a country wasn’t always done this way. Traditionally, predicting revolution or war has been a secretive project, for the simple reason that any reliable prediction would be too valuable to share. But as predictions lean more on data, they’ve actually become harder to keep secret, ushering in a new generation of open-source prediction models that butt against the siloed status quo.
The story of automated conflict prediction starts at the Defense Advanced Research Projects Agency, known as the Pentagon’s R&D wing. In the 1990s, DARPA wanted to try out software-based approaches to anticipating which governments might collapse in the near future. The CIA was already on the case, with section chiefs from every region filing regular forecasts, but DARPA wanted to see if a computerized approach could do better. They looked at a simple question: will this country’s government face an acute existential threat in the next six months? When CIA analysts were put to the test, they averaged roughly 60 percent accuracy, so DARPA’s new system set the bar at 80 percent, looking at 29 different countries in Asia with populations over half a million. It was dubbed ICEWS, the Integrated Conflict Early Warning System, and it succeeded almost immediately, clearing 80 percent with algorithms built on simple regression analysis….
On the data side, researchers at Georgetown University are cataloging every significant political event of the past century into a single database called GDELT, and leaving the whole thing open for public research. Already, projects have used it to map the Syrian civil war and diplomatic gestures between Japan and South Korea, looking at dynamics that had never been mapped before. And then, of course, there’s Ward Lab, releasing a new sheet of predictions every six months and tweaking its algorithms with every development. It’s a mirror of the same open-vs.-closed debate in software — only now, instead of fighting over source code and security audits, it’s a fight over who can see the future the best.”
Developing an open government plan in the open
The development of an OGP National Action Plan, therefore, presents a twofold opportunity for opening up government: On the one hand it should be used to deliver a set of robust and ambitious commitments to greater transparency, participation and accountability. But just as importantly, the process of developing a NAP should also be used to model new forms of open and collaborative working within government and civil society. These two purposes of a NAP should be mutually reinforcing. An open and collaborative process can – as was the case in the UK – help to deliver a more robust and ambitious action plan, which in turn can demonstrate the efficacy of working in the open.
You could even go one step further to say that the development of a National Action Plan should present an (almost) “ideal” vision of what open government in a country could look like. If governments aren’t being open as they’re developing an open government action plan, then there’s arguably little hope that they’ll be open elsewhere.
As coordinators of the UK OGP civil society network, this was on our mind at the beginning and throughout the development of the UK’s 2013-15 National Action Plan. Crucially, it was also on the minds of our counterparts in the UK Government. From the start, therefore, the process was developed with the intention that it should itself model the principles of open government. Members of the UK OGP civil society network met with policy officials from the UK Government on a regular basis to scope out and develop the action plan, and we published regular updates of our discussions and progress for others to follow and engage with. The process wasn’t without its challenges – and there’s still much more we can do to open it up further in the future – but it was successful in moving far beyond the typical model of government deciding, announcing and defending its intentions and in delivering an action plan with some strong and ambitious commitments.
One of the benefits of working in an open and collaborative way is that it enabled us to conduct and publish a full – warts and all – review of what went well and what didn’t. So, consider this an invitation to delve into our successes and failures, a challenge to do it better and a request to help us to do so too. Head over to the UK OGP civil society network blog to read about what we did, and tell us what you think: http://www.opengovernment.org.uk/national-action-plan/story-of-the-uk-national-action-plan-2013-15/”
11 ways to rethink open data and make it relevant to the public
Miguel Paz at IJNET: “It’s time to transform open data from a trendy concept among policy wonks and news nerds into something tangible to everyday life for citizens, businesses and grassroots organizations. Here are some ideas to help us get there:
1. Improve access to data
Craig Hammer from the World Bank has tackled this issue, stating that “Open Data could be the game changer when it comes to eradicating global poverty”, but only if governments make available online data that become actionable intelligence: a launch pad for investigation, analysis, triangulation, and improved decision making at all levels.
2. Create open data for the end user
As Hammer wrote in a blog post for the Harvard Business Review, while the “opening” has generated excitement from development experts, donors, several government champions, and the increasingly mighty geek community, the hard reality is that much of the public has been left behind, or tacked on as an afterthought. Let’s get out of the building and start working for the end user.
3. Show, don’t tell
Regular folks don’t know what “open data” means. Actually, they probably don’t care what we call it and don’t know if they need it. Apple’s Steve Jobs said that a lot of times, people don’t know what they want until you show it to them. We need to stop telling them they need it and start showing them why they need it, through actionable user experience.
4. Make it relevant to people’s daily lives, not just to NGOs and policymakers’ priorities
A study of the use of open data and transparency in Chile showed the top 10 uses were for things that affect their lives directly for better or for worse: data on government subsidies and support, legal certificates, information services, paperwork. If the data doesn’t speak to priorities at the household or individual level, we’ve lost the value of both the “opening” of data, and the data itself.
5. Invite the public into the sandbox
We need to give people “better tools to not only consume, but to create and manipulate data,” says my colleague Alvaro Graves, Poderopedia’s semantic web developer and researcher. This is what Code for America does, and it’s also what happened with the advent of Web 2.0, when the availability of better tools, such as blogging platforms, helped people create and share content.
6. Realize that open data are like QR codes
Everyone talks about open data the way they used to talk about QR codes: as something groundbreaking. But as with QR codes, open data only succeeds with the proper context to satisfy the needs of citizens. Context is what matters most in driving the use and success of open data as a tool for global change.
7. Make open data sexy and pop, like Jess3.com
Geeks became popular because they made useful and cool things that could be embraced by end users. Open data geeks need to stick with that program.
8. Help journalists embrace open data
Jorge Lanata, a famous Argentinian journalist who is now being targeted by the Cristina Fernández administration due to his unfolding of government corruption scandals, once said that 50 percent of the success of a story or newspaper is assured if journalists like it.
That’s true of open data as well. If journalists understand its value for the public interest and learn how to use it, so will the public. And if they do, the winds of change will blow. Governments and the private sector will be forced to provide better, more up-to-date and standardized data. Open data will be understood not as a concept but as a public information source as relevant as any other. We need to teach Latin American journalists to be part of this.
9. News nerds can help you put your open data to good use
In order to boost the use of open data by journalists we need news nerds, teams of lightweight and tech-heavy armored journalist-programmers who can teach colleagues how open data brings us high-impact storytelling that can change public policies and hold authorities accountable.
News nerds can also help us with “institutionalizing data literacy across societies” as Hammer puts it. ICFJ Knight International Journalism Fellow and digital strategist Justin Arenstein calls these folks “mass mobilizers” of information. Alex Howard “points to these groups because they can help demystify data, to make it understandable by populations and not just statisticians.”
I call them News Ninja Nerds, accelerator taskforces that can foster innovations in news, data and transparency in a speedy way, saving governments and organizations time and a lot of money. Projects like ProPublica’s Dollars For Docs are great examples of what can be achieved if you mix FOIA, open data and the will to provide news in the public interest.
10. Rename open data
Part of the reason people don’t embrace concepts such as open data is that they belong to a lingo that has nothing to do with them. No empathy involved. Let’s start talking about people’s right to know and use the data generated by governments. As Tim O’Reilly puts it: “Government as a Platform for Greatness,” with examples we can relate to, instead of dead PDFs and dirty databases.
11. Don’t expect open data to substitute for thinking or reporting
Investigative reporting can benefit from it. But “there is no substitute for the kind of street-level digging, personal interviews, and detective work” that great journalism projects entail, says David Kaplan in a great post entitled Why Open Data is Not Enough.”
What makes a good API?
Joshua Tauberer’s Blog: “There comes a time in every dataset’s life when it wants to become an API. That might be because of consumer demand or an executive order. How are you going to make a good one?…
Let’s take the common case where you have a relatively static, large dataset that you want to provide read-only access to. Here are 19 common attributes of good APIs for this situation. …
Granular Access. If the user wanted the whole thing they’d download it in bulk, so an API must be good at providing access to the most granular level practical for data users (h/t Ben Balter for the wording on that). When the data comes from a table, this usually means the ability to read a small slice of it using filters, sorting, and paging (limit/offset), the ability to get a single row by identifying it with a persistent, unique identifier (usually a numeric ID), and the ability to select just which fields should be included in the result output (good for optimizing bandwidth in mobile apps, h/t Eric Mill). (But see “intents” below.)
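To make the granular access idea concrete, here is a minimal sketch in Python of filtering, sorting, paging, single-row lookup, and field selection over an in-memory table. The sample rows, field names, and the function names query and get_by_id are all invented for illustration and do not come from the original post.

```python
# Hypothetical sample table; the rows and field names are invented.
ROWS = [
    {"id": 1, "name": "Alpha", "state": "CA", "year": 2011},
    {"id": 2, "name": "Beta", "state": "NY", "year": 2013},
    {"id": 3, "name": "Gamma", "state": "CA", "year": 2012},
    {"id": 4, "name": "Delta", "state": "CA", "year": 2010},
]


def query(rows, filters=None, sort=None, limit=10, offset=0, fields=None):
    """Return a small slice of rows: filter, sort, page, then pick fields."""
    filters = filters or {}
    # Equality filters: keep rows where every requested field matches.
    matched = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    if sort:
        # A leading "-" means descending order (an assumed convention).
        descending = sort.startswith("-")
        key = sort.lstrip("-")
        matched.sort(key=lambda r: r[key], reverse=descending)
    # Paging with limit/offset.
    page = matched[offset:offset + limit]
    if fields:
        # Field selection: return only the requested columns.
        page = [{f: r[f] for f in fields} for r in page]
    return page


def get_by_id(rows, row_id):
    """Fetch a single row by its persistent, unique identifier."""
    return next((r for r in rows if r["id"] == row_id), None)
```

A call such as query(ROWS, filters={"state": "CA"}, sort="year", limit=2, offset=1, fields=["id", "name"]) returns only the second and third California rows by year, with just two fields each, which is the kind of small slice a mobile client would want.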
Deep Filtering. An API should be good at needle-in-haystack problems. Full text search is hard to do, so an API that can do it relieves a big burden for developers — if your API has any big text fields. Filters that can span relations or cross tables (i.e. joins) can be very helpful as well. But don’t go overboard. (Again, see “intents” below.)
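As a rough illustration of deep filtering, the Python sketch below combines a text search with a filter that spans a relation. Everything here is assumed for the example: the AGENCIES and GRANTS data, the search_grants function, and the use of case-insensitive substring matching as a stand-in for real full text search, which would normally need a proper index.

```python
# Hypothetical related tables: each grant points at an agency by ID.
AGENCIES = {1: {"id": 1, "name": "Parks"}, 2: {"id": 2, "name": "Transit"}}
GRANTS = [
    {"id": 10, "agency_id": 1, "summary": "Trail restoration in coastal parks"},
    {"id": 11, "agency_id": 2, "summary": "Bus shelter repairs"},
    {"id": 12, "agency_id": 1, "summary": "Playground repairs and signage"},
]


def search_grants(text=None, agency_name=None):
    """Return IDs of grants matching a text search and/or a related agency name."""
    results = []
    for grant in GRANTS:
        # Naive text search: substring match on the big text field.
        if text is not None and text.lower() not in grant["summary"].lower():
            continue
        # Filter across the relation: follow agency_id to the agency table.
        agency = AGENCIES[grant["agency_id"]]
        if agency_name is not None and agency["name"] != agency_name:
            continue
        results.append(grant["id"])
    return results
```

The point of the sketch is that the join happens on the server: the caller asks for grants whose agency is named "Parks" without first having to look up the agency's ID.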
Typed Values. Response data should be typed. That means that whether a field’s value is an integer, text, list, floating-point number, dictionary, null, or date should be encoded as a part of the value itself. JSON and XML with XSD are good at this. CSV and plain XML, on the other hand, are totally untyped. Types must be strictly enforced. Columns must choose a data type and stick with it, no exceptions. When encoding other sorts of data as text, the values must all absolutely be valid according to the most narrow regular expression that you can make. Provide that regular expression to the API users in documentation.
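The difference between typed and untyped formats, and the idea of a narrow regular expression for text-encoded values, can be sketched with Python's standard library. The record contents and the YYYY-MM-DD date format are assumptions made for the example, not something the original post specifies.

```python
import csv
import io
import json
import re

# Hypothetical record with several value types.
record = {"id": 42, "ratio": 0.5, "tags": ["a", "b"], "note": None}

# JSON carries the type with each value, so a round trip preserves it.
decoded = json.loads(json.dumps(record))

# CSV is untyped: after a round trip every value is a plain string.
buf = io.StringIO()
csv.writer(buf).writerow([record["id"], record["ratio"]])
buf.seek(0)
csv_row = next(csv.reader(buf))

# A narrow pattern for dates encoded as text (assumed format: YYYY-MM-DD).
# It checks the shape and the month/day ranges, not calendar validity.
DATE_RE = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])")


def is_valid_date(text):
    """True only if the whole string matches the documented date pattern."""
    return DATE_RE.fullmatch(text) is not None
```

Publishing a pattern like DATE_RE in the documentation lets API users validate values the same way the server does.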
Normalize Tables, Then Denormalize. Normalization is the process of removing redundancy from tables by making multiple tables. You should do that. Have lots of primary keys that link related tables together. But… then… denormalize. The bottleneck of most APIs isn’t disk space but speed. Queries over denormalized tables are much faster than writing queries with JOINs over multiple tables. It’s faster to get data if it’s all in one response than if the user has to issue multiple API calls (across multiple tables) to get it. You still have to normalize first, though. Denormalized data is hard to understand and hard to maintain.
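One way to picture "normalize, then denormalize" is with SQLite through Python's sqlite3 module: keep the normalized tables as the source of truth, and build a flattened copy for the API to read from. The table names, columns, and the get_grant function are hypothetical, and a real system would also need a way to rebuild the flat table when the source data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables, linked by a key (names are invented).
    CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE grant_award (
        id INTEGER PRIMARY KEY,
        agency_id INTEGER NOT NULL REFERENCES agency(id),
        amount INTEGER NOT NULL
    );
    INSERT INTO agency VALUES (1, 'Parks'), (2, 'Transit');
    INSERT INTO grant_award VALUES (10, 1, 500), (11, 2, 750), (12, 1, 125);

    -- Denormalized copy, built once with a JOIN so reads need none.
    CREATE TABLE grant_award_flat AS
        SELECT g.id AS id, g.amount AS amount,
               a.id AS agency_id, a.name AS agency_name
        FROM grant_award g JOIN agency a ON a.id = g.agency_id;
""")

COLUMNS = ("id", "amount", "agency_id", "agency_name")


def get_grant(grant_id):
    """Serve one API response from the flat table, with agency data inline."""
    row = conn.execute(
        "SELECT id, amount, agency_id, agency_name "
        "FROM grant_award_flat WHERE id = ?",
        (grant_id,),
    ).fetchone()
    return dict(zip(COLUMNS, row)) if row else None
```

A single call to get_grant returns the grant together with its agency's name, so the client does not need a second request to resolve agency_id.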
Be RESTful, And More. “REST” is a set of practices. There are whole books on this. Here it is in short. Every object named in the data (often that’s the rows of the table) gets its own URL. Hierarchical relationships in the data are turned into nice URL paths with slashes. Put the URLs of related resources in output too (HATEOAS, h/t Ed Summers). Use HTTP GET and normal query string processing (a=x&b=y) for filtering, sorting, and paging. The idea of REST is that these are patterns already familiar to developers, and reusing existing patterns — rather than making up entirely new ones — makes the API more understandable and reusable. Also, use HTTPS for everything (h/t Eric Mill), and provide the API’s status as an API itself, possibly at the root URL of the API’s URL space (h/t Eric Mill again).
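A small Python sketch of these REST conventions follows: each object gets its own URL, the hierarchy shows up as path segments, related URLs are included in the output, and list queries use an ordinary query string. The base URL, the agencies/grants hierarchy, and the parameter names sort, limit, and offset are assumptions chosen for illustration.

```python
from urllib.parse import parse_qs, urlsplit

BASE = "https://api.example.org"  # hypothetical API root


def grant_resource(grant):
    """Attach the grant's own URL and the URLs of related resources."""
    agency_url = f"{BASE}/agencies/{grant['agency_id']}"
    return {
        **grant,
        # Hierarchy expressed as a path: agency, then its grants.
        "url": f"{agency_url}/grants/{grant['id']}",
        # Links to related resources, in the spirit of HATEOAS.
        "links": {"agency": agency_url, "collection": f"{agency_url}/grants"},
    }


def parse_list_query(url):
    """Read filtering, sorting, and paging from a normal query string."""
    params = parse_qs(urlsplit(url).query)

    def first(name, default):
        return params.get(name, [default])[0]

    reserved = {"sort", "limit", "offset"}
    return {
        # Any parameter that is not reserved is treated as a filter.
        "filters": {k: v[0] for k, v in params.items() if k not in reserved},
        "sort": first("sort", "id"),
        "limit": int(first("limit", "20")),
        "offset": int(first("offset", "0")),
    }
```

Because the query parsing relies on the standard a=x&b=y form, any HTTP client or browser address bar can build these requests without a special library.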
….
Never Require Registration. Don’t have authentication on your API to keep people out! In fact, having a requirement of registration may contradict other guidelines (such as the 8 Principles of Open Government Data). If you do use an API key, make it optional. A non-authenticated tier lets developers quickly test the waters, and that is really important for getting developers in the door, and, again, it may be important for policy reasons as well. You can have a carrot to incentivize voluntary authentication: raise the rate limit for authenticated queries, for instance. (h/t Ben Balter)
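The "optional key with a carrot" idea can be sketched as a rate limiter that serves everyone but gives keyed requests a higher limit. This is a simplified sliding-window limiter in Python; the limits, the window length, and the RateLimiter class are invented for the example, and the sketch does not check that a key is genuine.

```python
import time
from collections import defaultdict, deque

ANON_LIMIT = 2         # hypothetical requests per window without a key
KEYED_LIMIT = 5        # hypothetical higher limit when a key is supplied
WINDOW_SECONDS = 60.0  # hypothetical window length


class RateLimiter:
    """Sliding-window limiter: anonymous access works, keys raise the limit."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = defaultdict(deque)  # identity -> timestamps of recent hits

    def allow(self, client_ip, api_key=None):
        # Keyed requests are tracked per key, anonymous ones per IP address.
        identity = ("key", api_key) if api_key else ("ip", client_ip)
        limit = KEYED_LIMIT if api_key else ANON_LIMIT
        now = self._clock()
        hits = self._hits[identity]
        # Forget hits that have fallen out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= limit:
            return False
        hits.append(now)
        return True
```

Nothing here rejects a request for lacking a key; registration only changes how much the caller can do.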
Interactive Documentation. An API explorer is a web page that users can visit to learn how to build API queries and see results for test queries in real time. It’s an interactive browser tool, like interactive documentation. Relatedly, an “explain mode” in queries, which instead of returning results says what the query was and how it would be processed, can help developers understand how to use the API (h/t Eric Mill).
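An "explain mode" might look something like the following Python sketch, where an extra parameter makes the handler describe the query instead of running it. The parameter name explain, the handle_query function, and the shape of the response are all assumptions; the original post does not prescribe a format.

```python
def handle_query(params, rows):
    """Run a filtered, paged query, or describe it when explain mode is on."""
    params = dict(params)  # copy so the caller's dict is not modified
    explain = params.pop("explain", "0") in ("1", "true")
    limit = int(params.pop("limit", 10))
    offset = int(params.pop("offset", 0))
    filters = params  # whatever is left is treated as an equality filter

    if explain:
        # Say what the query was and how it would be processed.
        steps = [f"keep rows where {k} == {v!r}" for k, v in sorted(filters.items())]
        steps.append(f"skip {offset} rows, then return at most {limit}")
        return {"explain": True, "filters": filters, "limit": limit,
                "offset": offset, "steps": steps}

    # Query string values arrive as text, so compare against str(value).
    matched = [r for r in rows
               if all(str(r.get(k)) == v for k, v in filters.items())]
    return {"explain": False, "results": matched[offset:offset + limit]}
```

A developer who gets unexpected results can add explain=1 to the same request and see how each parameter was interpreted.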
Developer Community. Life is hard. Coding is hard. The subject matter your data is about is probably very complex. Don’t make your API users wade into your API alone. Bring the users together, bring them to you, and sometimes go to them. Let them ask questions and report issues in a public place (such as GitHub). You may find that users will answer other users’ questions. Wouldn’t that be great? Have a mailing list for longer questions and discussion about the future of the API. Gather case studies of how people are using the API and show them off to the other users. It’s not a requirement that the API owner participates heavily in the developer community — just having a hub is very helpful — but of course the more participation the better.
Create Virtuous Cycles. Create an environment around the API that makes the data and API stronger. For instance, other individuals within your organization who need the data should go through the public API to the greatest extent possible. Those users are experts and will help you make a better API, once they realize they benefit from it too. Create a feedback loop around the data, meaning find a way for API users to submit reports of data errors and have a process to carry out data updates, if applicable and possible. Do this in public as much as possible so that others see they can also join the virtuous cycle.”