Stefaan Verhulst
Joshua Tauberer’s Blog: “There comes a time in every dataset’s life when it wants to become an API. That might be because of consumer demand or an executive order. How are you going to make a good one?…
Let’s take the common case where you have a relatively static, large dataset that you want to provide read-only access to. Here are 19 common attributes of good APIs for this situation. …
Granular Access. If the user wanted the whole thing they’d download it in bulk, so an API must be good at providing access to the most granular level practical for data users (h/t Ben Balter for the wording on that). When the data comes from a table, this usually means the ability to read a small slice of it using filters, sorting, and paging (limit/offset), the ability to get a single row by identifying it with a persistent, unique identifier (usually a numeric ID), and the ability to select just which fields should be included in the result output (good for optimizing bandwidth in mobile apps, h/t Eric Mill). (But see “intents” below.)
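As an illustration of granular access, here is a minimal sketch of such a read-only endpoint in Python/Flask. The `/records` route, the in-memory `DATA` table, and the field names are hypothetical, invented for this example rather than taken from Tauberer's post:

```python
# Hypothetical read-only endpoint illustrating filtering, paging,
# and field selection; DATA stands in for the real backing store.
from flask import Flask, request, jsonify

app = Flask(__name__)

DATA = [
    {"id": 1, "state": "CA", "year": 2013, "amount": 120.5},
    {"id": 2, "state": "NY", "year": 2014, "amount": 98.0},
]

@app.route("/records")
def list_records():
    rows = DATA
    # Filtering: ?state=CA
    state = request.args.get("state")
    if state:
        rows = [r for r in rows if r["state"] == state]
    # Paging: ?limit=10&offset=0
    limit = int(request.args.get("limit", 10))
    offset = int(request.args.get("offset", 0))
    rows = rows[offset:offset + limit]
    # Field selection: ?fields=id,amount (saves bandwidth on mobile)
    fields = request.args.get("fields")
    if fields:
        keep = set(fields.split(","))
        rows = [{k: v for k, v in r.items() if k in keep} for r in rows]
    return jsonify(rows)

@app.route("/records/<int:record_id>")
def get_record(record_id):
    # Single-row access by persistent numeric ID.
    for r in DATA:
        if r["id"] == record_id:
            return jsonify(r)
    return jsonify({"error": "not found"}), 404
```

A client could then request, say, `/records?state=CA&limit=10&fields=id,amount` to pull a small, bandwidth-friendly slice.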
Deep Filtering. An API should be good at needle-in-haystack problems. Full text search is hard to do, so an API that can do it relieves a big burden for developers — if your API has any big text fields. Filters that can span relations or cross tables (i.e. joins) can be very helpful as well. But don’t go overboard. (Again, see “intents” below.)
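As a rough sketch of such a full-text “deep filter”, SQLite’s FTS5 extension shows how little machinery a basic needle-in-haystack search requires (assuming your SQLite build includes FTS5; the `bills` table and its contents are invented for illustration):

```python
# Sketch of a full-text "deep filter" using SQLite's FTS5 extension.
# Table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE bills USING fts5(title, body)")
conn.execute("INSERT INTO bills VALUES (?, ?)",
             ("Clean Water Act", "A bill concerning water quality..."))
conn.execute("INSERT INTO bills VALUES (?, ?)",
             ("Highway Funding", "Appropriations for interstate roads..."))

# MATCH performs the needle-in-haystack search across the text fields.
for row in conn.execute(
        "SELECT title FROM bills WHERE bills MATCH ?", ("water",)):
    print(row[0])  # -> Clean Water Act
```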
Typed Values. Response data should be typed. That means that whether a field’s value is an integer, text, list, floating-point number, dictionary, null, or date should be encoded as a part of the value itself. JSON and XML with XSD are good at this. CSV and plain XML, on the other hand, are totally untyped. Types must be strictly enforced. Columns must choose a data type and stick with it, no exceptions. When encoding other sorts of data as text, the values must all absolutely be valid according to the most narrow regular expression that you can make. Provide that regular expression to the API users in documentation.
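A small Python sketch of both points: typed values surviving a JSON round trip, and a text-encoded field validated against the narrowest practical regular expression (the `date` field and its pattern are illustrative, not prescribed by the post):

```python
# Types survive a JSON round trip; text-encoded values (like dates)
# should match the narrowest regex you can publish in your docs.
import json
import re

record = {"id": 42, "name": "Anytown", "population": 3.5,
          "founded": None, "date": "2014-02-07"}

# int, float, str, and null all come back as the same types.
assert json.loads(json.dumps(record)) == record

# Hypothetical documented pattern for the `date` field: YYYY-MM-DD only.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
assert DATE_RE.match(record["date"]), "date must be YYYY-MM-DD"
```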
Normalize Tables, Then Denormalize. Normalization is the process of removing redundancy from tables by making multiple tables. You should do that. Use primary and foreign keys to link related tables together. But… then… denormalize. The bottleneck of most APIs isn’t disk space but speed. Queries over denormalized tables are much faster than queries that JOIN over multiple tables. It’s faster to get data in a single response than to make the user issue multiple API calls (across multiple tables) to get it. You still have to normalize first, though: denormalized data is hard to understand and hard to maintain.
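A minimal sketch of the normalize-then-denormalize workflow using SQLite (the agencies/grants schema is invented for illustration): the normalized tables stay the source of truth, while a flat table built from one JOIN serves API reads.

```python
# Normalized tables remove redundancy; a denormalized table built
# from a JOIN serves reads in one query. Schema is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: agencies stored once, referenced by foreign key.
    CREATE TABLE agencies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grants_ (id INTEGER PRIMARY KEY,
        agency_id INTEGER REFERENCES agencies(id), amount REAL);
    INSERT INTO agencies VALUES (1, 'DARPA');
    INSERT INTO grants_ VALUES (10, 1, 50000.0);

    -- Denormalized read model: one JOIN at build time, none per query.
    CREATE TABLE grants_flat AS
        SELECT g.id, a.name AS agency_name, g.amount
        FROM grants_ g JOIN agencies a ON g.agency_id = a.id;
""")
print(conn.execute("SELECT * FROM grants_flat").fetchall())
# -> [(10, 'DARPA', 50000.0)]
```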
Be RESTful, And More. “REST” is a set of practices. There are whole books on this. Here it is in short. Every object named in the data (often that’s the rows of the table) gets its own URL. Hierarchical relationships in the data are turned into nice URL paths with slashes. Put the URLs of related resources in output too (HATEOAS, h/t Ed Summers). Use HTTP GET and normal query string processing (a=x&b=y) for filtering, sorting, and paging. The idea of REST is that these are patterns already familiar to developers, and reusing existing patterns — rather than making up entirely new ones — makes the API more understandable and reusable. Also, use HTTPS for everything (h/t Eric Mill), and provide the API’s status as an API itself, possibly at the root URL of the API’s URL space (h/t Eric Mill again).
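A sketch of these REST conventions, with a hypothetical `/states/<state>/bills` hierarchy, HATEOAS-style links in the payload, and query-string filtering:

```python
# Sketch of RESTful URL design with HATEOAS links in the payload.
# The /states/<state>/bills hierarchy is hypothetical.
from flask import Flask, jsonify, url_for

app = Flask(__name__)

@app.route("/states/<state>/bills/<int:bill_id>")
def bill(state, bill_id):
    return jsonify({
        "id": bill_id,
        "state": state,
        # Links to related resources let clients navigate the API.
        "links": {
            "self": url_for("bill", state=state, bill_id=bill_id),
            "all_bills": url_for("bills", state=state),
        },
    })

@app.route("/states/<state>/bills")
def bills(state):
    # Filtering, sorting, and paging ride on the ordinary query
    # string, e.g. /states/ca/bills?sort=date&limit=20
    return jsonify({"state": state, "bills": []})
```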
….
Never Require Registration. Don’t have authentication on your API to keep people out! In fact, having a requirement of registration may contradict other guidelines (such as the 8 Principles of Open Government Data). If you do use an API key, make it optional. A non-authenticated tier lets developers quickly test the waters, and that is really important for getting developers in the door, and, again, it may be important for policy reasons as well. You can have a carrot to incentivize voluntary authentication: raise the rate limit for authenticated queries, for instance. (h/t Ben Balter)
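One possible shape for that optional-key carrot, with an invented key store and invented limits:

```python
# Sketch of an optional API key: anonymous callers get a low rate
# limit, registered keys a higher one. Key store and limits are made up.
from flask import Flask, request, jsonify

app = Flask(__name__)

KNOWN_KEYS = {"example-key-123"}                 # hypothetical registered keys
LIMITS = {"anonymous": 60, "registered": 6000}   # requests per hour

@app.route("/data")
def data():
    key = request.args.get("api_key")            # optional, never required
    tier = "registered" if key in KNOWN_KEYS else "anonymous"
    # A real deployment would count requests per key or IP and return
    # HTTP 429 once LIMITS[tier] is exceeded; that bookkeeping is elided.
    return jsonify({"tier": tier, "hourly_limit": LIMITS[tier]})
```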
Interactive Documentation. An API explorer is a web page that users can visit to learn how to build API queries and see results for test queries in real time. It’s an interactive browser tool, like interactive documentation. Relatedly, an “explain mode” in queries, which instead of returning results says what the query was and how it would be processed, can help developers understand how to use the API (h/t Eric Mill).
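A toy version of such an explain mode might look like the following; the endpoint and response shape are made up, not drawn from the post:

```python
# Sketch of an "explain mode": ?explain=true returns how the query
# would be interpreted instead of the results themselves.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/records")
def records():
    filters = {k: v for k, v in request.args.items() if k != "explain"}
    if request.args.get("explain") == "true":
        return jsonify({
            "explanation": "Would filter rows where " +
                " AND ".join(f"{k} = {v!r}" for k, v in filters.items()),
            "filters": filters,
        })
    return jsonify({"results": [], "filters": filters})
```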
Developer Community. Life is hard. Coding is hard. The subject matter your data is about is probably very complex. Don’t make your API users wade into your API alone. Bring the users together, bring them to you, and sometimes go to them. Let them ask questions and report issues in a public place (such as GitHub). You may find that users will answer other users’ questions. Wouldn’t that be great? Have a mailing list for longer questions and discussion about the future of the API. Gather case studies of how people are using the API and show them off to the other users. It’s not a requirement that the API owner participates heavily in the developer community — just having a hub is very helpful — but of course the more participation the better.
Create Virtuous Cycles. Create an environment around the API that makes the data and API stronger. For instance, other individuals within your organization who need the data should go through the public API to the greatest extent possible. Those users are experts and will help you make a better API, once they realize they benefit from it too. Create a feedback loop around the data, meaning find a way for API users to submit reports of data errors and have a process to carry out data updates, if applicable and possible. Do this in public as much as possible so that others see they can also join the virtuous cycle.”
If, in the words of Google chairman Eric Schmidt, there is a “race between people and computers” even he suspects people may not win, democrats everywhere should be worried. In the same vein, Lawrence Summers, former Treasury secretary, recently noted that new technology could be liberating but that the government needed to soften its negative effects and make sure the benefits were distributed fairly. The problem, he went on, was that “we don’t yet have the Gladstone, the Teddy Roosevelt or the Bismarck of the technology era”.
These Victorian giants have much to teach us. They were at the helm when their societies were transformed by the telegraph, the electric light, the telephone and the combustion engine. Each tried to soften the blow of change, and to equalise the benefits of prosperity for working people. With William Gladstone it was universal primary education and the vote for Britain’s working men. With Otto von Bismarck it was legislation that insured German workers against ill-health and old age. For Roosevelt it was the entire progressive agenda, from antitrust legislation and regulation of freight rates to the conservation of America’s public lands….
The Victorians created the modern state to tame the market in the name of democracy but they wanted a nightwatchman state, not a Leviathan. Thanks to the new digital technologies, the state they helped create now has powers of surveillance that threaten our privacy and freedom. What new technology makes possible, states will do. Keeping technology in the service of democracy will not be easy. Asking judges to guard the guards only bloats the state apparatus still further. Allowing dissident insiders to get away with leaking the state’s secrets will only result in more secretive, paranoid and controlling government.
The Victorians would have said there is a solution – representative government itself – but it requires citizens to trust their representatives to hold the government in check. The Victorians created modern, mass representative democracy so that collective public choice could control change for everyone’s benefit. They believed that representatives, if given the authority and the necessary information, could control the power that technology confers on the modern state.
This is still a viable ideal but we have plenty of rebuilding to do before our democratic institutions are ready for the task. Congress and parliament need to regain trust and capability; and, if they do, we can start recovering the faith of the Victorians we so sorely need: the belief that democracy can master the technologies that are transforming our lives.”
Wired: “Twenty-five years on from the web’s inception, its creator has urged the public to re-engage with its original design: a decentralised internet that at its very core, remains open to all.
Speaking with Wired editor David Rowan at an event launching the magazine’s March issue, Tim Berners-Lee said that although part of this is about keeping an eye on for-profit internet monopolies such as search engines and social networks, the greatest danger is the emergence of a balkanised web.
“I want a web that’s open, works internationally, works as well as possible and is not nation-based,” Berners-Lee told the audience… “What I don’t want is a web where the Brazilian government has every social network’s data stored on servers on Brazilian soil. That would make it so difficult to set one up.”
It’s the role of governments, startups and journalists to keep that conversation at the fore, he added, because the pace of change is not slowing — it’s going faster than ever before. For his part Berners-Lee drives the issue through his work at the Open Data Institute, World Wide Web Consortium and World Wide Web Foundation, but also as an MIT professor whose students are “building new architectures for the web where it’s decentralised”. On the issue of monopolies, Berners-Lee did say it’s concerning to be “reliant on big companies, and one big server”, something that stalls innovation, but that competition has historically resolved these issues and will continue to do so.
The kind of balkanised web he spoke about, as typified by Brazil’s home-soil servers argument or Iran’s emerging intranet, is partially being driven by revelations of NSA and GCHQ mass surveillance. The distrust this has brewed, from the political level right down to the threat of self-censorship among ordinary citizens, threatens an open web and is, said Berners-Lee, a greater threat than censorship. Knowing the NSA may be breaking commercial encryption services could result in the emergence of more networks like China’s Great Firewall, built to “protect” citizens. This, Berners-Lee suggested, is why a bit of anti-establishment pushback is needed.”
The Economist on Government-to-government trade: “NIGERIAN pineapple for breakfast, Peruvian quinoa for lunch and Japanese sushi for dinner. Two centuries ago, when David Ricardo advocated specialisation and free trade, the notion that international exchange in goods and services could make such a cosmopolitan diet commonplace would have seemed fanciful.
Today another scenario may appear equally unlikely: a Norwegian government agency managing Algeria’s sovereign-wealth fund; German police overseeing security in the streets of Mumbai; and Dubai playing the role of the courthouse of the Middle East. Yet such outlandish possibilities are more than likely if a new development fulfils its promise. Ever more governments are trading with each other, from advising lawmakers to managing entire services. They are following businesses, which have long outsourced much of what they do. Is this the dawn of the government-to-government era?
Such “G2G” trade is not new, though the name may be. After the Ottoman empire defaulted on its debt in 1875 foreign lenders set up an “Ottoman Public Debt Administration”, its governing council packed with European government officials. At its peak it had 9,000 employees, more than the empire’s finance ministry. And the legacy of enforced G2G trade—colonialism, as it was known—is still visible even today. Britain’s Privy Council is the highest court of appeal for many Commonwealth countries. France provides a monetary-policy service to several west African nations by managing their currency, the CFA franc.
One reason G2G trade is growing is that it is a natural extension of the trend for governments to pinch policies from each other. “Policymaking now routinely occurs in comparative terms,” says Jamie Peck of the University of British Columbia, who refers to G2G advice as “fast policy”. Since the late 1990s Mexico’s pioneering policy to make cash benefits for poor families conditional on things like getting children vaccinated and sending them to school has been copied by almost 50 other countries…. Budget cuts can provide another impetus for G2G trade. The Dutch army recently sold its Leopard II tanks and now sends tank crews to train with German forces. That way it will be able to re-form its tank squadrons quickly if they are needed. Britain, with a ten-year gap between scrapping old aircraft-carriers and buying new ones, has sent pilots to train with the American marines on the F-35B, which will fly from both American and British carriers.
…
No one knows the size of the G2G market. Governments rarely publicise deals, not least because they fear looking weak. And there are formidable barriers to trade. The biggest is the “Westphalian” view of sovereignty, says Stephen Krasner of Stanford University: that states should run their own affairs without foreign interference. In 2004 Papua New Guinea’s parliament passed a RAMSI-like delegation agreement (RAMSI being the Australian-led Regional Assistance Mission to Solomon Islands), but local elites opposed it and courts eventually declared it unconstitutional. Honduras attempted to create independent “charter cities”, a concept developed by Paul Romer of New York University (NYU), whose citizens would have had the right of appeal to the supreme court of Mauritius. But in 2012 this scheme, too, was deemed unconstitutional.
Critics fret about accountability and democratic legitimacy. The 2005 Paris Declaration on Aid Effectiveness, endorsed by governments and aid agencies, made much of the need for developing countries to design their own development strategies. And providers open themselves to reputational risk. British police, for instance, have trained Bahraini ones. A heavy-handed crackdown by local forces during the Arab spring reflected badly on their foreign teachers…
When San Francisco decided to install wireless control systems for its streetlights, it posted a “call for solutions” on Citymart, an online marketplace for municipal projects. In 2012 it found a Swiss firm, Paradox Engineering, which had built such systems for local cities. But though members often share ideas, says Sascha Haselmayer, Citymart’s founder, most still decide to implement their chosen policies themselves.
Weak government services are the main reason poor countries fail to catch up with rich ones, says Mr Romer. One response is for people in poorly run places to move to well governed ones. Better would be to bring efficient government services to them. In a recent paper with Brandon Fuller, also of NYU, Mr Romer argues that either response would bring more benefits than further lowering the barriers to trade in privately provided goods and services. Firms have long outsourced activities, even core ones, to others that do them better. It is time governments followed suit.”
Harvard Business School Paper by Kevin J. Boudreau, Patrick Gaule, Karim R. Lakhani, Christoph Riedl, and Anita Williams Woolley: “Online “organizations” are becoming a major engine for knowledge development in a variety of domains such as Wikipedia and open source software development. Many online platforms involve collaboration and coordination among members to reach common goals. In this sense, they are collaborative communities. This paper asks: What factors most inspire online teams to begin to collaborate and to do so creatively and effectively? The authors analyze a data set of 260 individuals randomly assigned to 52 teams tasked with developing working solutions to a complex innovation problem over 10 days, with varying cash incentives. Findings showed that although cash incentives stimulated a significant boost of effort per se, cash incentives did not transform the nature of the work process or affect the level of collaboration. In addition, at a basic yet striking level, the likelihood that an individual chooses to participate depended on whether teammates were themselves active. Moreover, communications among teammates led to more communications, and communications among teammates also stimulated greater continuous levels of effort. Overall, the study sheds light on how perspectives on incentives, predominant in economics, and perspectives on social processes and interactions, predominant in research on organizational behavior and teams, can be better understood. Key concepts include:
- An individual’s likelihood of being active in online collaboration increases by about 41 percent with each additional active teammate.
- Management could provide communications channels to make the efforts of other members more visible. This is important in the design of systems for online work as it helps members to confirm that others are actively contributing.
Press Release: “AskThem.io, launching Feb. 10th, is a free & open-source website for questions-and-answers with public figures. AskThem is like a version of the White House’s “We The People” petition platform, where over 8 million people have taken action to support questions for a public response – but for the first time, for every elected official nationwide…AskThem.io has official government data for over 142,000 U.S. elected officials at every level of government: federal, state, county, and municipal. Also, AskThem allows anyone to ask a question to any verified Twitter account, for online dialogue with public figures.
Here’s how AskThem works for online public dialogue:
- For the first time in an open-source website, visitors enter their street address to see all their elected officials, from federal down to the city levels, or search for a verified Twitter account.
- Individuals & organizations submit a question to their elected officials – for example, asking a city council member about a proposed ban on plastic bags.
- People then sign on to the questions and petitions they support, voting them up on AskThem and sharing them over social media, as with online petitions.
- When a question passes a certain threshold of signatures, AskThem delivers it to the recipient over email & social media and encourages a public response – creating a continual, structured dialogue with elected officials at every level of government.
AskThem also incorporates open government data, such as city council agendas and key vote information, to inform good questions of people in power. Open government advocate, Chicago, IL Clerk Susana Mendoza, joined AskThem because she believes that “technology should bring residents and the Office of the Chicago City Clerk closer together.”
Elected officials who sign up with AskThem agree to respond to the most popular questions from their constituents (about two per month). Interested elected officials can sign up now to become verified, free & open to everyone.
Issue-based organizations can use question & petition info from AskThem to surface political issues in their area that people care about, stay continuously engaged with government, and promote public accountability. Participating groups on AskThem include the internet freedom non-profit Fight For the Future, the social media crowd-speaking platform Thunderclap.it, the Roosevelt Institute National Student Network, and more.”
Press Release: “Public website aims to encourage communities interested in DARPA research to build off the agency’s work, starting with big data…
DARPA has invested in many programs that sponsor fundamental and applied research in areas of computer science, which have led to new advances in theory as well as practical software. The R&D community has asked about the availability of results, and now DARPA has responded by creating the DARPA Open Catalog, a place for organizing and sharing those results in the form of software, publications, data and experimental details. The Catalog can be found at http://go.usa.gov/BDhY.
Many DoD and government research efforts and software procurements contain publicly releasable elements, including open source software. The nature of open source software lends itself to collaboration where communities of developers augment initial products, build on each other’s expertise, enable transparency for performance evaluation, and identify software vulnerabilities. DARPA has an open source strategy for areas of work including big data to help increase the impact of government investments in building a flexible technology base.
“Making our open source catalog available increases the number of experts who can help quickly develop relevant software for the government,” said Chris White, DARPA program manager. “Our hope is that the computer science community will test and evaluate elements of our software and afterward adopt them as either standalone offerings or as components of their products.”
Melissa Jun Rowley at the Toolbox: “Though democratic governments are of the people, by the people, and for the people, it often seems that our only input is electing officials who pass laws on our behalf. After all, I don’t know many people who attend town hall meetings these days. But the evolution of technology has given citizens a new way to participate. Governments are using technology to include as many voices from their communities as possible in civic decisions and activities. Here are three examples.
Raleigh, NC
Raleigh, North Carolina’s open government initiative is a great example of passive citizen engagement. By following an open source strategy, Open Raleigh has made city data available to the public. Citizens then use the data in a myriad of ways, from simply visualizing daily crime in their city to creating an app that lets users navigate and interactively utilize the city’s greenway system.
Fort Smith, AR
Using MindMixer, Fort Smith, Arkansas, has created an online forum for residents to discuss the city’s comprehensive plan, effectively putting the community’s future in the hands of the community itself. Citizens are invited to share their own ideas, vote on ideas submitted by others, and engage with city officials who are “listening” to the conversation on the site.
Seattle, WA
Being a tech town, it’s no surprise that Seattle is using social media as a citizen engagement tool. The Seattle Police Department (SPD) uses a variety of social media tools to reach the public. In 2012, the department launched a first-of-its-kind hyperlocal Twitter initiative. A police scanner for the Twitter generation, Tweets by Beat provides Twitter feeds of police dispatches in each of Seattle’s 51 police beats so that residents can find out what is happening right on their block.
In addition to Twitter and Facebook, SPD created a Tumblr to, in their own words, “show you your police department doing police-y things in your city.” In a nutshell, the department’s Tumblr serves as an extension of their other social media outlets.”
Poor-quality news coverage is especially problematic when the political process is sharply polarized. As has been documented by political scientists Tom Mann and Norman Ornstein, the United States has a Congress today where the most conservative Democrat is to the left of the most moderate Republican. [1] There are many reasons for this spike in polarization, but there is little doubt that the news media amplify and exacerbate social and political divisions.
Too often, journalists follow a “Noah’s Ark” approach to coverage in which a strong liberal is paired with a vocal conservative in an ideological food fight. The result is polarization of discourse and “false equivalence” in reporting. This lack of nuanced analysis confuses viewers and makes it difficult for them to sort out the contrasting facts and opinions. People get the sense that there are only two policy options and that there are few gradations or complexities in the positions that are reported.
In this paper, West and Stone review challenges facing the news media in an age of political polarization. This includes hyper-competitiveness in news coverage, a dramatic decline in local journalism and resulting nationalization of the news, and the personalization of coverage. After discussing these problems and how they harm current reporting, they present several ideas for nudging news producers and consumers towards more thoughtful and less polarizing responses.”
Emerging Technology From the arXiv: “Nobody agrees on how to define a city. But the emergence of “natural cities” from social media data sets may change that, say computational geographers…
A city is a large, permanent human settlement. But try and define it more carefully and you’ll soon run into trouble. A settlement that qualifies as a city in Sweden may not qualify in China, for example. And the reasons why one settlement is classified as a town while another as a city can sometimes seem almost arbitrary.
City planners know this problem well. They tend to define cities by administrative, legal or even historical boundaries that have little logic to them. Indeed, the same city can sometimes be defined in various different ways.
That causes all kinds of problems from counting the total population to working out who pays for the upkeep of the place. Which definition do you use?
Now help may be at hand thanks to the work of Bin Jiang and Yufan Miao at the University of Gävle in Sweden. These guys have found a way to use people’s locations, recorded by social media, to define the boundaries of so-called natural cities, which closely resemble real cities in the US.
Jiang and Miao began with a dataset from the Brightkite social network, which was active between 2008 and 2010. The site encouraged users to log in with their location details so that they could see other users nearby. So the dataset consists of almost 3 million locations in the US and the dates on which they were logged.
To start off, Jiang and Miao simply placed a dot on a map at the location of each login. They then connected these dots to their neighbours to form triangles that end up covering the entire mainland US.
Next, they calculated the size of each triangle on the map and plotted this size distribution, which turns out to follow a power law. So there are lots of tiny triangles but only a few large ones.
Finally, they calculated the average size of the triangles and then coloured in all those that were smaller than average. The coloured areas are “natural cities”, say Jiang and Miao.
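A rough reconstruction of that procedure on toy data, assuming the triangulation step is a standard Delaunay triangulation (the paper’s exact method may differ, and the coordinates below are synthetic, not Brightkite logins):

```python
# Sketch of the natural-cities procedure: triangulate login points,
# then keep triangles smaller than the average size. Points are made up.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
# Two dense "city" clusters plus sparse background noise (toy data,
# standing in for the ~3 million Brightkite login locations).
points = np.vstack([
    rng.normal([0, 0], 0.05, size=(200, 2)),
    rng.normal([3, 1], 0.05, size=(200, 2)),
    rng.uniform(-1, 4, size=(50, 2)),
])

tri = Delaunay(points)

def triangle_area(pts):
    # Shoelace formula for the area of one triangle.
    (x1, y1), (x2, y2), (x3, y3) = pts
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2

areas = np.array([triangle_area(points[s]) for s in tri.simplices])

# Triangles smaller than the mean form the "natural cities".
city_triangles = tri.simplices[areas < areas.mean()]
print(f"{len(city_triangles)} of {len(areas)} triangles fall inside cities")
```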
It’s easy to imagine that the resulting map of triangles would be of little value. But to the evident surprise of the researchers, it produces a pretty good approximation of the cities in the US. “We know little about why the procedure works so well but the resulting patterns suggest that the natural cities effectively capture the evolution of real cities,” they say.
That’s handy because it suddenly gives city planners a way to study and compare cities on a level playing field. It allows them to see how cities evolve and change over time too. And it gives them a way to analyse how cities in different parts of the world differ.
Of course, Jiang and Miao will want to find out why this approach reveals city structures in this way. That’s still something of a puzzle but the answer itself may provide an important insight into the nature of cities (or at least into the nature of this dataset).
A few days ago, this blog wrote about how a new science of cities is emerging from the analysis of big data. This is another example, and we can expect to see more.
Ref: http://arxiv.org/abs/1401.6756 : The Evolution of Natural Cities from the Perspective of Location-Based Social Media”