What’s Wrong with Open-Data Sites—and How We Can Fix Them


César A. Hidalgo at Scientific American: “Imagine shopping in a supermarket where every item is stored in boxes that look exactly the same. Some are filled with cereal, others with apples, and others with shampoo. Shopping would be an absolute nightmare! The design of most open data sites—the (usually government) sites that distribute census, economic and other data to be used and redistributed freely—is not exactly equivalent to this nightmarish supermarket. But it’s pretty close.

During the last decade, such sites—data.gov, data.gov.uk, data.gob.cl, data.gouv.fr, and many others—have been created throughout the world. Most of them, however, still deliver data as sets of links to tables, or links to other sites that are also hard to comprehend. In the best cases, data is delivered through APIs, or application programming interfaces, which are simple data query languages that require a user to have a basic knowledge of programming. So understanding what is inside each dataset requires downloading, opening, and exploring the set in ways that are extremely taxing for users. The analogy of the nightmarish supermarket is not that far off.
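To make Hidalgo's point concrete, here is a hedged sketch of what even a simple catalogue query involves. It assumes a CKAN-style search endpoint (the software behind data.gov and several national portals); the search term, the sample response, and the dataset title are invented for illustration, and a real session would start with the portal's API documentation:

```python
import json
from urllib.parse import urlencode

# CKAN-style catalogue search endpoint; the exact base URL is an
# assumption for illustration (data.gov exposes one like it).
BASE = "https://catalog.data.gov/api/3/action/package_search"

def build_search_url(query, rows=5):
    """Build a dataset-search URL. Even this first step assumes the
    user understands HTTP query strings."""
    return BASE + "?" + urlencode({"q": query, "rows": rows})

def dataset_titles(response_text):
    """Pull dataset titles out of a CKAN-format JSON response."""
    payload = json.loads(response_text)
    return [d["title"] for d in payload["result"]["results"]]

# A canned response in CKAN's format, standing in for a live call.
sample = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{"title": "Decennial Census Counts"}]},
})

print(build_search_url("census"))
print(dataset_titles(sample))
```

Even in this best case, the user must know the endpoint, the query syntax, and the response schema before seeing a single number, which is exactly the barrier the article describes.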

THE U.S. GOVERNMENT’S OPEN DATA SITE

The consensus among those who have participated in the creation of open data sites is that current efforts have failed and we need new options. Pointing your browser to these sites should show you why. Most open data sites are badly designed, and here I am not talking about their aesthetics—which are also subpar—but about the conceptual model used to organize and deliver data to users. The design of most open data sites follows a throwing-spaghetti-against-the-wall strategy, where opening more data, instead of opening data better, has been the driving force.

Some of the design flaws of current open data sites are pretty obvious. The datasets that are more important, or that could potentially be more useful, are not brought to the surface of these sites, nor are they properly organized. In our supermarket analogy, not only do all the boxes look the same, but they are also shelved in the order in which they arrived. This cannot be the best we can do.

There are other design problems that are important, even though they are less obvious. The first is that most sites deliver data in the way it is collected, rather than in the way it is used. People are often looking for data about a particular place, occupation, or industry, or about an indicator (such as income or population). Whether the data they need comes from the national survey of X or the bureau of Y is secondary, and often—although not always—irrelevant to the user. Yet, even though this is not the way we should be giving data back to users, it is what open data sites usually do.

The second non-obvious design problem, which is probably the most important, is that most open data sites bury data in what is known as the deep web. The deep web is the fraction of the Internet that is not accessible to search engines, or that cannot be indexed properly. The surface of the web is made of text, pictures, and video, which search engines know how to index. But search engines are not good at knowing that the number you are searching for is hidden in row 17,354 of a comma-separated file inside a zip file linked from a poorly described page of an open data site. In some cases, pressing a radio button and selecting options from a number of dropdown menus can get you the desired number, but this does not help search engines either, because crawlers cannot explore dropdown menus. To make open data really open, we need to make it searchable, and for that we need to bring data to the surface of the web.
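One concrete way to "bring data to the surface" (a common technique, though not one the article names) is to embed machine-readable dataset descriptions in a portal's pages using schema.org's Dataset vocabulary, which mainstream search engines do index. A minimal sketch, with every field value hypothetical:

```python
import json

def dataset_jsonld(name, description, csv_url):
    """Build a schema.org/Dataset description suitable for embedding in
    a portal page inside a <script type="application/ld+json"> tag,
    which makes the dataset visible to search-engine crawlers.
    All values passed in here are illustrative."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "distribution": {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": csv_url,
        },
    }, indent=2)

snippet = dataset_jsonld(
    "County Population Estimates",            # hypothetical dataset name
    "Annual population estimates by county.",
    "https://example.org/population.csv",     # hypothetical download URL
)
print(snippet)
```

With markup like this in place, a crawler can associate the page with a named, described, downloadable dataset rather than an opaque zip-file link.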

So how do we do that? The solution may not be simple, but it starts by taking design seriously. This is something I have been doing for more than half a decade while creating data visualization engines at MIT. The latest iteration of our design principles is now embodied in DataUSA, a site we created in a collaboration between Deloitte, Datawheel, and my group at MIT.

So what is design, and how do we use it to improve open data sites? My definition of design is simple. Design is discovering the forms that best fulfill a function….(More)”

Crowdsourced Deliberation: The Case of the Law on Off-Road Traffic in Finland


Tanja Aitamurto and Hélène Landemore in Policy & Internet: “This article examines the emergence of democratic deliberation in a crowdsourced law reform process. The empirical context of the study is a crowdsourced legislative reform in Finland, initiated by the Finnish government. The findings suggest that online exchanges in the crowdsourced process qualify as democratic deliberation according to the classical definition. We introduce the term “crowdsourced deliberation” to mean an open, asynchronous, depersonalized, and distributed kind of online deliberation occurring among self-selected participants in the context of an attempt by government or another organization to open up the policymaking or lawmaking process. The article helps to characterize the nature of crowdsourced policymaking and to understand its possibilities as a practice for implementing open government principles. We aim to make a contribution to the literature on crowdsourcing in policymaking, participatory and deliberative democracy and, specifically, the newly emerging subfield in deliberative democracy that focuses on “deliberative systems.”…(More)”

Citizen scientists aid Ecuador earthquake relief


Mark Zastrow at Nature: “After a magnitude-7.8 earthquake struck Ecuador’s Pacific coast on 16 April, a new ally joined the international relief effort: a citizen-science network called Zooniverse.

On 25 April, Zooniverse launched a website that asks volunteers to analyse rapidly snapped satellite imagery of the disaster, which led to more than 650 reported deaths and 16,000 injuries. The aim is to help relief workers on the ground find the most heavily damaged regions and identify which roads are passable.

Several crisis-mapping programmes with thousands of volunteers already exist — but it can take days to train satellites on the damaged region and to transmit data to humanitarian organizations, and results have not always proven useful. The Ecuador quake marked the first live public test for an effort dubbed the Planetary Response Network (PRN), which promises to be both more nimble than previous efforts, and to use more rigorous machine-learning algorithms to evaluate the quality of crowd-sourced analyses.

The network relies on imagery from the satellite company Planet Labs in San Francisco, California, which uses an array of shoebox-sized satellites to map the planet. In order to speed up the crowd-sourced process, it uses the Zooniverse platform to distribute the tasks of spotting features in satellite images. Machine-learning algorithms employed by a team at the University of Oxford, UK, then classify the reliability of each volunteer’s analysis and weight their contributions accordingly.
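The article does not describe the algorithm itself, but the weighting step it mentions can be sketched as a reliability-weighted vote: each volunteer's classifications count in proportion to an estimated reliability score. A toy version follows, with invented volunteers and weights; the real Planetary Response Network pipeline is certainly more sophisticated:

```python
from collections import defaultdict

def weighted_vote(classifications, reliability):
    """Aggregate volunteer labels for one image.
    classifications: list of (volunteer, label) pairs.
    reliability: volunteer -> weight in [0, 1], estimated elsewhere
    (for instance, from performance on images with known answers)."""
    score = defaultdict(float)
    for volunteer, label in classifications:
        score[label] += reliability.get(volunteer, 0.5)  # default weight
    return max(score, key=score.get)

# Invented example: two reliable volunteers outweigh three unreliable ones.
reliability = {"ana": 0.9, "ben": 0.9, "cal": 0.2, "dee": 0.2, "eli": 0.2}
labels = [("ana", "damaged"), ("ben", "damaged"),
          ("cal", "intact"), ("dee", "intact"), ("eli", "intact")]
print(weighted_vote(labels, reliability))  # "damaged" wins, 1.8 vs 0.6
```

The design point is that a plain majority vote would have called this image "intact"; weighting by estimated reliability is what lets a system extract a trustworthy signal from a crowd of mixed skill.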

Rapid-fire data

Within 2 hours of the Ecuador test project going live with a first set of 1,300 images, each photo had been checked at least 20 times. “It was one of the fastest responses I’ve seen,” says Brooke Simmons, an astronomer at the University of California, San Diego, who leads the image processing. Steven Reece, who heads the Oxford team’s machine-learning effort, says that results — a “heat map” of damage with possible road blockages — were ready in another two hours.

In all, more than 2,800 Zooniverse users contributed to analysing roughly 25,000 square kilometres of imagery centred around the coastal cities of Pedernales and Bahia de Caraquez. That is where the London-based relief organization Rescue Global — which requested the analysis the day after the earthquake — currently has relief teams on the ground, including search dogs and medical units….(More)”

Opening up census data for research


Economic and Social Research Council (UK): “InFuse, an online search facility for census data, is enabling tailored search and investigation of UK census statistics – opening new opportunities for aggregating and comparing population counts.

Impacts

  • InFuse data were used for the ‘Smarter Travel’ research project studying how ‘smart choices’ for sustainable travel could be implemented and supported in transport planning. The research directly influenced UK climate-change agendas and policy, including:
    • the UK Committee on Climate Change recommendations on cost-effective-emission reductions
    • the Scottish Government’s targets and household advice for smarter travel
    • the UK Government’s Local Sustainable Transport Fund supporting 96 projects across England
    • evaluations for numerous Local Authority Transport Plans across the UK.
  • The Integration Hub, a web resource that was launched by Demos in 2015 to provide data about ethnic integration in England and Wales, uses data from InFuse to populate its interactive maps of the UK.
  • Census data downloaded from InFuse informed Welsh Government policies to engage Gypsy and Traveller families in education, showing that over 60 per cent of people aged over 16 from these communities had no qualifications.
  • Executive recruitment firm Sapphire Partners used census data from InFuse in a report on female representation on boards, revealing that 77 per cent of FTSE board members are men, and 70 per cent of new board appointments go to men.
  • A study by the Marie Curie charity into the differing needs of Black, Asian and minority ethnic groups in Scotland for end-of-life care used InFuse to determine that the minority ethnic population in Scotland has doubled since 2001 from 100,000 to 200,000 – highlighting the need for greater and more appropriate provision.
  • A Knowledge Transfer Partnership between homelessness charity Llamau and Cardiff University used InFuse data to show that Welsh young homeless people participating in the study were over twice as likely to have left school with no qualifications compared to UK-wide figures for their age group and gender….(More)”

 

Open Data Supply: Enriching the usability of information


Report by Phoensight: “With the emergence of increasing computational power, high cloud storage capacity and big data comes an eager anticipation of one of the biggest IT transformations of our society today.

Open data has an instrumental role to play in our digital revolution by creating unprecedented opportunities for governments and businesses to leverage previously unavailable information to strengthen their analytics and decision making for new client experiences. Whilst virtually every business recognises the value of data and the importance of the analytics built on it, realising the potential for maximising revenue and cost savings is not straightforward. The discovery of valuable insights often involves the acquisition of new data and an understanding of it. As we move towards an increasing supply of open data, technological and other entrepreneurs will look to make better use of government information for improved productivity.

This report uses a data-centric approach to examine the usability of information by considering ways in which open data could better facilitate data-driven innovations and further boost our economy. It assesses the state of open data today and suggests ways in which data providers could supply open data to optimise its use. A number of useful measures of information usability such as accessibility, quantity, quality and openness are presented which together contribute to the Open Data Usability Index (ODUI). For the first time, a comprehensive assessment of open data usability has been developed and is expected to be a critical step in taking the open data agenda to the next level.

With over two million government datasets assessed against the open data usability framework, and models developed to link entire countries’ datasets to key industry sectors, never before has such an extensive analysis been undertaken. Government open data across Australia, Canada, Singapore, the United Kingdom and the United States reveals that most countries have room for improvement in their information usability. For 2015, the United Kingdom led the way, followed by Canada, Singapore, the United States and Australia. The global potential of government open data is expected to reach 20 exabytes by 2020, provided governments are able to release as much data as possible within legislative constraints….(More)”

The Open Data Barometer (3rd edition)


The Open Data Barometer: “Once the preserve of academics and statisticians, data has become a development cause embraced by everyone from grassroots activists to the UN Secretary-General. There’s now a clear understanding that we need robust data to drive democracy and development — and a lot of it.

Last year, the world agreed the Sustainable Development Goals (SDGs) — seventeen global commitments that set an ambitious agenda to end poverty, fight inequality and tackle climate change by 2030. Recognising that good data is essential to the success of the SDGs, the Global Partnership for Sustainable Development Data and the International Open Data Charter were launched as the SDGs were unveiled. These alliances mean the “data revolution” now has over 100 champions willing to fight for it. Meanwhile, Africa adopted the African Data Consensus — a roadmap to improving data standards and availability in a region that has notoriously struggled to capture even basic information such as birth registration.

But while much has been made of the need for bigger and better data to power the SDGs, this year’s Barometer follows the lead set by the International Open Data Charter by focusing on how much of this data will be openly available to the public.

Open data is essential to building accountable and effective institutions, and to ensuring public access to information — both goals of SDG 16. It is also essential for meaningful monitoring of progress on all 169 SDG targets. Yet the promise and possibilities offered by opening up data to journalists, human rights defenders, parliamentarians, and citizens at large go far beyond even these….

At a glance, here are this year’s key findings on the state of open data around the world:

    • Open data is entering the mainstream. The majority of the countries in the survey (55%) now have an open data initiative in place and a national data catalogue providing access to datasets available for re-use. Moreover, new open data initiatives are getting underway or are promised for the near future in a number of countries, including Ecuador, Jamaica, St. Lucia, Nepal, Thailand, Botswana, Ethiopia, Nigeria, Rwanda and Uganda. Demand is high: civil society and the tech community are using government data in 93% of countries surveyed, even in countries where that data is not yet fully open.
    • Despite this, there’s been little to no progress on the number of truly open datasets around the world. Even with the rapid spread of open government data plans and policies, too much critical data remains locked in government filing cabinets. For example, only two countries publish acceptable detailed open public spending data. Of all 1,380 government datasets surveyed, almost 90% are still closed — roughly the same as in the last edition of the Open Data Barometer (when only 130 out of 1,290 datasets, or 10%, were open). What is more, much of the approximately 10% of data that meets the open definition is of poor quality, making it difficult for potential data users to access, process and work with it effectively.
    • “Open-washing” is jeopardising progress. Many governments have advertised their open data policies as a way to burnish their democratic and transparent credentials. But open data, while extremely important, is just one component of a responsive and accountable government. Open data initiatives cannot be effective unless supported by a culture of openness in which citizens are encouraged to ask questions and engage, and by a robust legal framework. Disturbingly, in this edition we saw a backslide on freedom of information, transparency, accountability, and privacy indicators in some countries. Until all these factors are in place, open data cannot be a true SDG accelerator.
    • Implementation and resourcing are the weakest links. Progress on the Barometer’s implementation and impact indicators has stalled or even gone into reverse in some cases. Open data can result in net savings for the public purse, but getting individual ministries to allocate the budget and staff needed to publish their data is often an uphill battle, and investment in building user capacity (both inside and outside of government) is scarce. Open data is not yet entrenched in law or policy, and the legal frameworks supporting most open data initiatives are weak. This is a symptom of the tendency of governments to view open data as a fad or experiment with little to no long-term strategy behind its implementation. This results in haphazard implementation, weak demand and limited impact.
    • The gap between data haves and have-nots needs urgent attention. Twenty-six of the top 30 countries in the ranking are high-income countries. Half of open datasets in our study are found in just the top 10 OECD countries, while almost none are in African countries. As the UN pointed out last year, such gaps could create “a whole new inequality frontier” if allowed to persist. Open data champions in several developing countries have launched fledgling initiatives, but too often those good open data intentions are not adequately resourced, resulting in weak momentum and limited success.
    • Governments at the top of the Barometer are being challenged by a new generation of open data adopters. Traditional open data stalwarts such as the USA and UK have seen their rate of progress on open data slow, signalling that new political will and momentum may be needed as more difficult elements of open data are tackled. Fortunately, a new generation of open data adopters, including France, Canada, Mexico, Uruguay, South Korea and the Philippines, are starting to challenge the ranking leaders and are adopting a leadership attitude in their respective regions. The International Open Data Charter could be an important vehicle to sustain and increase momentum in challenger countries, while also stimulating renewed energy in traditional open data leaders….(More)”

Tag monitors air pollution and never loses charge


Springwise: “The battle to clean up the air of major cities is well underway, with businesses and politicians pledging to help with the pollution issue. We have seen projects using mobile air sensors mounted on pigeons to bring the problem to public attention, and now a new crowdsourcing campaign is attempting to map the UK’s air pollution.

CleanSpace uses a portable, air pollution-sensing tag to track exposure to harmful pollutants in real-time. The tag is connected to an app, which analyzes the data and combines it with that of other users in the UK to create an air pollution map.

An interesting aspect of the CleanSpace Tag’s technology is that it never needs to be charged. The startup says the tag is powered by harvesting 2G, 3G, 4G and wifi signals, which is enough to meet its small power requirements. The app also rewards users for traveling on foot or by bike, offering them “CleanMiles” that can be exchanged for discounts with CleanSpace’s partners.

The startup successfully raised more than GBP 100,000 in a crowdfunding campaign last year, and the team has given back GBP 10,000 to their charitable partners this year. …(More)”

How to See Gentrification Coming


Nathan Collins at Pacific Standard: “Depending on whom you ask, gentrification is either damaging, not so bad, or maybe even good for the low-income people who live in what we euphemistically call up-and-coming neighborhoods. Either way, it’d be nice for everybody to know which neighborhoods are going to get revitalized/eviscerated next. Now, computer scientists think they’ve found a way to do exactly that: Using Twitter and Foursquare, map the places visited by the most socially diverse crowds. Those, it turns out, are the most likely to gentrify.

Led by University of Cambridge graduate student Desislava Hristova, the researchers began their study by mapping out the social network of 37,722 Londoners who posted Foursquare check-ins via Twitter. Two people were presumed to be friends—connected on the social network—if they followed each other’s Twitter feeds. Next, Hristova and her colleagues built a geographical network of 42,080 restaurants, clubs, shops, apartments, and so on. Quaint though it may seem, the researchers treated two places as neighbors in the geographical network if they were, in fact, physically near each other. The team then linked the social and geographical networks using 549,797 Foursquare check-ins, each of which ties a person in the social network to a place in the geographical one.
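A minimal sketch of that two-network construction, using plain dictionaries rather than the researchers' actual tooling; every name, coordinate, and the distance threshold below is invented for illustration:

```python
from itertools import combinations

# Toy inputs standing in for the study's data; all values invented.
follows = {"amy": {"bob", "cat"}, "bob": {"amy"}, "cat": {"amy", "bob"}}
places = {"cafe": (0.0, 0.0), "shop": (0.0, 0.1), "club": (5.0, 5.0)}
checkins = [("amy", "cafe"), ("bob", "cafe"), ("cat", "club")]

# Social network: an edge for each pair who follow each other.
social_edges = {frozenset((a, b)) for a in follows for b in follows[a]
                if a in follows.get(b, set())}

# Geographical network: an edge for each pair of physically nearby
# places (Euclidean distance under an arbitrary threshold).
def near(p, q, threshold=1.0):
    (x1, y1), (x2, y2) = places[p], places[q]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 < threshold

geo_edges = {frozenset((p, q)) for p, q in combinations(places, 2)
             if near(p, q)}

# Check-ins link the two networks: each place gets a set of visitors.
place_visitors = {}
for user, place in checkins:
    place_visitors.setdefault(place, set()).add(user)

print(sorted(tuple(sorted(e)) for e in social_edges))
print(sorted(tuple(sorted(e)) for e in geo_edges))
print(place_visitors)
```

The linking step is the crux: once each place carries a set of visitors who are also nodes in the social graph, per-place social statistics become straightforward to compute.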

Gentrification doesn’t start when outsiders move in; it starts when outsiders come to visit.

Using the network data, the team next constructed several measures of the social diversity of places, each of which helps distinguish between places that bring together friends versus strangers, and to distinguish between spots that attract socially diverse crowds versus a steady group of regulars. Among other things, those measures showed that places in the outer boroughs of London brought together more socially homogenous groups of people—in terms of their Foursquare check-ins, at least—compared with boroughs closer to the core.
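One simple measure in that spirit (a hedged illustration, not necessarily one of the paper's actual metrics) is the fraction of visitor pairs at a place who are friends: a high fraction means the place draws a tight clique of regulars, a low fraction means it mixes strangers:

```python
from itertools import combinations

def social_homogeneity(visitors, friends):
    """Fraction of visitor pairs who are friends: 1.0 for a clique of
    regulars, 0.0 for a crowd of mutual strangers. Illustrative only;
    `friends` is a set of frozenset pairs."""
    pairs = list(combinations(sorted(visitors), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in friends)
    return linked / len(pairs)

# Invented example: the cafe draws a friend group, the club draws strangers.
friends = {frozenset(("amy", "bob")), frozenset(("amy", "cat"))}
print(social_homogeneity({"amy", "bob", "cat"}, friends))  # 2 of 3 pairs
print(social_homogeneity({"dan", "eve", "fay"}, friends))  # 0 of 3 pairs
```

Ranking places by a score like this (low homogeneity meaning high diversity) is the kind of signal the study then compares against deprivation and, ultimately, gentrification.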

But the real question is what social diversity has to do with gentrification. To measure that, the team used the United Kingdom’s Index of Multiple Deprivation, which takes into account income, education, environmental factors such as air quality, and more to quantify the socioeconomic state of affairs in localities across the U.K., including each of London’s 32 boroughs.

The rough pattern, according to the analysis: The most socially diverse places in London were also the most deprived. This is about the opposite of what you’d expect, based on social networks studied in isolation from geography, which indicates that, generally, the people with the most diverse social networks are the most prosperous….(More)”

Open Data and Beyond


Paper by Frederika Welle Donker, Bastiaan van Loenen and Arnold K. Bregt: “In recent years, there has been an increasing trend of releasing public sector information as open data. Governments worldwide see the potential benefits of opening up their data. The potential benefits are more transparency, increased governmental efficiency and effectiveness, and external benefits, including societal and economic benefits. The private sector also recognizes potential benefits of making their datasets available as open data. One such company is Liander, an energy network administrator in the Netherlands. Liander views open data as a contributing factor to energy conservation. However, to date there has been little research done into the actual effects of open data. This research has developed a monitoring framework to assess the effects of open data, and has applied the framework to Liander’s small-scale energy consumption dataset….(More)”

Big Data in the Public Sector


Chapter by Ricard Munné in New Horizons for a Data-Driven Economy: “The public sector is becoming increasingly aware of the potential value to be gained from big data, as governments generate and collect vast quantities of data through their everyday activities.

The benefits of big data in the public sector can be grouped into three major areas: advanced analytics, through automated algorithms; improvements in effectiveness, providing greater internal transparency; and improvements in efficiency, where better services can be provided through the personalization of services and by learning from the performance of those services.

The chapter examined several drivers and constraints that have been identified, which can boost or halt the development of big data in the sector depending on how they are addressed. The findings, after analysing the requirements and the technologies currently available, show that there are open research questions to be addressed before competitive and effective solutions can be built. The main developments required are in the fields of scalability of data analysis, pattern discovery, and real-time applications. Improvements in provenance are also required for the sharing and integration of data from the public sector. It is also extremely important to provide integrated security and privacy mechanisms in big data applications, as the public sector collects vast amounts of sensitive data. Finally, respecting the privacy of citizens is a mandatory obligation in the European Union….(More)”