The ‘data revolution’ will be open


Martin Tisne at Devex: “There is a huge amount of talk about a “data revolution.” The phrase emerged in the years preceding this September’s announcement of the Sustainable Development Goals, and has recently been strongly reaffirmed by the launch of a Global Partnership on Sustainable Development Data.

The importance of data in measuring, assessing and verifying the new SDGs has been powerfully made and usually includes a mention of the data needing to be “open.” However, the role of “open” has not been clearly articulated. Fundamentally, the discussion focuses on the role of data (statistics, for example) in decision-making, and not on the benefits of that data being open to the public. Until this case is made, difficult decisions to make data open will go by the wayside.

Much of the debate justly focuses on why data matters for decision-making. Knowing how many boys and girls are in primary and secondary schools, how good their education is, and the number of teachers in their schools, are examples of relevant data used in shaping education delivery, and perhaps policy. Likewise, new satellite and cellphone data can help us prevent and understand the causes of death by HIV and AIDS, tuberculosis, and malaria.

Proponents of the data revolution make powerful points, such as that 1 in 3 births go unregistered. If you are uncounted, you will be ignored. If you don’t have an identity, you do not exist.

Yet as important as this information is, I still can’t help but think: Do we change the course of history with the mere existence of more data or because people access it, mobilize and press for change?

We need an equally eloquent narrative for why open data matters and what it means.

To my thinking, we need the data to be open because we need to hold governments accountable for their promises under the SDGs, in order to incentivize action. The data needs to be available, accessible and comparable to enable journalists and civil society to prod, push and test the validity of these promises. After all, what good are the goals if governments do not deliver, beginning with the funding to implement? We will need to know what financial resources, both public and private, will be put to work and what budget allocations governments will make in their draft budgets. We need to have those debates in the open, not in smoke-filled rooms.

Second, the data needs to be open in order to be verified, quality-checked and improved. …(More)”

Creating Value through Open Data


Press Release: “Capgemini Consulting, the global strategy and transformation consulting arm of the Capgemini Group, today published two new reports on the state of play of Open Data in Europe, to mark the launch of the European Open Data Portal. The first report addresses “Open Data Maturity in Europe 2015: Insights into the European state of play” and the second focuses on “Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources.” The countries covered by these assessments include the EU28 countries plus Iceland, Liechtenstein, Norway, and Switzerland – commonly referred to as the EU28+ countries. The reports were requested by the European Commission within the framework of the Connecting Europe Facility program, supporting the deployment of European Open Data infrastructure.

Open Data refers to the information collected, produced or paid for by public bodies that can be freely used, modified and shared by anyone. For the period 2016-2020, the direct market size for Open Data is estimated at EUR 325 billion for Europe. Capgemini’s study “Creating Value through Open Data” illustrates how Open Data can create economic value in multiple ways, including increased market transactions, job creation from producing services and products based on Open Data, and cost savings and efficiency gains. For instance, effective use of Open Data could help save 629 million hours of unnecessary waiting time on the roads in the EU; and help reduce energy consumption by 16%. The accumulated cost savings for public administrations making use of Open Data across the EU28+ in 2020 are predicted to equal 1.7 bn EUR. Reaping these benefits requires reaching a high level of Open Data maturity.

In order to address the accessibility and the value of Open Data across European countries, the European Union has launched the Beta version of the European Data Portal. The Portal addresses the whole Data Value Chain, from data publishing to data re-use. Over 240,000 data sets from 34 European countries are referenced on the Portal. It offers seamless access to public data across Europe, with over 13 content categories to categorize data, ranging from health or education to transport or even science and justice. Anyone, whether citizens, businesses, journalists or administrations, can search, access and re-use the full data collection. A wide range of data is available, from crime records in Helsinki, labor mobility in the Netherlands, forestry maps in France to the impact of digitization in Poland…

The study, “Open Data Maturity in Europe 2015: Insights into the European state of play”, uses two key indicators: Open Data Readiness and Portal Maturity. These indicators cover both the maturity of national policies supporting Open Data as well as an assessment of the features made available on national data portals. The study shows that the EU28+ have completed just 44% of the journey towards achieving full Open Data Maturity and there are large discrepancies across countries. A third of European countries (32%), recognized globally, are leading the way with solid policies, licensing norms, good portal traffic and many local initiatives and events to promote Open Data and its re-use….(More)”

Public Sector Data Management Project


Australian government: “Earlier in 2015, Michael Thawley, Secretary of the Department of the Prime Minister and Cabinet (PM&C), commissioned an in-house study into how public sector data can be better used to achieve efficiencies for government, enable better service delivery and properly be used by the private sector to stimulate economic activity…

There are four commonly used classifications of data: personal data, research data, open data and security data. Each type of data is used for different purposes and requires a different set of considerations, as the graphic below illustrates. The project focused on how the Australian Public Service manages its research data and open data, while ensuring personal data was kept appropriately secured. Security data was beyond the scope of this project.

[Graphic: 4 different types of data and their different purposes]

The project found that there are pockets of excellence across the Australian Public Service, with some agencies actively working on projects that focus on a richer analysis of linked data. However, this approach is fragmented and is subject to a number of barriers, both perceived and real. These include cultural and legislative barriers, and a data analytics skills and capability shortage across the Australian Public Service.

To overcome these barriers, the project established a roadmap to make better use of public data, comprising an initial period to build confidence and momentum across the APS, and a longer term set of initiatives to systematise the use, publishing and sharing of public data.

The report is available from the link below: Public Sector Data Management Project

Open Data, Privacy, and Fair Information Principles: Towards a Balancing Framework


Paper by Zuiderveen Borgesius, Frederik J. and van Eechoud, Mireille and Gray, Jonathan: “Open data are held to contribute to a wide variety of social and political goals, including strengthening transparency, public participation and democratic accountability, promoting economic growth and innovation, and enabling greater public sector efficiency and cost savings. However, releasing government data that contain personal information may threaten privacy and related rights and interests. In this paper we ask how these privacy interests can be respected, without unduly hampering benefits from disclosing public sector information. We propose a balancing framework to help public authorities address this question in different contexts. The framework takes into account different levels of privacy risks for different types of data. It also separates decisions about access and re-use, and highlights a range of different disclosure routes. A circumstance catalogue lists factors that might be considered when assessing whether, under which conditions, and how a dataset can be released. While open data remains an important route for the publication of government information, we conclude that it is not the only route, and there must be clear and robust public interest arguments in order to justify the disclosure of personal information as open data….(More)”

Freedom of Information, Right to Access Information, Open Data: Who is at the Table?


Elizabeth Shepherd in The Round Table: The Commonwealth Journal of International Affairs: “Many national governments have adopted the idea of the ‘right to access information’ (RTI) or ‘freedom of information’ (FOI) as an essential element of the rights of citizens to freedom of opinion and expression, human rights, trust in public discourse and transparent, accountable and open government. Over 100 countries worldwide have introduced access to information legislation: 50+ in Europe; a dozen in Africa; 20 in the Americas and Caribbean; more than 15 in Asia and the Pacific; and two in the Middle East (Banisar, 2014). This article will provide an overview of access to information legislation and focus on the UK Freedom of Information Act 2000 as a case example. It will discuss the impact of the UK FOI Act on public authorities, with particular attention to records management implications, drawing on research undertaken by University College London. In the final section, it will reflect on relationships between access to information and open government data. If governments are moving to more openness, what implications might this have for those charged with implementing FOI and RTI policies, including for records management professionals?…(More)”

Tech and Innovation to Re-engage Civic Life


Hollie Russon Gilman at the Stanford Social Innovation Review: “Sometimes even the best-intentioned policymakers overlook the power of people. And even the best-intentioned discussions on social impact and leveraging big data for the social sector can obscure the power of every-day people in their communities.

But time and time again, I’ve seen the transformative power of civic engagement when initiatives are structured well. For example, the other year I witnessed a high school student walk into a school auditorium one evening during Boston’s first-ever youth-driven participatory budgeting project. Participatory budgeting gives residents a structured opportunity to work together to identify neighborhood priorities, work in tandem with government officials to draft viable projects, and prioritize projects to fund. Elected officials in turn pledge to implement these projects and are held accountable to their constituents. Initially intrigued by an experiment in democracy (and maybe the free pizza), this student remained engaged over several months, because she met new members of her community; got to interact with elected officials; and felt like she was working on a concrete objective that could have a tangible, positive impact on her neighborhood.

For many of the young participants, ages 12-25, being part of a participatory budgeting initiative is the first time they are involved in civic life. Many were excited that the City of Boston, in collaboration with the nonprofit Participatory Budgeting Project, empowered young people with the opportunity to allocate $1 million in public funds. Through participating, young people gain invaluable civic skills, and sometimes even a passion that can fuel other engagements in civic and communal life.

This is just one example of a broader civic and social innovation trend. Across the globe, people are working together with their communities to solve seemingly intractable problems, but as diverse as those efforts are, there are also commonalities. Well-structured civic engagement creates the space and provides the tools for people to exert agency over policies. When citizens have concrete objectives, access to necessary technology (whether it’s postcards, trucks, or open data portals), and an eye toward outcomes, social change happens.

Using Technology to Distribute Expertise

Technology is allowing citizens around the world to participate in solving local, national, and global problems. When it comes to large, public bureaucracies, expertise is largely top-down and concentrated. Leveraging technology creates opportunities for people to work together in new ways to solve public problems. One way is through civic crowdfunding platforms like Citizinvestor.com, which cities can use to develop public sector projects for citizen support; several cities in Rhode Island, Oregon, and Philadelphia have successfully pooled citizen resources to fund new public works. Another way is through citizen science. Old Weather, a crowdsourcing project from the National Archives and Zooniverse, enrolls people to transcribe old British ship logs to identify climate change patterns. Platforms like these allow anyone to devote a small amount of time or resources toward a broader public good. And because they have a degree of transparency, people can see the progress and impact of their efforts. ….(More)”

Do We Need to Educate Open Data Users?


Tony Hirst at IODC: “Whilst promoting the publication of open data is a key, indeed necessary, ingredient in driving the global open data agenda, promoting initiatives that support the use of open data is perhaps an even more pressing need….

This, then, is the first issue we need to address: improving basic levels of literacy in interpreting  – and manipulating (for example, sorting and grouping) – simple tables and charts. Sensemaking, in other words: what does the chart you’ve just produced actually say? What story does it tell? And there’s an added benefit that arises from learning to read and critique charts better – it makes you better at creating your own.

Associated with reading stories from data comes the reason for telling the story and putting the data to work. How does “data” help you make a decision, or track the impact of a particular intervention? (Your original question should also have informed the data you searched for in the first place). Here we have a need to develop basic skills in how to actually use data, from finding anomalies to hold publishers to account, to using the data as part of a positive advocacy campaign.

After a quick read, on site, of some of the stories the data might have to tell, there may be a need to do further analysis, or more elaborate visualization work. At this point, a range of technical craft skills often come into play, as well as statistical knowledge.

Many openly published datasets just aren’t that good – they’re “dirty”, full of misspellings, missing data, things in the wrong place or wrong format, even if the data they do contain is true. A significant amount of time that should be spent analyzing the data gets spent trying to clean the data set and get it into a form where it can be worked with. I would argue here that a data technician, with a wealth of craft knowledge about how to repair what is essentially a broken dataset, can play an important timesaving role here getting data into a state where an analyst can actually start to do their job analyzing the data.

But at the same time, there are a range of tools and techniques that can help the everyday user improve the quality of their data. Many of these tools require an element of programming knowledge, but less than you might at first think. In the Open University/FutureLearn MOOC “Learn to Code for Data Analysis” we use an interactive notebook style of computing to show how you can use code literally one line at a time to perform powerful data cleaning, analysis, and visualization operations on a range of open datasets, including data from the World Bank and Comtrade.

Here, then, is yet another area where skills development may be required: statistical literacy. At its heart, statistics simply provide us with a range of tools for comparing sets of numbers. But knowing what comparisons to make, or the basis on which particular comparisons can be made, knowing what can be said about those comparisons or how they might be interpreted, in short, understanding what story the stats appear to be telling, can quickly become bewildering. Just as we need to improve sensemaking skills associated with reading charts, so too we need to develop skills in making sense of statistics, even if not actually producing those statistics ourselves.

As more data gets published, there are more opportunities for more people to make use of that data. In many cases, what’s likely to hold back that final data use is a skills gap: primary among these are the skills required to interpret simple datasets and the statistics associated with them, along with developing knowledge about how to make decisions or track progress based on that interpretation. However, the path to producing the statistics or visualizations used by the end-users from the originally published open dataset may also be a winding one, requiring skills not only in analyzing data and uncovering – and then telling – the stories it contains, but also in more mundane technical operational concerns such as actually accessing, and cleaning, dirty datasets….(More)”

Open Government: Missing Questions


Vadym Pyrozhenko at Administration & Society: “This article places the Obama administration’s open government initiative within the context of evolution of the U.S. information society. It examines the concept of openness along the three dimensions of Daniel Bell’s social analysis of the postindustrial society: structure, polity, and culture. Four “missing questions” raise the challenge of the compatibility of public service values with the culture of openness, address the right balance between postindustrial information management practices and the capacity of public organizations to accomplish their missions, and ask to reconsider the idea that greater structural openness of public organizations will necessarily increase their democratic legitimacy….(More)”

Open government data: Out of the box


The Economist: “The open-data revolution has not lived up to expectations. But it is only getting started…

The app that helped save Mr Rich’s leg is one of many that incorporate government data—in this case, supplied by four health agencies. Six years ago America became the first country to make all data collected by its government “open by default”, except for personal information and that related to national security. Almost 200,000 datasets from 170 outfits have been posted on the data.gov website. Nearly 70 other countries have also made their data available: mostly rich, well-governed ones, but also a few that are not, such as India (see chart). The Open Knowledge Foundation, a London-based group, reckons that over 1m datasets have been published on open-data portals using its CKAN software, developed in 2010.

Jakarta’s Participatory Budget


Ramda Yanurzha in GovInsider: “…This is a map of Musrenbang 2014 in Jakarta. Red is a no-go, green means the proposal is approved.

To give you a brief background, musrenbang is Indonesia’s flavor of participatory, bottom-up budgeting. The idea is that people can propose any development for their neighbourhood through a multi-stage budgeting process, thus actively participating in shaping the final budget for the city level, which will then determine the allocation for each city at the provincial level, and so on.

The catch is, I’m confident enough to say that not many people (especially in big cities) are actually aware of this process. While civic activists tirelessly lament that the process itself is neither inclusive nor transparent, I’m leaning towards a simpler explanation that most people simply couldn’t connect the dots.

People know that the public works agency fixed that 3-foot pothole last week. But it’s less clear how they can determine who is responsible for fixing a new streetlight in that dark alley and where the money comes from. Someone might have complained to the neighbourhood leader (Pak RT) and somehow the message gets through, but it’s very hard to trace how it got through. Just keep complaining to the black box until you don’t have to. There are very few people (mainly researchers) who get to see the whole picture.

This has now changed because the brand-new Jakarta open data portal provides musrenbang data from 2009. Who proposed what to whom, for how much, where it should be implemented (geotagged!), down to kelurahan/village level, and whether the proposal is accepted into the final city budget. For someone who advocates for better availability of open data in Indonesia and is eager to practice my data wrangling skill, it’s a goldmine.

Diving In

[Data screenshot: all the different units of goods proposed.]

The data is also, as expected, incredibly messy. While surprisingly most of the projects proposed are geotagged, there are a lot of formatting inconsistencies that make the clean-up stage painful. Some of them are minor (m? meter? meter2? m2? meter persegi?) while others are perplexing (latitude: -6,547,843,512,000 – yes, that’s a value of more than a billion). Annoyingly, hundreds of proposals point to the center of the National Monument so it’s not exactly a representative dataset.
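As a rough illustration of the kind of clean-up described above, here is a minimal Python sketch. It is not the author's actual workflow (the piece says the scrubbing was done in OpenRefine); the function names, the canonical unit labels and the rule for reading commas are all assumptions made for this example. It maps the unit spellings quoted above onto one label each, and rejects latitude values that cannot be real coordinates instead of trying to guess what was meant.

```python
import re

# Hypothetical alias table: the spellings are the ones quoted in the text,
# the canonical labels ("m", "m2") are an assumption for this sketch.
UNIT_ALIASES = {
    "m": "m",
    "meter": "m",
    "m2": "m2",
    "meter2": "m2",
    "meter persegi": "m2",  # Indonesian for "square meter"
}


def normalize_unit(raw):
    """Return a canonical unit label, or None if the spelling is unknown."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return UNIT_ALIASES.get(key)


def parse_latitude(raw):
    """Parse a latitude string; return None if it is not a valid latitude.

    Assumption: a single comma with no dot is a decimal comma ("-6,2"),
    while several commas are thousands separators and are stripped.
    Out-of-range values are rejected rather than silently rescaled.
    """
    text = raw.strip()
    if text.count(",") == 1 and "." not in text:
        text = text.replace(",", ".")
    else:
        text = text.replace(",", "")
    try:
        value = float(text)
    except ValueError:
        return None
    return value if -90.0 <= value <= 90.0 else None
```

With these rules, `normalize_unit("Meter Persegi")` and `normalize_unit("m2")` both resolve to the same label, and the perplexing `"-6,547,843,512,000"` comes back as `None`, so it can be flagged for manual review. Rejecting rather than repairing is a deliberate choice here: the text does not say what the intended coordinate was.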

For fellow data wranglers, pull requests to improve the data are gladly welcome over here. Ibam generously wrote an RT extractor to yield further location data, and I’m looking into OpenStreetMap RW boundary data to create a reverse geocoder for the points.

A couple hours of scrubbing in OpenRefine yields me a dataset that is clean enough for me to generate the CartoDB map I embedded at the beginning of this piece. More precisely, it is a map of geotagged projects where each point is colored depending on whether it’s rejected or accepted.

Numbers and Patterns

40,511 proposals, some of them merged into broader ones, which gives us a grand total of 26,364 projects valued at over IDR 3,852,162,060,205, just over $250 million at the current exchange rate. This amount represents over 5% of Jakarta’s annual budget for 2015, with projects ranging from an IDR 27,500 (~$2) trash bin (that doesn’t sound right, does it?) in Sumur Batu to an IDR 54 billion, 1.5 kilometer drainage improvement in Koja….(More)”