The impact of Open Data


GovLab/Omidyar Network: “…share insights gained from our current collaboration with Omidyar Network on a series of open data case studies. These case studies – 19, in total – are designed to provide a detailed examination of the various ways open data is being used around the world, across geographies and sectors, and to draw some over-arching lessons. The case studies are built from extensive research, including in-depth interviews with key participants in the various open data projects under study….

Ways in which open data impacts lives

Broadly, we have identified four main ways in which open data is transforming economic, social, cultural and political life, and hence improving people’s lives.

  • First, open data is improving government, primarily by helping tackle corruption, improving transparency, and enhancing public services and resource allocation.
  • Open data is also empowering citizens to take control of their lives and demand change; this dimension of impact is mediated by more informed decision making and new forms of social mobilization, both facilitated by new ways of communicating and accessing information.
  • Open data is also creating new opportunities for citizens and groups, by stimulating innovation and promoting economic growth and development.
  • Finally, open data is playing an increasingly important role in solving big public problems, primarily by allowing citizens and policymakers to engage in new forms of data-driven assessment and data-driven engagement.

 

Enabling Conditions

While these are the four main ways in which open data is driving change, we have seen wide variability in the amount and nature of impact across our case studies. Put simply, some projects are more successful than others; or some projects might be more successful in a particular dimension of impact, and less successful in others.

As part of our research, we have therefore tried to identify some enabling conditions that maximize the positive impact of open data projects. These four stand out:

  • Open data projects are most successful when they are built not from the efforts of single organizations or government agencies, but when they emerge from partnerships across sectors (and even borders). The roles of intermediaries (e.g., the media and civil society groups) and of “data collaboratives” are particularly important.
  • Several of the projects we have seen have emerged on the back of what we might think of as an open data public infrastructure – i.e., the technical backend and organizational processes necessary to enable the regular release of potentially impactful data to the public.
  • Clear open data policies, including well-defined performance metrics, are also essential; policymakers and political leaders have an important role in creating an enabling (yet flexible) legal environment that includes mechanisms for project assessments and accountability, as well as providing the type of high-level political buy-in that can empower practitioners to work with open data.
  • We have also seen that the most successful open data projects tend to be those that target a well-defined problem or issue. In other words, projects with maximum impact often meet a genuine citizen need.

 

Challenges

Impact is also determined by the obstacles and challenges that a project confronts. Some regions and some projects face a greater number of hurdles. These also vary, but we have found four challenges that appear most often in our case studies:

  • Projects in countries or regions with low capacity or “readiness” (indicated, for instance, by low Internet penetration rates or hostile political environments) typically fare less well.
  • Projects that are unresponsive to feedback and user needs are less likely to succeed than those that are flexible and able to adapt to what their users want.
  • Open data often exists in tension with risks such as privacy and security; often, the impact of a project is limited or harmed when it fails to take into account and mitigate these risks.
  • Although open data projects are often “hackable” and cheap to get off the ground, the most successful do require investments – of time and money – after their launch; inadequate resource allocation is one of the most common reasons for a project to fail.

These lists of impacts, enabling factors and challenges are, of course, preliminary. We continue to refine our research and will include a final set of findings along with our final report….(More)

5 tech trends that will transform governments


Zac Bookman at the World Economic Forum: “…The public sector today looks a bit like the consumer industry of 1995 and the enterprise space in 2005: it is at the beginning of a large-scale digital metamorphosis. The net result will be years of saved time, better decisions and stronger communities.

Here are five trends that will define this transformation in the coming decade:

  1. Real-time operations

Many industries in the global economy already operate in real time. ….

Governments are different. They often access accurate data only on a monthly or quarterly basis, even though they make critical decisions every day. This will change with software deployments that help governments unleash and use current data to make more informed decisions about how they can allocate public resources effectively.

  2. Smarter cities

Studies on human migration patterns indicate that more people are moving to cities. By 2025, an estimated 60% of the world’s population will live in an urban centre. High rates of urbanization will force cities to use their existing resources more efficiently. Networked infrastructures – including roads, phone lines, cable networks, satellites and the internet – will be important parts of the solution to this challenge….For example, MIT and Copenhagen recently collaborated on an electric-hybrid bike wheel that monitors pollution, road conditions and traffic. With cheap sensors in place of manual labour, the wheel allows cities to monitor their environments at a level that was previously unfeasible, offering a quantum leap in networking capability without consuming further human or capital resources.

  3. Increased citizen engagement

Smart networks are wonderful things, but cities need to guard themselves against making efficiency a sacred cow. There is an inherent tension between the ideals of democracy and efficiency, between open platforms that encourage engagement and centralized systems. Rather than focus solely on making everything smart, cities will have to focus on slowing down and improving the quality of life.

These considerations will cause cities to increase citizen engagement. Transparency is a subset of this goal. Open data platforms, such as data.gov and data.gov.uk, host troves of machine-readable government information that allow communities to target and solve problems for which governments do not have the bandwidth. Crowdfunding platforms, such as neighbor.ly, allow citizens to participate in the civic process by enabling them to invest in local capital projects. These types of civic tech platforms will continue to grow, and they will be vital to the health of future democracies.
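As a rough illustration of how machine-readable these catalogs are (our sketch, not the article’s; it assumes the standard CKAN `package_search` endpoint that catalog.data.gov exposes):

```python
# Sketch: searching the data.gov catalog programmatically via its CKAN API.
# Assumes the standard CKAN v3 package_search action; field names follow
# the usual CKAN response schema.
import requests

resp = requests.get(
    "https://catalog.data.gov/api/3/action/package_search",
    params={"q": "transit budget", "rows": 5},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()["result"]

print(f"{result['count']} matching datasets")
for pkg in result["results"]:
    print("-", pkg["title"])
```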

  4. 21st-century reporting software for governments

The information technology that powers government is notoriously antiquated. …

New reporting technology, such as the system from OpenGov, will automatically pull and display data from governments’ accounting systems. These capabilities empower employees to find in seconds information that would previously have taken hours, days or even weeks to locate. They will expand inter-departmental collaboration on core functions, such as budgeting. And they will also allow governments to compare themselves with other governments. In the next decade, advanced reporting software will save billions of dollars by streamlining processes, improving decisions and offering intelligent insights across the expenditure spectrum.
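The article does not describe OpenGov’s internals; as a hedged sketch of the reporting pattern it gestures at (hypothetical ledger export, hypothetical column names), the “hours to seconds” point often reduces to a simple pivot:

```python
# Illustrative only: answering a cross-department budget question from a
# hypothetical general-ledger export. Not OpenGov's actual system.
import pandas as pd

ledger = pd.read_csv("general_ledger_export.csv")  # hypothetical file

# Year-over-year spending by department, the kind of view that manual
# report-pulling makes slow.
summary = ledger.pivot_table(
    index="department", columns="fiscal_year", values="amount", aggfunc="sum"
)
# Assumes fiscal years 2014 and 2015 are present in the export.
summary["change_pct"] = (summary[2015] - summary[2014]) / summary[2014] * 100
print(summary.sort_values("change_pct", ascending=False).round(1))
```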

  5. Inter-governmental communication

The internet was conceived as a knowledge-sharing platform. Over the past few decades, technologists have developed tools such as Google and Wikipedia to aid the flow of information on the web and enable ever greater knowledge sharing. Today, you can find nearly any piece of information in a matter of seconds. Governments, however, have not benefited from the rapid development of such tools for their industry, and most information sharing still occurs offline, over email, or on small chat forums. Tools designed specifically for government data will allow governments to embrace the inherent knowledge-sharing infrastructure of the internet….(More)”

A data revolution is underway. Will NGOs miss the boat?


Opinion by Sophia Ayele at Oxfam: “The data revolution has arrived…. The UN has even launched a Data Revolution Group (to ensure that the revolution penetrates into international development). The Group’s 2014 report suggests that harnessing the power of newly available data could ultimately lead to, “more empowered people, better policies, better decisions and greater participation and accountability, leading to better outcomes for people and the planet.”

But where do NGOs fit in?

NGOs are generating dozens (if not hundreds) of datasets every year. Over the last two decades, NGOs have been collecting increasing amounts of research and evaluation data, largely driven by donor demands for more rigorous evaluations of programs. The quality and efficiency of data collection have also been enhanced by mobile data collection. However, a quick scan of UK development NGOs reveals that few, if any, are sharing the data that they collect, which means these datasets aren’t being fully exploited and analysed. Working on tight budgets, with limited capacity, it’s not surprising that NGOs often shy away from sharing data without a clear mandate.

But change is in the air. Several donors have begun requiring NGOs to publicise data and others appear to be moving in that direction. Last year, USAID launched its Open Data Policy, which requires that grantees “submit any dataset created or collected with USAID funding…” Not only does USAID stipulate this requirement, it also hosts this data on its Development Data Library (DDL) and provides guidance on anonymisation to depositors. Similarly, the Gates Foundation’s 2015 Open Access Policy stipulates that, “Data underlying published research results will be accessible and open immediately.” However, they are allowing a two-year transition period….. Here at Oxfam, we have been exploring ways to begin sharing research and evaluation data. We aren’t being required to do this – yet – but we realise that the data that we collect is a public good with the potential to improve lives through more effective development programmes and to raise the voices of those with whom we work. Moreover, organizations like Oxfam can play a crucial role in highlighting issues facing women and other marginalized communities that aren’t always captured in national statistics. Sharing data is also good practice and would increase our transparency and accountability as an organization.

However, Oxfam also bears a huge responsibility to protect the rights of the communities that we work with. This involves ensuring informed consent when gathering data, so that communities are fully aware that their data may be shared, and de-identifying data to a level where individuals and households cannot be easily identified.

As Oxfam has outlined in our recently adopted Responsible Data Policy, “Using data responsibly is not just an issue of technical security and encryption but also of safeguarding the rights of people to be counted and heard, ensuring their dignity, respect and privacy, enabling them to make an informed decision and protecting their right to not be put at risk… (More)”
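Oxfam’s post does not prescribe a method; one common safeguard its de-identification point maps to is a k-anonymity-style check before release (a sketch of ours, with hypothetical file and column names):

```python
# Illustrative k-anonymity-style check before sharing survey data.
# Column names are hypothetical; real quasi-identifiers depend on the dataset.
import pandas as pd

QUASI_IDENTIFIERS = ["district", "age_band", "household_size"]
K = 5  # smallest group size we are willing to publish

df = pd.read_csv("survey_responses.csv")  # hypothetical file

# Drop direct identifiers outright.
df = df.drop(columns=["name", "phone_number"], errors="ignore")

# Any combination of quasi-identifiers shared by fewer than K respondents
# could single out a household, so flag those rows for suppression or
# coarser coding (e.g., widening age bands).
group_sizes = df.groupby(QUASI_IDENTIFIERS)[QUASI_IDENTIFIERS[0]].transform("size")
risky = df[group_sizes < K]
print(f"{len(risky)} of {len(df)} rows need suppression or coarser coding")
```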

The Art of Managing Complex Collaborations


Eric Knight, Joel Cutcher-Gershenfeld, and Barbara Mittleman at MIT Sloan Management Review: “It’s not easy for stakeholders with widely varying interests to collaborate effectively in a consortium. The experience of the Biomarkers Consortium offers five lessons on how to successfully navigate the challenges that arise….

Society’s biggest challenges are also its most complex. From shared economic growth to personalized medicine to global climate change, few of our most pressing problems are likely to have simple solutions. Perhaps the only way to make progress on these and other challenges is by bringing together the important stakeholders on a given issue to pursue common interests and resolve points of conflict.

However, it is not easy to assemble such groups or to keep them together. Many initiatives have stumbled and disbanded. The Biomarkers Consortium might have been one of them, but this consortium beat the odds, in large part due to the founding parties’ determination to make it work. Nine years after it was founded, this public-private partnership, which is managed by the Foundation for the National Institutes of Health and based in Bethesda, Maryland, is still working to advance the availability of biomarkers (biological indicators for disease states) as tools for drug development, including applications at the frontiers of personalized medicine.

The Biomarkers Consortium’s mandate — to bring together, in the group’s words, “the expertise and resources of various partners to rapidly identify, develop, and qualify potential high-impact biomarkers particularly to enable improvements in drug development, clinical care, and regulatory decision-making” — may look simple. However, the reality has been quite complex. The negotiations that led to the consortium’s formation in 2006 were complicated, and the subsequent balancing of common and competing interests remains challenging….

Many in the biomedical sector had seen the need to tackle drug discovery costs for a long time, with multiple companies concurrently spending millions, sometimes billions, of dollars only to hit common dead ends in the drug development process. In 2004 and 2005, then National Institutes of Health director Elias Zerhouni convened key people from the U.S. Food and Drug Administration, the NIH, and the Pharmaceutical Research and Manufacturers of America to create a multistakeholder forum.

Every member knew from the outset that their fellow stakeholders represented many divergent and sometimes opposing interests: large pharmaceutical companies, smaller entrepreneurial biotechnology companies, FDA regulators, NIH science and policy experts, university researchers and nonprofit patient advocacy organizations….(More)”

Beyond the Jailhouse Cell: How Data Can Inform Fairer Justice Policies


Alexis Farmer at DataDrivenDetroit: “Government-provided open data is a value-added approach to providing transparency, analytic insights for government efficiency, innovative solutions for products and services, and increased civic participation. Two of the least transparent public institutions are jails and prisons. The majority of the population has limited knowledge about jail and prison operations and the demographics of the jail and prison population, even though the costs of incarceration are substantial. The absence of public knowledge about one of the many establishments public tax dollars support can be resolved with an open data approach to criminal justice. Increasing access to administrative jail information enables communities to collectively and effectively find solutions to the challenges the system faces….

The data analysis that complements open data practices is part of the formula for creating transformational policies. There are numerous ways that recording and publishing data about jail operations can inform better policies and practices:

1. Better budgeting and allocation of funds. By monitoring the rate at which dollars are expended for a specific function, data allows administrations to make accurate estimates of future expenditures.

2. More effective deployment of staff. Knowing the average daily population and annual average bookings can help inform staffing decisions: the total number of officers needed, shift responsibilities, and room arrangements. (A rough sketch of this calculation appears after this list.) The population information also helps with facility planning, reducing overcrowding, controlling violence within the facility, staffing, determining appropriate programs and services, and policy and procedure development.

3. Program participation and effectiveness. Gauging the number of inmates involved in jail work programs, educational training services, rehabilitation/detox programs, and the like is critical to evaluating methods to improve and expand such services. Quantifying the participation in and effectiveness of these programs can potentially lead to a shift in jail rehabilitation services.

4. Jail suicides. “The rate of jail suicides is about three times the rate of prison suicides.” Jails are isolating spaces that separate inmates from social support networks, diminish personal control, and often lack mental health resources. Most people in jail face minor charges and spend less time incarcerated due to shorter sentences. Reviewing the previous jail suicide statistics aids in pinpointing suicide risk, identifying high-risk groups, and ultimately, prescribing intervention procedures and best practices to end jail suicides.

5. Gender and race inequities. It is well known that Black men are disproportionately incarcerated, and the number of Black women in jails and prisons has rapidly increased. It is important to view this disparity in relation to the demographics of the total population of an area. Providing data that show trends in particular crimes by race and gender might lead to further analysis of, and policy changes addressing, the root causes of these crimes (poverty, employment, education, housing, etc.).

6. Prior interaction with the juvenile justice system. The school-to-prison pipeline describes the systematic school discipline policies that increase a student’s interaction with the juvenile justice system. Knowing how many incarcerated persons have been suspended, expelled, or incarcerated as juveniles can encourage schools to examine their discipline policies and institute more restorative justice programs for students. It would also encourage transitional programs for formerly incarcerated youth in order to decrease the recidivism rate among young people.

7. Sentencing reforms. Evaluating the charges on which a person is arrested, the length of stay, average length of sentences, charges for which sentences are given, and the length of time from the first appearance to arraignment and trial disposition can inform more just and balanced sentencing laws enforced by the judicial branch….(More)”
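As promised in item 2, a rough sketch of how average daily population falls out of a booking log (our illustration; the file and column names are hypothetical):

```python
# Illustrative sketch: average daily population (ADP) and annual bookings
# from a jail booking log. File and column names are hypothetical.
import pandas as pd

log = pd.read_csv("bookings_2015.csv", parse_dates=["booked", "released"])

# Count how many people were in custody on each day of the year; open
# bookings (no release date yet) count as still in custody.
days = pd.date_range("2015-01-01", "2015-12-31", freq="D")
in_custody = [
    ((log["booked"] <= d) & (log["released"].fillna(pd.Timestamp.max) > d)).sum()
    for d in days
]

adp = sum(in_custody) / len(days)
print(f"Average daily population: {adp:.0f}")
print(f"Annual bookings: {len(log)}")
```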

Open Data: A 21st Century Asset for Small and Medium Sized Enterprises


“The economic and social potential of open data is widely acknowledged. In particular, the business opportunities have received much attention. But for all the excitement, we still know very little about how and under what conditions open data really works.

To broaden our understanding of the use and impact of open data, the GovLab has a variety of initiatives and studies underway. Today, we share publicly our findings on how Small and Medium Sized Enterprises (SMEs) are leveraging open data for a variety of purposes. Our paper “Open Data: A 21st Century Asset for Small and Medium Sized Enterprises” seeks to build a portrait of the lifecycle of open data—how it is collected, stored and used. It outlines some of the most important parameters of an open data business model for SMEs….

The paper analyzes ten aspects of open data and establishes ten principles for its effective use by SMEs. Taken together, these offer a roadmap for any SME considering greater use or adoption of open data in its business.

Among the key findings included in the paper:

  • SMEs, which often lack access to data or sophisticated analytical tools to process large datasets, are likely to be one of the chief beneficiaries of open data.
  • Government data is the main category of open data being used by SMEs. A number of SMEs are also using open scientific and shared corporate data.
  • Open data is used primarily to serve the Business-to-Business (B2B) markets, followed by the Business-to-Consumer (B2C) markets. A number of the companies studied serve two or three market segments simultaneously.
  • Open data is usually a free resource, but SMEs are monetizing their open-data-driven services to build viable businesses. The most common revenue models include subscription-based services, advertising, fees for products and services, freemium models, licensing fees, lead generation and philanthropic grants.
  • The most significant challenges SMEs face in using open data include those concerning data quality and consistency, insufficient financial and human resources, and issues surrounding privacy.

This is just a sampling of findings and observations. The paper includes a number of additional observations concerning business and revenue models, product development, customer acquisition, and other subjects of relevance to any company considering an open data strategy.”

5 Tips for Designing a Data for Good Initiative


Mitul Desai at Mastercard Center for Inclusive Growth: “The transformative impact of data on development projects, captured in the hashtag #DATARevolution, offers the social and private sectors alike a rallying point to enlist data in the service of high-impact development initiatives.

To help organizations design initiatives that are authentic to their identity and capabilities, we’re sharing what’s necessary to navigate the deeply interconnected organizational, technical and ethical aspects of creating a Data for Good initiative.

1) Define the need

At the center of a Data for Good initiative are the individual beneficiaries you are seeking to serve. This is the foundation on which the “Good” of Data for Good rests.

Understanding the data and expertise needed to better serve such individuals will bring into focus the areas where your organization can contribute and the partners you might engage. As we’ve covered in past posts, collaboration between agents who bring different layers of expertise to Data for Good projects is a powerful formula for change….

2) Understand what data can make a difference

Think about what kind of data can tell a story that’s relevant to your mission. Claudia Perlich of Dstillery says: “The question is first and foremost, what decision do I have to make and which data can tell me something about that decision.” This great introduction to what different kinds of data are relevant in different settings can give you concrete examples.

3) Get the right tools for the job

By one estimate, some 90% of business-relevant data are unstructured or semi-structured (think texts, tweets, images, audio) as opposed to structured data like numbers that easily fit into the lines of a spreadsheet. Perlich notes that while it’s more challenging to mine this unstructured data, it can yield especially powerful insights with the right tools—which thankfully aren’t that hard to identify…..
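The post links out for tools; as one concrete (and hedged) example of giving unstructured text a structured representation, TF-IDF turns free-form strings into the kind of term-weight vectors a spreadsheet-style model can consume:

```python
# Minimal sketch: turning unstructured text into structured features with
# TF-IDF, one common first step in mining text data. The example strings
# are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "Potholes on Main St still not fixed",
    "Great new bike lanes downtown",
    "Main St construction causing delays again",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(tweets)  # rows: documents, columns: terms

# Each document is now a weighted term vector.
print(X.shape)                              # (3, number of distinct terms)
print(vectorizer.get_feature_names_out()[:5])
```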

4) Build a case that moves your organization

“While our programs are designed to serve organizations no matter what their capacity, we do find that an organization’s clarity around mission and commitment to using data to drive decision-making are two factors that can make or break a project,” says Jake Porway, founder and executive director of DataKind, a New York-based data science nonprofit that helps organizations develop Data for Good initiatives…..

5) Make technology serve people-centric ethics

The two most critical ethical factors to consider are informed consent and privacy—both require engaging the community you wish to serve as individual actors….

“Employ data-privacy walls, mask the data from the point of collection and encrypt the data you store. Ensure that appropriate technical and organizational safeguards are in place to verify that the data can’t be used to identify individuals or target demographics in a way that could harm them,” recommends Quid’s Pedraza. To understand the technology of data encryption and masking, check out this post. (More)”
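Pedraza’s advice (mask at the point of collection, encrypt what you store) might look like the following sketch, using standard-library hashing plus the `cryptography` package’s Fernet recipe; key management is deliberately elided:

```python
# Sketch of the two safeguards quoted above: mask identifiers at the point
# of collection, and encrypt what you store. Illustrative only.
import hashlib
import json
from cryptography.fernet import Fernet

SALT = b"replace-with-a-secret-salt"  # keep out of source control in practice

def mask(identifier: str) -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()[:16]

record = {"phone": "+1-555-0100", "village": "Example", "income": 1240}
record["phone"] = mask(record["phone"])  # masked before it leaves the device

key = Fernet.generate_key()              # store securely, e.g., in a KMS
token = Fernet(key).encrypt(json.dumps(record).encode("utf-8"))
# "token" is what gets written to disk; only key holders can read it back.
print(Fernet(key).decrypt(token))
```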

How understanding the ‘shape’ of data could change our world


Gurjeet Singh at the WEF: “We live in an extraordinary time. The capacity to generate and to store data has reached dizzying proportions. What lies within that data represents the chance for this generation to solve its most pressing problems – from disease and climate change, to healthcare and customer understanding.

The magnitude of the opportunity is defined by the magnitude of the data that is created – and it is astonishing….

Despite the technical advances in collection and storage, knowledge generation lags. This is a function of how organizations approach their data, how they conduct analyses, and how they automate learning through machine intelligence.

At its heart, it is a mathematical problem. For any dataset, the total number of possible hypotheses/queries is exponential in the size of the data. Exponential functions are difficult enough for humans to comprehend; however, to further complicate matters, the size of the data itself is growing exponentially, and is about to hit another inflection point as the Internet of Things kicks in.
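To make that arithmetic concrete (our illustration, assuming the simplest case of n binary attributes):

```latex
% Illustrative only: counting yes/no hypotheses over n binary attributes.
% Each record is one of 2^n attribute combinations; each yes/no hypothesis
% selects a subset of those combinations.
\[
  \#\text{possible records} = 2^{n}
  \qquad\Longrightarrow\qquad
  \#\text{yes/no hypotheses} = 2^{2^{n}}
\]
% If the data (and hence n) itself grows exponentially over time, the
% hypothesis space grows double-exponentially, which is the claim above.
```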


What that means is that we are facing double exponential growth in the number of questions that we can ask of our data. If we choose the same approaches that have served us over time – iteratively asking questions of the data until we get the right answer – we will lose our chance to grasp this generational opportunity.

There are not, and never will be, enough data scientists in the world for this approach to succeed. Nor can we arm enough citizen data scientists with new software to make it succeed. Software that makes question-asking or hypothesis development more accessible or more efficient misses the central premise: it will only fall further behind as new data becomes available each millisecond.

To truly unlock the value that lies within our data, we need to turn our attention to the data itself, setting aside the questions for later. This, too, turns out to be a mathematical problem. Data, it turns out, has shape. That shape has meaning. The shape of data tells you everything you need to know about it, from its obvious features to its deepest secrets.

We understand that regression produces lines.


We know that customer segmentation produces groups.


We know that economic growth and interest rates have a cyclical nature (diseases like malaria have this shape too).


By knowing the shape and where we are in the shape, we vastly improve our understanding of where we are, where we have been and, perhaps more importantly, what might happen next. In understanding the shape of data we understand every feature of the dataset, immediately grasping what is important in the data, thus dramatically reducing the number of questions to ask and accelerating the discovery process.
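The article names no tools; as a toy stand-in for “segmentation produces groups,” a k-means run on invented data recovers the groups from the data’s shape without any question being asked up front:

```python
# Minimal sketch of "segmentation produces groups": k-means clustering as a
# simple stand-in for shape-finding. The data is synthetic, for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two invented customer segments: low-spend/frequent vs high-spend/rare,
# as (monthly_spend, visits_per_month) pairs.
customers = np.vstack([
    rng.normal([20, 8], [5, 2], size=(100, 2)),
    rng.normal([200, 1], [40, 0.5], size=(100, 2)),
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
for k in range(2):
    seg = customers[labels == k]
    print(f"segment {k}: {len(seg)} customers, mean spend {seg[:, 0].mean():.0f}")
```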

By changing our thinking – and starting with the shape of the data, not a series of questions (which very often come with significant biases) – we can extract knowledge from these rapidly growing, massive and complex datasets.

The knowledge that lies hidden within electronic medical records, billing records and clinical records is enough to transform how we deliver healthcare and how we treat diseases. The knowledge that lies within the massive data stores of governments, universities and other institutions will illuminate the conversation on climate change and point the way to answers on what we need to do to protect the planet for future generations. The knowledge that is obscured by web, transaction, CRM, social and other data will inform a clearer, more meaningful picture of the customer and will, in turn, define the optimal way to interact.

This is the opportunity for our generation to turn data into knowledge. To get there will require a different approach, but one with the ability to impact the entirety of humankind….(More)

IMF Publishes Worldwide Government Revenue Database


IMF Press Release: “The IMF today published for the first time the World Revenue Longitudinal Dataset (WoRLD), which provides data on tax and non-tax revenues for 186 countries over the period 1990-2013. The database includes broad country coverage and time periods, and it is the result of combining in a consistent manner data from two other IMF publications – the IMF Government Finance Statistics and the World Economic Outlook (WEO) – and drawing on the OECD’s Revenue Statistics and Revenue Statistics in Latin America and the Caribbean.

Vitor Gaspar, Director of the IMF’s Fiscal Affairs Department, said the purpose of releasing the database for general use is to “encourage and facilitate informed discussion and analysis of tax policy and administration for the full range of countries, the need for which was highlighted most recently during the Financing for Development conference in Addis Ababa.”

Constructing the database was a challenging exercise. An accompanying background note will be released in the coming weeks to explain the methodology. The database will be updated annually and will include information from IMF staff reports.

The database is available for download free of charge on the IMF e-Library data portal (http://data.imf.org/revenues).”
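Once exported from the portal, a panel like WoRLD is straightforward to explore. A sketch of ours; the file and column names here are hypothetical, not the IMF’s actual schema:

```python
# Sketch: exploring a WoRLD-style revenue panel after export from the IMF
# data portal. File, indicator, and column names are hypothetical.
import pandas as pd

world = pd.read_csv("world_revenue_longitudinal.csv")

# Tax revenue as a share of GDP, averaged over 1990-2013, top 10 countries.
avg = (
    world[world["indicator"] == "tax_revenue_pct_gdp"]
    .groupby("country")["value"]
    .mean()
    .sort_values(ascending=False)
)
print(avg.head(10))
```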

 

The Data Divide: What We Want and What We Can Get


Craig Adelman and Erin Austin at Living Cities (Read Blog 1): There is no shortage of data. At every level – federal, state, county, city and even within our own organizations – we are collecting and trying to make use of data. Data is a catch-all term that suggests universal access and easy use. The problem? In reality, data is often expensive, difficult to access, created for a single purpose, quickly changing and difficult to weave together. To aid and inform future data-dependent research initiatives, we’ve outlined the common barriers that community development faces when working with data and identified three ways to overcome them.

Common barriers include:

  • Data often comes at a hefty price. …
  • Data can come with restrictions and regulations. …
  • Data is built for a specific purpose, meaning information isn’t always in the same place. …
  • Data can actually be too big. ….
  • Data gaps exist. …
  • Data can be too old. ….

As you can tell, there can be many complications when it comes to working with data, but there is still great value in using and having it. We’ve found a few ways to overcome these barriers when scoping a research project:

1) Prepare to have to move to “Plan B” when trying to get answers that aren’t readily available in the data. It is incredibly important to be able to react to unexpected data conditions and to use proxy datasets when necessary in order to efficiently answer the core research question.

2) Building a data budget for your work is also advisable, as you shouldn’t anticipate that public entities or private firms will give you free data (nor that community development partners will be able to share datasets used for previous studies).

3) Identifying partners—including local governments, brokers, and community development or CDFI partners—is crucial to collecting the information you’ll need….(More)