A data revolution is underway. Will NGOs miss the boat?


Opinion by Sophia Ayele at Oxfam: “The data revolution has arrived. … The UN has even launched a Data Revolution Group (to ensure that the revolution penetrates into international development). The Group’s 2014 report suggests that harnessing the power of newly available data could ultimately lead to “more empowered people, better policies, better decisions and greater participation and accountability, leading to better outcomes for people and the planet.”

But where do NGOs fit in?

NGOs are generating dozens (if not hundreds) of datasets every year. Over the last two decades, NGOs have been collecting increasing amounts of research and evaluation data, largely driven by donor demands for more rigorous evaluations of programs. The quality and efficiency of data collection have also been enhanced by mobile data collection. However, a quick scan of UK development NGOs reveals that few, if any, are sharing the data that they collect, which means that these datasets aren’t being fully exploited and analysed. Working on tight budgets, with limited capacity, it’s not surprising that NGOs often shy away from sharing data without a clear mandate.

But change is in the air. Several donors have begun requiring NGOs to publicise data, and others appear to be moving in that direction. Last year, USAID launched its Open Data Policy, which requires that grantees “submit any dataset created or collected with USAID funding…” Not only does USAID stipulate this requirement, it also hosts the data on its Development Data Library (DDL) and provides depositors with guidance on anonymisation. Similarly, the Gates Foundation’s 2015 Open Access Policy stipulates that “Data underlying published research results will be accessible and open immediately,” although the foundation is allowing a two-year transition period. … Here at Oxfam, we have been exploring ways to begin sharing research and evaluation data. We aren’t being required to do this – yet – but we realise that the data we collect is a public good with the potential to improve lives through more effective development programmes and to raise the voices of those with whom we work. Moreover, organizations like Oxfam can play a crucial role in highlighting issues facing women and other marginalized communities that aren’t always captured in national statistics. Sharing data is also good practice and would increase our transparency and accountability as an organization.

However, Oxfam also bears a huge responsibility to protect the rights of the communities that we work with. This involves ensuring informed consent when gathering data, so that communities are fully aware that their data may be shared, and de-identifying data to a level where individuals and households cannot be easily identified.
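The de-identification step can be made concrete with a toy sketch. This is an illustration of the general technique (dropping direct identifiers and coarsening quasi-identifiers, then checking k-anonymity), not Oxfam's actual procedure; all field names and values are hypothetical.

```python
from collections import Counter

# Fields that directly identify a person and must never be shared (illustrative).
DIRECT_IDENTIFIERS = {"name", "phone"}

def deidentify(record):
    """Drop direct identifiers and coarsen age/village to reduce re-identification risk."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["age_band"] = f"{(out.pop('age') // 10) * 10}s"  # e.g. 34 -> "30s"
    out["region"] = out.pop("village")[:1] + "***"       # crude generalisation
    return out

def k_anonymity(records, quasi_ids=("age_band", "region")):
    """Smallest group size sharing the same quasi-identifier combination."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(counts.values())

raw = [
    {"name": "A", "phone": "1", "age": 34, "village": "Kitui", "income": 120},
    {"name": "B", "phone": "2", "age": 37, "village": "Kilifi", "income": 95},
    {"name": "C", "phone": "3", "age": 31, "village": "Kisumu", "income": 110},
]
safe = [deidentify(r) for r in raw]
```

Here every released record is indistinguishable from at least two others on the quasi-identifiers, which is the kind of property a data depositor would verify before sharing.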

As Oxfam has outlined in our recently adopted Responsible Data Policy, “Using data responsibly is not just an issue of technical security and encryption but also of safeguarding the rights of people to be counted and heard, ensuring their dignity, respect and privacy, enabling them to make an informed decision and protecting their right to not be put at risk… (More)”

The Art of Managing Complex Collaborations


Eric Knight, Joel Cutcher-Gershenfeld, and Barbara Mittleman at MIT Sloan Management Review: “It’s not easy for stakeholders with widely varying interests to collaborate effectively in a consortium. The experience of the Biomarkers Consortium offers five lessons on how to successfully navigate the challenges that arise….

Society’s biggest challenges are also its most complex. From shared economic growth to personalized medicine to global climate change, few of our most pressing problems are likely to have simple solutions. Perhaps the only way to make progress on these and other challenges is by bringing together the important stakeholders on a given issue to pursue common interests and resolve points of conflict.

However, it is not easy to assemble such groups or to keep them together. Many initiatives have stumbled and disbanded. The Biomarkers Consortium might have been one of them, but this consortium beat the odds, in large part due to the founding parties’ determination to make it work. Nine years after it was founded, this public-private partnership, which is managed by the Foundation for the National Institutes of Health and based in Bethesda, Maryland, is still working to advance the availability of biomarkers (biological indicators for disease states) as tools for drug development, including applications at the frontiers of personalized medicine.

The Biomarkers Consortium’s mandate — to bring together, in the group’s words, “the expertise and resources of various partners to rapidly identify, develop, and qualify potential high-impact biomarkers particularly to enable improvements in drug development, clinical care, and regulatory decision-making” — may look simple. However, the reality has been quite complex. The negotiations that led to the consortium’s formation in 2006 were complicated, and the subsequent balancing of common and competing interests remains challenging….

Many in the biomedical sector had seen the need to tackle drug discovery costs for a long time, with multiple companies concurrently spending millions, sometimes billions, of dollars only to hit common dead ends in the drug development process. In 2004 and 2005, then National Institutes of Health director Elias Zerhouni convened key people from the U.S. Food and Drug Administration, the NIH, and the Pharmaceutical Research and Manufacturers of America to create a multistakeholder forum.

Every member knew from the outset that their fellow stakeholders represented many divergent and sometimes opposing interests: large pharmaceutical companies, smaller entrepreneurial biotechnology companies, FDA regulators, NIH science and policy experts, university researchers and nonprofit patient advocacy organizations….(More)”

Beyond the Jailhouse Cell: How Data Can Inform Fairer Justice Policies


Alexis Farmer at DataDrivenDetroit: “Government-provided open data is a value-added approach to providing transparency, analytic insights for government efficiency, innovative solutions for products and services, and increased civic participation. Two of the least transparent public institutions are jails and prisons. The majority of the population has limited knowledge of jail and prison operations and of the demographics of the jail and prison population, even though the costs of incarceration are substantial. The absence of public knowledge about one of the many establishments public tax dollars support can be resolved with an open data approach to criminal justice. Increasing access to administrative jail information enables communities to collectively and effectively find solutions to the challenges the system faces….

The data analysis that complements open data practices is part of the formula for creating transformational policies. There are numerous ways that recording and publishing data about jail operations can inform better policies and practices:

1. Better budgeting and allocation of funds. Monitoring the rate at which dollars are expended for a specific function allows administrators to make accurate estimates of future expenditures.

2. More effective deployment of staff. Knowing the average daily population and annual average bookings can help inform staffing decisions: the total number of officers needed, shift responsibilities, and room arrangements. The population information also helps with facility planning, reducing overcrowding, controlling violence within the facility, staffing, determining appropriate programs and services, and developing policies and procedures.

3. Program participation and effectiveness. Gauging the number of inmates involved in jail work programs, educational training services, rehabilitation/detox programs, and the like is critical to evaluating methods to improve and expand such services. Quantifying the participation in and effectiveness of these programs can potentially lead to a shift in jail rehabilitation services.

4. Jail suicides. “The rate of jail suicides is about three times the rate of prison suicides.” Jails are isolating spaces that separate inmates from social support networks, diminish personal control, and often lack mental health resources. Most people in jail face minor charges and spend less time incarcerated due to shorter sentences. Reviewing the previous jail suicide statistics aids in pinpointing suicide risk, identifying high-risk groups, and ultimately, prescribing intervention procedures and best practices to end jail suicides.

5. Gender and race inequities. It is well known that Black men are disproportionately incarcerated, and the number of Black women in jails and prisons has rapidly increased. It is important to view this disparity relative to the demographics of an area’s total population. Providing data that show trends in particular crimes by race and gender might lead to further analysis of, and policy changes addressing, the root causes of these crimes (poverty, employment, education, housing, etc.).

6. Prior interaction with the juvenile justice system. The school-to-prison pipeline describes the systematic school discipline policies that increase a student’s interaction with the juvenile justice system. Knowing how many incarcerated persons were suspended, expelled, or incarcerated as juveniles can encourage schools to examine their discipline policies and institute more restorative justice programs for students. It would also encourage transitional programs for formerly incarcerated youth, in order to decrease recidivism rates among young people.

7. Sentencing reforms. Evaluating the charges on which a person is arrested, the length of stay, average length of sentences, charges for which sentences are given, and the length of time from the first appearance to arraignment and trial disposition can inform more just and balanced sentencing laws enforced by the judicial branch….(More)”

Open Data: A 21st Century Asset for Small and Medium Sized Enterprises


“The economic and social potential of open data is widely acknowledged. In particular, the business opportunities have received much attention. But for all the excitement, we still know very little about how and under what conditions open data really works.

To broaden our understanding of the use and impact of open data, the GovLab has a variety of initiatives and studies underway. Today, we share publicly our findings on how Small and Medium Sized Enterprises (SMEs) are leveraging open data for a variety of purposes. Our paper “Open Data: A 21st Century Asset for Small and Medium Sized Enterprises” seeks to build a portrait of the lifecycle of open data—how it is collected, stored and used. It outlines some of the most important parameters of an open data business model for SMEs….

The paper analyzes ten aspects of open data and establishes ten principles for its effective use by SMEs. Taken together, these offer a roadmap for any SME considering greater use or adoption of open data in its business.

Among the key findings included in the paper:

  • SMEs, which often lack access to data or sophisticated analytical tools to process large datasets, are likely to be one of the chief beneficiaries of open data.
  • Government data is the main category of open data being used by SMEs. A number of SMEs are also using open scientific and shared corporate data.
  • Open data is used primarily to serve the Business-to-Business (B2B) markets, followed by the Business-to-Consumer (B2C) markets. A number of the companies studied serve two or three market segments simultaneously.
  • Open data is usually a free resource, but SMEs are monetizing their open-data-driven services to build viable businesses. The most common revenue models include subscription-based services, advertising, fees for products and services, freemium models, licensing fees, lead generation and philanthropic grants.
  • The most significant challenges SMEs face in using open data include those concerning data quality and consistency, insufficient financial and human resources, and issues surrounding privacy.

This is just a sampling of findings and observations. The paper includes a number of additional observations concerning business and revenue models, product development, customer acquisition, and other subjects of relevance to any company considering an open data strategy.”

5 Tips for Designing a Data for Good Initiative


Mitul Desai at Mastercard Center for Inclusive Growth: “The transformative impact of data on development projects, captured in the hashtag #DATARevolution, offers the social and private sectors alike a rallying point to enlist data in the service of high-impact development initiatives.

To help organizations design initiatives that are authentic to their identity and capabilities, we’re sharing what’s necessary to navigate the deeply interconnected organizational, technical and ethical aspects of creating a Data for Good initiative.

1) Define the need

At the center of a Data for Good initiative are the individual beneficiaries you are seeking to serve. This is the foundation on which the “Good” of Data for Good rests.

Understanding the data and expertise needed to better serve such individuals will bring into focus the areas where your organization can contribute and the partners you might engage. As we’ve covered in past posts, collaboration between agents who bring different layers of expertise to Data for Good projects is a powerful formula for change….

2) Understand what data can make a difference

Think about what kind of data can tell a story that’s relevant to your mission. Claudia Perlich of Dstillery says: “The question is first and foremost, what decision do I have to make and which data can tell me something about that decision.” This great introduction to what different kinds of data are relevant in different settings can give you concrete examples.

3) Get the right tools for the job

By one estimate, some 90% of business-relevant data is unstructured or semi-structured (think texts, tweets, images, audio) as opposed to structured data like numbers that fit easily into the rows of a spreadsheet. Perlich notes that while unstructured data is more challenging to mine, it can yield especially powerful insights with the right tools – which thankfully aren’t that hard to identify…
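The structured/unstructured distinction can be shown in miniature: a spreadsheet-style row is directly queryable, while free text must first be transformed into features before it can be analysed. A minimal sketch with the standard library (real pipelines use proper NLP tooling, and the example data is invented):

```python
from collections import Counter
import re

def to_features(text: str) -> Counter:
    """Lowercase and tokenise free text, then count words: unstructured -> structured."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Structured data: already fits a spreadsheet row, directly queryable.
structured_row = {"customer_id": 17, "spend": 42.50}

# Unstructured data: a raw string, useless until transformed.
unstructured = "Loved the service, LOVED the price!"
features = to_features(unstructured)  # now a queryable word-count table
```

Once the text is reduced to counts, it can be joined, filtered, and aggregated just like the structured row.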

4) Build a case that moves your organization

“While our programs are designed to serve organizations no matter what their capacity, we do find that an organization’s clarity around mission and commitment to using data to drive decision-making are two factors that can make or break a project,” says Jake Porway, founder and executive director of DataKind, a New York-based data science nonprofit that helps organizations develop Data for Good initiatives…..

5) Make technology serve people-centric ethics

The two most critical ethical factors to consider are informed consent and privacy—both require engaging the community you wish to serve as individual actors….

“Employ data-privacy walls, mask the data from the point of collection and encrypt the data you store. Ensure that appropriate technical and organizational safeguards are in place to verify that the data can’t be used to identify individuals or target demographics in a way that could harm them,” recommends Quid’s Pedraza. To understand the technology of data encryption and masking, check out this post. (More)”
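One common way to implement the masking Pedraza recommends is to replace direct identifiers with keyed hashes at the point of collection, so records can still be linked to each other but not traced back to a person without the secret key. A minimal sketch; the key handling and field names are illustrative assumptions, not a quoted implementation:

```python
import hmac
import hashlib

# Assumption: in practice this key lives in a secrets manager, never in the dataset.
SECRET_KEY = b"rotate-me-and-store-me-in-a-vault"

def mask(identifier: str) -> str:
    """Deterministic keyed hash: same input -> same token, irreversible without the key."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

# The stored record carries a pseudonymous token instead of the email address.
record = {"participant_id": mask("jane.doe@example.org"), "responses": [4, 2, 5]}
```

Because the hash is keyed (HMAC) rather than a plain digest, an attacker cannot simply hash a list of known emails and match them against the dataset.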

How understanding the ‘shape’ of data could change our world


Gurjeet Singh at the WEF: “We live in an extraordinary time. The capacity to generate and to store data has reached dizzying proportions. What lies within that data represents the chance for this generation to solve its most pressing problems – from disease and climate change, to healthcare and customer understanding.

The magnitude of the opportunity is defined by the magnitude of the data that is created – and it is astonishing….

Despite the technical advances in collection and storage, knowledge generation lags. This is a function of how organizations approach their data, how they conduct analyses, and how they automate learning through machine intelligence.

At its heart, it is a mathematical problem. For any dataset the total number of possible hypotheses/queries is exponential in the size of the data. Exponential functions are difficult enough for humans to comprehend; however, to further complicate matters, the size of the data itself is growing exponentially, and is about to hit another inflection point as the Internet of Things kicks in.


What that means is that we are facing double exponential growth in the number of questions that we can ask of our data. If we stick with the approaches that have served us over time – iteratively asking questions of the data until we get the right answer – we will miss this generational opportunity.
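The “double exponential” claim can be made concrete with a toy count. Suppose each hypothesis is a query over some subset of a dataset's attributes, so n attributes give 2**n candidate queries; if n itself doubles each period, the query space grows like 2**(n0·2**t). The numbers below are an illustration of that arithmetic, not figures from the article:

```python
def hypothesis_space(n_attributes: int) -> int:
    """Number of attribute subsets, i.e. candidate yes/no queries over n attributes."""
    return 2 ** n_attributes

def attributes_at(t: int, n0: int = 4) -> int:
    """Exponentially growing data: the attribute count doubles each period t."""
    return n0 * 2 ** t

# t = 0: 4 attributes  -> 16 candidate queries
# t = 1: 8 attributes  -> 256
# t = 2: 16 attributes -> 65,536
# t = 3: 32 attributes -> ~4.3 billion
growth = [hypothesis_space(attributes_at(t)) for t in range(4)]
```

Even in this tiny model, three doublings of the data turn a query space a person could enumerate by hand into one no team of analysts could exhaust.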

There are not, and never will be, enough data scientists in the world for this approach to succeed. Nor can we arm enough citizen data scientists with new software to make it work. Software that makes question-asking or hypothesis development more accessible or more efficient misses the central point: its users will only fall further behind as new data becomes available each millisecond.

To truly unlock the value that lies within our data, we need to turn our attention to the data itself, setting aside the questions for later. This, too, turns out to be a mathematical problem. Data, it turns out, has shape. That shape has meaning. The shape of data tells you everything you need to know about it, from its obvious features to its hidden secrets.

We understand that regression produces lines.

We know that customer segmentation produces groups.

We know that economic growth and interest rates have a cyclical nature (diseases like malaria have this shape too).

By knowing the shape, and where we are in it, we vastly improve our understanding of where we are, where we have been and, perhaps more importantly, what might happen next. In understanding the shape of data we understand every feature of the dataset, immediately grasping what is important in it, thus dramatically reducing the number of questions to ask and accelerating the discovery process.
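The “groups” shape mentioned above can be made concrete with the simplest possible segmentation sketch: a two-centroid clustering of one-dimensional spend values. This is a generic illustration of how structure emerges from data, not the specific shape-of-data methods the article alludes to, and the figures are invented:

```python
def two_means(values, iters=20):
    """Lloyd's algorithm with k=2 on a list of numbers; returns (centroids, labels)."""
    c = [min(values), max(values)]  # seed centroids at the extremes
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            # Assign each value to its nearer centroid (False/True indexes 0/1).
            groups[abs(v - c[0]) > abs(v - c[1])].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    labels = [int(abs(v - c[0]) > abs(v - c[1])) for v in values]
    return c, labels

# Two customer segments pop out of raw spend values without asking any question first.
spend = [10, 12, 11, 95, 102, 99]
centroids, labels = two_means(spend)
```

The point of the illustration: the analyst never asked “is spend above 50?”; the grouping structure was recovered directly from the data.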

By changing our thinking – and starting with the shape of the data, not a series of questions (which very often come with significant biases) – we can extract knowledge from these rapidly growing, massive and complex datasets.

The knowledge that lies hidden within electronic medical records, billing records and clinical records is enough to transform how we deliver healthcare and how we treat diseases. The knowledge that lies within the massive data stores of governments, universities and other institutions will illuminate the conversation on climate change and point the way to answers on what we need to do to protect the planet for future generations. The knowledge that is obscured by web, transaction, CRM, social and other data will inform a clearer, more meaningful picture of the customer and will, in turn define the optimal way to interact.

This is the opportunity for our generation to turn data into knowledge. To get there will require a different approach, but one with the ability to impact the entirety of humankind….(More)

IMF Publishes Worldwide Government Revenue Database


IMF Press Release: “The IMF today published for the first time the World Revenue Longitudinal Dataset (WoRLD), which provides data on tax and non-tax revenues for 186 countries over the period 1990-2013. The database offers broad coverage of countries and time periods, and is the result of consistently combining data from two other IMF publications, the Government Finance Statistics and the World Economic Outlook (WEO), and drawing on the OECD’s Revenue Statistics and Revenue Statistics in Latin America and the Caribbean.

Vitor Gaspar, Director of the IMF’s Fiscal Affairs Department, said the purpose of releasing the database for general use is to “encourage and facilitate informed discussion and analysis of tax policy and administration for the full range of countries, the need for which was highlighted most recently during the Financing for Development conference in Addis Ababa.”

Constructing the database was a challenging exercise. An accompanying background note will be released in the coming weeks to explain the methodology. The database will be updated annually and will include information from IMF staff reports.

The database is available for download free of charge on the IMF e-Library data portal (http://data.imf.org/revenues).”

 

The Data Divide: What We Want and What We Can Get


Craig Adelman and Erin Austin at Living Cities (Read Blog 1): “There is no shortage of data. At every level – federal, state, county, city and even within our own organizations – we are collecting and trying to make use of data. Data is a catch-all term that suggests universal access and easy use. The problem? In reality, data is often expensive, difficult to access, created for a single purpose, quickly changing and difficult to weave together. To aid and inform future data-dependent research initiatives, we’ve outlined the common barriers that community development faces when working with data and identified three ways to overcome them.

Common barriers include:

  • Data often comes at a hefty price. …
  • Data can come with restrictions and regulations. …
  • Data is built for a specific purpose, meaning information isn’t always in the same place. …
  • Data can actually be too big. ….
  • Data gaps exist. …
  • Data can be too old. ….

As you can tell, there can be many complications when it comes to working with data, but there is still great value in using and having it. We’ve found a few ways to overcome these barriers when scoping a research project:

1) Prepare to have to move to “Plan B” when trying to get answers that aren’t readily available in the data. It is incredibly important to be able to react to unexpected data conditions and to use proxy datasets when necessary in order to efficiently answer the core research question.

2) Building a data budget for your work is also advisable, as you shouldn’t anticipate that public entities or private firms will give you free data (nor that community development partners will be able to share datasets used for previous studies).

3) Identifying partners—including local governments, brokers, and community development or CDFI partners—is crucial to collecting the information you’ll need….(More)

How can we ensure that cities create opportunities for healthy urbanization?


Blog by Roy Ahn, Thomas F. Burke & Anita M. McGahan on their new book: “By the year 2100, 8 out of 10 people in the world will reside in cities – a major change in demographics compared to 100 years ago.

Urbanization has sweeping consequences for population health. Most analysts evaluate the “specter of urbanization” by focusing on problems and challenges, which can include slum development, insecurity, and inequality.

As the World Health Organization and UN Habitat note in their seminal report, Hidden Cities, “Cities concentrate opportunities, jobs and services, but they also concentrate risks and hazards for health.” The urban poor are especially vulnerable because their housing conditions and access to clean water, sanitation, and health care are often severely compromised.

Additionally, the jobs available to the urban poor are often informal, dangerous, and temporary. Yet the lack of integrated governance and infrastructure responsible for urbanization problems also can create remarkable and often untapped opportunities for improving health. How can we ensure that cities create opportunities for healthy urbanization?

In our new book, Innovating for Healthy Urbanization, we argue that using the “innovations” lens can provide a unique platform through which solutions for urbanization and health can emerge.

Sometimes “innovations” can be decidedly high tech, such as holograms on medication packaging that protect against drug counterfeiters, or tiny filter paper tests costing pennies that exponentially increase access to medical diagnostic testing for poor people living in cities.

Other innovations are less tech-focused, but equally impactful, such as advocating for motorcycle helmet laws in cities or a low-cost, condom catheter-balloon kit that can save mothers from dying from postpartum hemorrhage.

What makes both high- and low-tech solutions effective? Pushing the envelope on what works and then integrating solutions to meet a community’s priority needs…..(More)”

Disrupting development with digital technologies


Kemal Derviş at Brookings: “The emergence of a new digital economy is changing the ways in which businesses and development organizations engage in emerging and developing countries. Transaction costs have been radically driven down, enabling greater inclusion. And technology is driving efficiency improvements, and permitting rapid scaling-up and transformational change.

Three trends in particular have the potential to redefine how global development occurs and how efforts will support it over the next 10 years: (1) the growing adoption of digital payments, serving people everywhere with near-frictionless transactions; (2) the spread of Internet connectivity and digital literacy; and (3) the harnessing of data to better serve the poor and to generate new knowledge. … Brookings commissioned six essays … present some of the most current information and thinking on what might be termed “digital disruption”; we are making them publicly available to stimulate wider discussion. The six essays and their authors are: