Five principles for applying data science for social good


Jake Porway at O’Reilly: “….Every week, a data or technology company declares that it wants to “do good” and there are countless workshops hosted by major foundations musing on what “big data can do for society.” Add to that a growing number of data-for-good programs from Data Science for Social Good’s fantastic summer program toBayes Impact’s data science fellowships to DrivenData’s data-science-for-good competitions, and you can see how quickly this idea of “data for good” is growing.

Yes, it’s an exciting time to be exploring the ways new datasets, new techniques, and new scientists could be deployed to “make the world a better place.” We’ve already seen deep learning applied to ocean health,satellite imagery used to estimate poverty levels, and cellphone data used to elucidate Nairobi’s hidden public transportation routes. And yet, for all this excitement about the potential of this “data for good movement,” we are still desperately far from creating lasting impact. Many efforts will not only fall short of lasting impact — they will make no change at all….

So how can these well-intentioned efforts reach their full potential for real impact? Embracing the following five principles can drastically accelerate a world in which we truly use data to serve humanity.

1. “Statistics” is so much more than “percentages”

We must convey what constitutes data, what it can be used for, and why it’s valuable.

There was a packed house for the March 2015 release of the No Ceilings Full Participation Report. Hillary Clinton, Melinda Gates, and Chelsea Clinton stood on stage and lauded the report, the culmination of a year-long effort to aggregate and analyze new and existing global data, as the biggest, most comprehensive data collection effort about women and gender ever attempted. One of the most trumpeted parts of the effort was the release of the data in an open and easily accessible way.

I ran home and excitedly pulled up the data from the No Ceilings GitHub, giddy to use it for our DataKind projects. As I downloaded each file, my heart sunk. The 6MB size of the entire global dataset told me what I would find inside before I even opened the first file. Like a familiar ache, the first row of the spreadsheet said it all: “USA, 2009, 84.4%.”

What I’d encountered was a common situation when it comes to data in the social sector: the prevalence of inert, aggregate data. ….

2. Finding problems can be harder than finding solutions

We must scale the process of problem discovery through deeper collaboration between the problem holders, the data holders, and the skills holders.

In the immortal words of Henry Ford, “If I’d asked people what they wanted, they would have said a faster horse.” Right now, the field of data science is in a similar position. Framing data solutions for organizations that don’t realize how much is now possible can be a frustrating search for faster horses. If data cleaning is 80% of the hard work in data science, then problem discovery makes up nearly the remaining 20% when doing data science for good.

The plague here is one of education. …

3. Communication is more important than technology

We must foster environments in which people can speak openly, honestly, and without judgment. We must be constantly curious about each other.

At the conclusion of one of our recent DataKind events, one of our partner nonprofit organizations lined up to hear the results from their volunteer team of data scientists. Everyone was all smiles — the nonprofit leaders had loved the project experience, the data scientists were excited with their results. The presentations began. “We used Amazon RedShift to store the data, which allowed us to quickly build a multinomial regression. The p-value of 0.002 shows …” Eyes glazed over. The nonprofit leaders furrowed their brows in telegraphed concentration. The jargon was standing in the way of understanding the true utility of the project’s findings. It was clear that, like so many other well-intentioned efforts, the project was at risk of gathering dust on a shelf if the team of volunteers couldn’t help the organization understand what they had learned and how it could be integrated into the organization’s ongoing work…..

4. We need diverse viewpoints

To tackle sector-wide challenges, we need a range of voices involved.

One of the most challenging aspects to making change at the sector level is the range of diverse viewpoints necessary to understand a problem in its entirety. In the business world, profit, revenue, or output can be valid metrics of success. Rarely, if ever, are metrics for social change so cleanly defined….

Challenging this paradigm requires diverse, or “collective impact,” approaches to problem solving. The idea has been around for a while (h/t Chris Diehl), but has not yet been widely implemented due to the challenges in successful collective impact. Moreover, while there are many diverse collectives committed to social change, few have the voice of expert data scientists involved. DataKind is piloting a collective impact model called DataKind Labs, that seeks to bring together diverse problem holders, data holders, and data science experts to co-create solutions that can be applied across an entire sector-wide challenge. We just launchedour first project with Microsoft to increase traffic safety and are hopeful that this effort will demonstrate how vital a role data science can play in a collective impact approach.

5. We must design for people

Data is not truth, and tech is not an answer in-and-of-itself. Without designing for the humans on the other end, our work is in vain.

So many of the data projects making headlines — a new app for finding public services, a new probabilistic model for predicting weather patterns for subsistence farmers, a visualization of government spending — are great and interesting accomplishments, but don’t seem to have an end user in mind. The current approach appears to be “get the tech geeks to hack on this problem, and we’ll have cool new solutions!” I’ve opined that, though there are many benefits to hackathons, you can’t just hack your way to social change….(More)”