Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights


Latest White House report on Big Data charts pathways for fairness and opportunity but also cautions against re-encoding bias and discrimination into algorithmic systems: “Advertisements tailored to reflect previous purchasing decisions; targeted job postings based on your degree and social networks; reams of data informing predictions around college admissions and financial aid. Need a loan? There’s an app for that.

As technology advances and our economic, social, and civic lives become increasingly digital, we are faced with ethical questions of great consequence. Big data and associated technologies create enormous new opportunities to revisit assumptions and instead make data-driven decisions. Properly harnessed, big data can be a tool for overcoming longstanding bias and rooting out discrimination.

The era of big data is also full of risk. The algorithmic systems that turn data into information are not infallible—they rely on the imperfect inputs, logic, probability, and people who design them. Predictors of success can become barriers to entry; careful marketing can be rooted in stereotype. Without deliberate care, these innovations can easily hardwire discrimination, reinforce bias, and mask opportunity.

Because technological innovation presents both great opportunity and great risk, the White House has released several reports on “big data” intended to prompt conversation and advance these important issues. The topics of previous reports on data analytics included privacy, prices in the marketplace, and consumer protection laws. Today, we are announcing the latest report on big data, one centered on algorithmic systems, opportunity, and civil rights.

The first big data report warned of “the potential of encoding discrimination in automated decisions”—that is, discrimination may “be the inadvertent outcome of the way big data technologies are structured and used.” A commitment to understanding these risks and harnessing technology for good prompted us to specifically examine the intersection between big data and civil rights.

Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination.

The purpose of the report is not to offer remedies to the issues it raises, but rather to identify these issues and prompt conversation, research—and action—among technologists, academics, policy makers, and citizens, alike.

The report includes a number of recommendations for advancing work in this nascent field of data and ethics. These include investing in research, broadening and diversifying technical leadership, cross-training, and expanded literacy on data discrimination, bolstering accountability, and creating standards for use within both the government and the private sector. It also calls on computer and data science programs and professionals to promote fairness and opportunity as part of an overall commitment to the responsible and ethical use of data.

Big data is here to stay; the question is how it will be used: to advance civil rights and opportunity, or to undermine them….(More)”

Citizen scientists aid Ecuador earthquake relief


Mark Zastrow at Nature: “After a magnitude-7.8 earthquake struck Ecuador’s Pacific coast on 16 April, a new ally joined the international relief effort: a citizen-science network called Zooniverse.

On 25 April, Zooniverse launched a website that asks volunteers to analyse rapidly snapped satellite imagery of the disaster, which led to more than 650 reported deaths and 16,000 injuries. The aim is to help relief workers on the ground to find the most heavily damaged regions and identify which roads are passable.

Several crisis-mapping programmes with thousands of volunteers already exist — but it can take days to train satellites on the damaged region and to transmit data to humanitarian organizations, and results have not always proven useful. The Ecuador quake marked the first live public test for an effort dubbed the Planetary Response Network (PRN), which promises to be both more nimble than previous efforts, and to use more rigorous machine-learning algorithms to evaluate the quality of crowd-sourced analyses.

The network relies on imagery from the satellite company Planet Labs in San Francisco, California, which uses an array of shoebox-sized satellites to map the planet. In order to speed up the crowd-sourced process, it uses the Zooniverse platform to distribute the tasks of spotting features in satellite images. Machine-learning algorithms employed by a team at the University of Oxford, UK, then classify the reliability of each volunteer’s analysis and weight their contributions accordingly.
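The Oxford team's actual algorithms aren't described in the excerpt; a minimal sketch of what reliability-weighted aggregation of volunteer labels can look like, with all names, weights, and labels hypothetical, is below.

```python
from collections import defaultdict

def weighted_majority(labels, reliability):
    """Pick the consensus label for one image, weighting each
    volunteer's vote by their estimated reliability (0..1)."""
    scores = defaultdict(float)
    for volunteer, label in labels:
        # Unknown volunteers get a neutral weight of 0.5.
        scores[label] += reliability.get(volunteer, 0.5)
    return max(scores, key=scores.get)

def update_reliability(reliability, volunteer, agreed, rate=0.1):
    """Nudge a volunteer's reliability toward 1 when their label matched
    the consensus, toward 0 when it didn't."""
    current = reliability.get(volunteer, 0.5)
    target = 1.0 if agreed else 0.0
    reliability[volunteer] = current + rate * (target - current)

# Three hypothetical volunteers label one satellite tile.
reliability = {"ana": 0.9, "ben": 0.6, "cy": 0.3}
votes = [("ana", "damaged"), ("ben", "damaged"), ("cy", "intact")]
print(weighted_majority(votes, reliability))  # → damaged
```

The point of the weighting is that a low-reliability volunteer's dissent ("intact") is outvoted even one-on-one, and each volunteer's weight drifts over time based on how often they agree with the consensus.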

Rapid-fire data

Within two hours of the Ecuador test project going live with a first set of 1,300 images, each photo had been checked at least 20 times. “It was one of the fastest responses I’ve seen,” says Brooke Simmons, an astronomer at the University of California, San Diego, who leads the image processing. Steven Reece, who heads the Oxford team’s machine-learning effort, says that results — a “heat map” of damage with possible road blockages — were ready in another two hours.

In all, more than 2,800 Zooniverse users contributed to analysing roughly 25,000 square kilometres of imagery centred around the coastal cities of Pedernales and Bahia de Caraquez. That is where the London-based relief organization Rescue Global — which requested the analysis the day after the earthquake — currently has relief teams on the ground, including search dogs and medical units….(More)”

Using Data to Help People in Distress Get Help Faster


Nicole Wallace in The Chronicle of Philanthropy: “Answering text messages to a crisis hotline is different from handling customer-service calls: You don’t want counselors to answer folks in the order their messages were received. You want them to take the people in greatest distress first.

Crisis Text Line, a charity that provides counseling by text message, uses sophisticated data analysis to predict how serious the conversations are likely to be and ranks them by severity. Using an algorithm to automate triage ensures that people in crisis get help fast — with an unexpected side benefit for other texters contacting the hotline: shorter wait times.

When the nonprofit started in 2013, deciding which messages to take first was much more old-school. Counselors had to read all the messages in the queue and make a gut-level decision on which person was most in need of help.

“It was slow,” says Bob Filbin, the organization’s chief data scientist.

To solve the problem, Mr. Filbin and his colleagues used past messages to the hotline to create an algorithm that analyzes the language used in incoming messages and ranks them in order of predicted severity.

And it’s working. Since the algorithm went live on the platform, messages it marked as severe — code orange — led to conversations that were six times more likely to include thoughts of suicide or self-harm than exchanges started by other texts that weren’t marked code orange, and nine times more likely to have resulted in the counselor contacting emergency services to intervene in a suicide attempt.

Counselors don’t even see the queue of waiting texts anymore. They just click a button marked “Help Another Texter,” and the system connects them to the person whose message has been marked most urgent….(More)”
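Crisis Text Line's actual model is trained on past conversations and isn't detailed in the excerpt; a toy sketch of the idea — score incoming messages by language cues, keep them in a priority queue, and serve the most severe first — might look like this, with the risk terms and weights entirely hypothetical.

```python
import heapq
import itertools

# Hypothetical severity cues; the real system learns these from past messages.
RISK_TERMS = {"suicide": 5, "hurt myself": 5, "pills": 3, "alone": 1, "sad": 1}

def severity(message):
    """Crude severity score: sum the weights of risk terms present."""
    text = message.lower()
    return sum(w for term, w in RISK_TERMS.items() if term in text)

class TriageQueue:
    """Priority queue: popping returns the most severe waiting message."""
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break among equal severities

    def add(self, message):
        heapq.heappush(self._heap, (-severity(message), next(self._order), message))

    def help_another_texter(self):
        return heapq.heappop(self._heap)[2]

q = TriageQueue()
q.add("I'm feeling sad today")
q.add("I have pills and want to hurt myself")
q.add("rough week at school")
print(q.help_another_texter())  # → "I have pills and want to hurt myself"
```

The counter tie-break preserves arrival order among equally severe messages, so lower-risk texters are still served first-come, first-served.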

E-Government Strategy, ICT and Innovation for Citizen Engagement


Brief by Dennis Anderson, Robert Wu, Dr. June-Suh Cho, and Katja Schroeder: “This book discusses three levels of e-government and national strategies to reach a citizen-centric participatory e-government, and examines how disruptive technologies help shape the future of e-government. The authors examine how e-government can facilitate a symbiotic relationship between the government and its citizens. ICTs aid this relationship and promote transparencies so that citizens can place greater trust in the activities of their government. If a government can manage resources more effectively by better understanding the needs of its citizens, it can create a sustainable environment for citizens. Having a national strategy on ICT in government and e-government can significantly reduce government waste, corruption, and inefficiency. Businesses, CIOs and CTOs in the public sector interested in meeting sustainability requirements will find this book useful. …(More)”

Foundation Openness: A Critical Component of Foundation Effectiveness


Lindsay Louie at PhilanthroFiles: “We created the Fund for Shared Insight—a funder collaborative with diverse support from 30 different funders—to increase foundation openness. We believe that if foundations are more open—which we define as how they share about their goals and strategies; make decisions and measure progress; listen and engage in dialogue with others; act on what they hear; and share what they themselves have learned—they will be more effective.

We were so pleased to support Exponent Philanthropy’s video series featuring philanthropists being more open about their work: Philanthropy Lessons. To date, Exponent Philanthropy has released five of the planned nine videos.

Future video releases include:

  • Who Knows More? (expected 4/27/16)
  • Being Transparent (expected 4/27/16)
  • Value Beyond Dollars (expected 5/25/16)
  • Getting Out of the Office (expected 6/22/16)

We would love to see many more foundations make videos like these; engage in conversation with each other about these philanthropy lessons online and in person; share their experiences live at regional grantmaker association meetings or at national conferences like those Exponent Philanthropy hosts; and find other ways to be more open.

Why is this so important?

Recent research from the Center for Effective Philanthropy (report on CEP’s website here, full disclosure we funded this research) found that foundation CEOs see grantees, nonprofits that are considering applying for a grant, and other foundations working on similar issues as the top three audiences who benefit from a foundation being open about its work. Further, 86% of foundation CEOs who responded to the survey said they believe transparency is necessary for building strong relationships with grantees.

It was great to learn from this research that many foundations are open about their criteria for nonprofits seeking funding, their programmatic goals, and their strategies; and share about who makes decisions about the grantee selection process. Yet the research also found that foundations are not as open about sharing what they are achieving, how they assess their work, and their experiences with what has and hasn’t worked—and that foundation CEOs believe it would be beneficial for foundations to share more in these specific areas….(More)”

What Should We Do About Big Data Leaks?


Paul Ford at the New Republic: “I have a great fondness for government data, and the government has a great fondness for making more of it. Federal elections financial data, for example, with every contribution identified, connected to a name and address. Or the results of the census. I don’t know if you’ve ever had the experience of downloading census data but it’s pretty exciting. You can hold America on your hard drive! Meditate on the miracles of zip codes, the way the country is held together and addressable by arbitrary sets of digits.

You can download whole books, in PDF format, about the foreign policy of the Reagan Administration as it related to Russia. Negotiations over which door the Soviet ambassador would use to enter a building. Gigabytes and gigabytes of pure joy for the ephemeralist. The government is the greatest creator of ephemera ever.

Consider the Financial Crisis Inquiry Commission, or FCIC, created in 2009 to figure out exactly how the global economic pooch was screwed. The FCIC has made so much data, and has done an admirable job (caveats noted below) of arranging it. So much stuff. There are reams of treasure on a single FCIC web site, hosted at Stanford Law School: Hundreds of MP3 files, for example, with interviews with Jamie Dimon of JPMorgan Chase and Lloyd Blankfein of Goldman Sachs. I am desperate to find time to write some code that automatically extracts random audio snippets from each and puts them on top of a slow ambient drone with plenty of reverb, so that I can relax to the dulcet tones of the financial industry explaining away its failings. (There’s a Paul Krugman interview that I assume is more critical.)

The recordings are just the beginning. They’ve released so many documents, and with the documents, a finding aid that you can download in handy PDF format, which will tell you where to, well, find things, pointing to thousands of documents. That aid alone is 1,439 pages.

Look, it is excellent that this exists, in public, on the web. But it also presents a very contemporary problem: What is transparency in the age of massive database drops? The data is available, but locked in MP3s and PDFs and other documents; it’s not searchable in the way a web page is searchable, not easy to comment on or share.

Consider the WikiLeaks release of State Department cables. They were exhausting, there were so many of them, they were in all caps. Or the trove of data Edward Snowden gathered on a USB drive, or Chelsea Manning on CD. And the Ashley Madison leak, spread across database files and logs of credit card receipts. The massive and sprawling Sony leak, complete with whole email inboxes. And with the just-released Panama Papers, we see two exciting new developments: First, the consortium of media organizations that managed the leak actually came together and collectively, well, branded the papers, down to a hashtag (#panamapapers), informational website, etc. Second, the size of the leak itself—2.5 terabytes!—became a talking point, even though an exact description of what those terabytes contained was harder to come by. This, said the consortium of journalists that notably did not include The New York Times, The Washington Post, etc., is the big one. Stay tuned. And we are. But the fact remains: These artifacts are not accessible to any but the most assiduous amateur conspiracist; they’re the domain of professionals with the time and money to deal with them. Who else could be bothered?

If you watched the movie Spotlight, you saw journalists at work, pawing through reams of documents, going through, essentially, phone books. I am an inveterate downloader of such things. I love what they represent. And I’m also comfortable with many-gigabyte corpora spread across web sites. I know how to fetch data, how to consolidate it, and how to search it. I share this skill set with many data journalists, and these capacities have, in some ways, become the sole province of the media. Organs of journalism are among the only remaining cultural institutions that can fund investigations of this size and tease the data apart, identifying linkages and thus constructing informational webs that can, with great effort, be turned into narratives, yielding something like what we call “a story” or “the truth.” 

Spotlight was set around 2001, and it features a lot of people looking at things on paper. The problem has changed greatly since then: The data is everywhere. The media has been forced into a new cultural role, that of the arbiter of the giant and semi-legal database. ProPublica, a nonprofit that does a great deal of data gathering and data journalism and then shares its findings with other media outlets, is one example; it funded a project called DocumentCloud with other media organizations that simplifies the process of searching through giant piles of PDFs (e.g., court records, or the results of Freedom of Information Act requests).

At some level the sheer boredom and drudgery of managing these large data leaks make them immune to casual interest; even the Ashley Madison leak, which I downloaded, was basically an opaque pile of data and really quite boring unless you had some motive to poke around.

If this is the age of the citizen journalist, or at least the citizen opinion columnist, it’s also the age of the data journalist, with the news media acting as product managers of data leaks, making the information usable, browsable, attractive. There is an uneasy partnership between leakers and the media, just as there is an uneasy partnership between the press and the government, which would like some credit for its efforts, thank you very much, and wouldn’t mind if you gave it some points for transparency while you’re at it.

Pause for a second. There’s a glut of data, but most of it comes to us in ugly formats. What would happen if the things released in the interest of transparency were released in actual transparent formats?…(More)”

Can Data Literacy Protect Us from Misleading Political Ads?


Walter Frick at Harvard Business Review: “It’s campaign season in the U.S., and politicians have no compunction about twisting facts and figures, as a quick skim of the fact-checking website PolitiFact illustrates.

Can data literacy guard against the worst of these offenses? Maybe, according to research.

There is substantial evidence that numeracy can aid critical thinking, and some reason to think it can help in the political realm, within limits. But there is also evidence that numbers can mislead even data-savvy people when it’s in service of those people’s politics.

In a study published at the end of last year, Vittorio Merola of Ohio State University and Matthew Hitt of Louisiana State examined how numeracy might guard against partisan messaging. They showed participants information comparing the costs of probation and prison, and then asked whether participants agreed with the statement, “Probation should be used as an alternative form of punishment, instead of prison, for felons.”

Some of the participants were shown highly relevant numeric information arguing for the benefits of probation: that it costs less and has a better cost-benefit ratio, and that the cost of U.S. prisons has been rising. Another group was shown weaker, less-relevant numeric information. This message didn’t contain anything about the costs or benefits of probation, and instead compared prison costs to transportation spending, with no mention of why these might be at all related. The experiment also varied whether the information was supposedly from a study commissioned by Democrats or Republicans.

The researchers scored participants’ numeracy by asking questions like, “The chance of getting a viral infection is 0.0005. Out of 10,000 people, about how many of them are expected to get infected?”
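For reference, that numeracy item works out by multiplying the per-person probability by the population:

```python
# Expected infections = probability per person × number of people.
p_infection = 0.0005
population = 10_000
expected_cases = p_infection * population
print(expected_cases)  # → 5.0
```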

For participants who scored low in numeracy, their support depended more on the political party making the argument than on the strength of the data. When the information came from those participants’ own party, they were more likely to agree with it, no matter whether it was weak or strong.

By contrast, participants who scored higher in numeracy were persuaded by the stronger numeric information, even when it came from the other party. The results held up even after accounting for participants’ education, among other variables….

In 2013, Dan Kahan of Yale and several colleagues conducted a study in which they asked participants to draw conclusions from data. In one group, the data was about a treatment for skin rashes, a nonpolitical topic. Another group was asked to evaluate data on gun control, comparing crime rates for cities that have banned concealed weapons to cities that haven’t.

Additionally, in the skin rash group some participants were shown data indicating that the use of skin cream correlated with rashes getting better, while some were shown the opposite. Similarly, some in the gun control group were shown less crime in cities that have banned concealed weapons, while some were shown the reverse…. They found that highly numerate people did better than less-numerate ones in drawing the correct inference in the skin rash case. But comfort with numbers didn’t seem to help when it came to gun control. In fact, highly numerate participants were more polarized over the gun control data than less-numerate ones. The reason seemed to be that the numerate participants used their skill with data selectively, employing it only when doing so helped them reach a conclusion that fit with their political ideology.

Two other lines of research are relevant here.

First, work by Philip Tetlock and Barbara Mellers of the University of Pennsylvania suggests that numerate people tend to make better forecasts, including about geopolitical events. They’ve also documented that even very basic training in probabilistic thinking can improve one’s forecasting accuracy. And this approach works best, Tetlock argues, when it’s part of a whole style of thinking that emphasizes multiple points of view.

Second, two papers, one from the University of Texas at Austin and one from Princeton, found that partisan bias can be diminished with incentives: People are more likely to report factually correct beliefs about the economy when money is on the line….(More)”

Social app for refugees and locals translates in real-time


Springwise: “Europe is in the middle of a major refugee crisis, with more than one million migrants arriving in 2015 alone. Now, developers in Stockholm are coming up with new ways for arrivals to integrate into their new homes.

Welcome! is an app based in Sweden, a country that has operated a broadly open policy to immigration in recent years. The developers say the app aims to break down social and language barriers between Swedes and refugees. Welcome! is translated into Arabic, Persian, Swedish and English, and it enables users to create, host and join activities, as well as ask questions of locals, chat with new contacts, and browse events that are nearby.

The idea is to solve one of the major difficulties for immigrants arriving in Europe by encouraging the new arrivals and locals to interact and connect, helping the refugees to settle in. The app offers real-time auto-translation through its four languages, and can be downloaded for iOS and Android….We have already seen an initiative in Finland helping to set up startups with refugees…(More)

Technology for Transparency: Cases from Sub-Saharan Africa


 at Harvard Political Review: “Over the last decade, Africa has experienced previously unseen levels of economic growth and market vibrancy. Developing countries can only achieve equitable growth and reduce poverty rates, however, if they are able to make the most of their available resources. To do this, they must maximize the impact of aid from donor governments and NGOs and ensure that domestic markets continue to diversify, add jobs, and generate tax revenues. Yet, in most developing countries, there is a dearth of information available about industry profits, government spending, and policy outcomes that prevents efficient action.

ONE, an international advocacy organization, has estimated that $68.6 billion was lost in sub-Saharan Africa in 2012 due to a lack of transparency in government budgeting….

The Importance of Technology

Increased visibility of problems exerts pressure on politicians and other public sector actors to adjust their actions. This process is known as social monitoring, and it relies on citizens or public agencies using digital tools, such as mobile phones, Facebook, and other social media sites to spot public problems. In sub-Saharan Africa, however, traditional media companies and governments have not shown consistency in reporting on transparency issues.

New technologies offer a solution to this problem. Philip Thigo, the creator of an online and SMS platform that monitors government spending, said in an interview with Technology for Transparency, “All we are trying to do is enhance the work that [governments] do. We thought that if we could create a clear channel where communities could actually access data, then the work of government would be easier.” Networked citizen media platforms that rely on the volunteer contributions of citizens have become increasingly popular. Given that in most African countries less than 10 percent of the population has Internet access, mobile-device-based programs have proven the logical solution. About 30 percent of the population continent-wide has access to cell phones.

Lova Rakotomalala, a co-founder of an NGO in Madagascar that promotes online exposure of social grassroots projects, told the HPR, “most Malagasies will have a mobile phone and an FM radio because it helps them in their daily lives.” Rakotomalala works to provide workshops and IT training to people in regions of Madagascar where Internet access has been recently introduced. According to him, “the amount of data that we can collect from social monitoring and transparency projects will only grow in the near future. There is much room for improvement.”

Kenyan Budget Tracking Tool

The Kenyan Budget Tracking Tool is a prominent example of how social media technology can help obviate traditional transparency issues. Despite increased development assistance and foreign aid, the number of Kenyans classified as poor grew from 29 percent in the 1970s to almost 60 percent in 2000. Noticing this trend, Philip Thigo created an online and SMS platform called the Kenyan Budget Tracking Tool. The platform specifically focuses on the Constituencies Development Fund, through which members of the Kenyan parliament are able to allocate resources towards various projects, such as physical infrastructure, government offices, or new schools.

This social monitoring technology has exposed real government abuses. …

Another mobile tool, Question Box, allows Ugandans to call or message operators who have access to a database full of information on health, agriculture, and education.

But tools like Medic Mobile and the Kenyan Budget Tracking Tool are only the first steps in solving the problems that plague corrupt governments and underdeveloped communities. Improved access to information is no substitute for good leadership. However, as Rakotomalala argued, it is an important stepping-stone. “While legally binding actions are the hammer to the nail, you need to put the proverbial nail in the right place first. That nail is transparency.”…(More)

Data to the Rescue: Smart Ways of Doing Good


Nicole Wallace in the Chronicle of Philanthropy: “For a long time, data served one purpose in the nonprofit world: measuring program results. But a growing number of charities are rejecting the idea that data equals evaluation and only evaluation.

Of course, many nonprofits struggle even to build the simplest data system. They have too little money, too few analysts, and convoluted data pipelines. Yet some cutting-edge organizations are putting data to work in new and exciting ways that drive their missions. A prime example: The Polaris Project is identifying criminal networks in the human-trafficking underworld and devising strategies to fight back by analyzing its data storehouse along with public information.

Other charities dive deep into their data to improve services, make smarter decisions, and identify measures that predict success. Some have such an abundance of information that they’re even pruning their collection efforts to allow for more sophisticated analysis.

The groups highlighted here are among the best nationally. In their work, we get a sneak peek at how the data revolution might one day achieve its promise.

House Calls: Living Goods

Living Goods launched in eastern Africa in 2007 with an innovative plan to tackle health issues in poor families and reduce deaths among children. The charity provides loans, training, and inventory to locals in Uganda and Kenya — mostly women — to start businesses selling vitamins, medicine, and other health products to friends and neighbors.

Founder Chuck Slaughter copied the Avon model and its army of housewives-turned-sales agents. But in recent years, Living Goods has embraced a 21st-century data system that makes its entrepreneurs better health practitioners. Armed with smartphones, they confidently diagnose and treat major illnesses. At the same time, they collect information that helps the charity track health across communities and plot strategy….

Unraveling Webs of Wickedness: Polaris Project

Calls and texts to the Polaris Project’s national human-trafficking hotline are often heartbreaking, terrifying, or both.

Relatives fear that something terrible has happened to a missing loved one. Trafficking survivors suffering from their ordeal need support. The most harrowing calls are from victims in danger and pleading for help.

Last year more than 5,500 potential cases of exploitation for labor or commercial sex were reported to the hotline. Since it got its start in 2007, the total is more than 24,000.

As it helps victims and survivors get the assistance they need, the Polaris Project, a Washington nonprofit, is turning those phone calls and texts into an enormous storehouse of information about the shadowy world of trafficking. By analyzing this data and connecting it with public sources, the nonprofit is drawing detailed pictures of how trafficking networks operate. That knowledge, in turn, shapes the group’s prevention efforts, its policy work, and even law-enforcement investigations….

Too Much Information: Year Up

Year Up has a problem that many nonprofits can’t begin to imagine: It collects too much data about its program. “Predictive analytics really start to stink it up when you put too much in,” says Garrett Yursza Warfield, the group’s director of evaluation.

What Mr. Warfield describes as the “everything and the kitchen sink” problem started soon after Year Up began gathering data. The group, which fights poverty by helping low-income young adults land entry-level professional jobs, first got serious about measuring its work nearly a decade ago. Though challenged at first to round up even basic information, the group over time began tracking virtually everything it could: the percentage of young people who finish the program, their satisfaction, their paths after graduation through college or work, and much more.

Now the nonprofit is diving deeper into its data to figure out which measures can predict whether a young person is likely to succeed in the program. And halfway through this review, it’s already identified and eliminated measures that it’s found matter little. A small example: Surveys of participants early in the program asked them to rate their proficiency at various office skills. Those self-evaluations, Mr. Warfield’s team concluded, were meaningless: How can novice professionals accurately judge their Excel spreadsheet skills until they’re out in the working world?…

On the Wild Side: Wilderness Society


Without room to roam, wild animals and plants breed among themselves and risk losing genetic diversity. They also fall prey to disease. And that’s in the best of times. As wildlife adapt to climate change, the chance to migrate becomes vital even to survival.

National parks and other large protected areas are part of the answer, but they’re not enough if wildlife can’t move between them, says Travis Belote, lead ecologist at the Wilderness Society.

“Nature needs to be able to shuffle around,” he says.

Enter the organization’s Wildness Index. It’s a national map that shows the parts of the country most touched by human activity as well as wilderness areas best suited for wildlife. Mr. Belote and his colleagues created the index by combining data on land use, population density, road location and size, water flows, and many other factors. It’s an important tool to help the nonprofit prioritize the locations it fights to protect.
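The report doesn't give the index's exact formula; a common way to build such a composite is a weighted overlay of normalized impact layers, sketched below with hypothetical weights and toy 2×2 rasters standing in for the real land-use, road, and population data.

```python
def normalize(layer):
    """Rescale a raster layer (list of rows) to the 0..1 range."""
    lo = min(min(row) for row in layer)
    hi = max(max(row) for row in layer)
    if hi == lo:  # flat layer: contributes no variation
        return [[0.0] * len(row) for row in layer]
    return [[(v - lo) / (hi - lo) for v in row] for row in layer]

def wildness_index(layers, weights):
    """Per-cell wildness = 1 minus the weighted, normalized human
    footprint across all impact layers (higher = wilder)."""
    normed = {name: normalize(layer) for name, layer in layers.items()}
    first = next(iter(layers.values()))
    total_w = sum(weights.values())
    return [
        [1 - sum(weights[n] * normed[n][i][j] for n in layers) / total_w
         for j in range(len(first[0]))]
        for i in range(len(first))
    ]

# Hypothetical 2x2 rasters: road density and population density.
layers = {"roads": [[0, 10], [5, 0]], "population": [[0, 100], [20, 0]]}
weights = {"roads": 0.5, "population": 0.5}
print(wildness_index(layers, weights))  # cell [0][0] is fully wild (1.0)
```

Normalizing each layer first keeps a large-magnitude input (population counts) from drowning out a small-magnitude one (road density); the weights then encode how heavily each kind of human impact should count.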

In Idaho, for example, the nonprofit compares the index with information about known wildlife corridors and federal lands that are unprotected but meet the criteria for conservation designation. The project’s goal: determine which areas in the High Divide — a wild stretch that connects Greater Yellowstone with other protected areas — the charity should advocate to legally protect….(More)”