How “Big Data” Went Bust


The problem with “big data” is not that data is bad. It’s not even that big data is bad: Applied carefully, massive data sets can reveal important trends that would otherwise go undetected. It’s the fetishization of data, and its uncritical use, that tends to lead to disaster, as Julia Rose West recently wrote for Slate. And that’s what “big data,” as a catchphrase, came to represent.

By its nature, big data is hard to interpret. When you’re collecting billions of data points—clicks or cursor positions on a website; turns of a turnstile in a large public space; hourly wind speed observations from around the world; tweets—the provenance of any given data point is obscured. This in turn means that seemingly high-level trends might turn out to be artifacts of problems in the data or methodology at the most granular level possible. But perhaps the bigger problem is that the data you have are usually only a proxy for what you really want to know. Big data doesn’t solve that problem—it magnifies it….

Aside from swearing off data and reverting to anecdote and intuition, there are at least two viable ways to deal with the problems that arise from the imperfect relationship between a data set and the real-world outcome you’re trying to measure or predict.

One is, in short: moar data. This has long been Facebook’s approach. When it became apparent that users’ “likes” were a flawed proxy for what they actually wanted to see more of in their feeds, the company responded by adding more and more proxies to its model. It began measuring other things, like the amount of time users spent looking at a post in their feed, the amount of time they spent reading a story they had clicked on, and whether they hit “like” before or after they had read the piece. When Facebook’s engineers had gone as far as they could in weighting and optimizing those metrics, they found that users were still unsatisfied in important ways. So the company added yet more metrics to the sauce: It started running huge user-survey panels, added new reaction emojis by which users could convey more nuanced sentiments, and started using A.I. to detect clickbait-y language in posts by pages and publishers. The company knows none of these proxies are perfect. But by constantly adding more of them to the mix, it can theoretically edge ever closer to an algorithm that delivers to users the posts that they most want to see.
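To make the “weighting and optimizing” concrete, here is a minimal sketch of how several imperfect proxy signals can be folded into a single ranking score. The metric names, weights, and scoring formula are illustrative assumptions, not Facebook’s actual model.

```python
# Hypothetical sketch: combining several imperfect engagement proxies into one
# ranking score. Metric names and weights are illustrative only.

def rank_score(post_metrics: dict, weights: dict) -> float:
    """Weighted sum of proxy signals for a single post."""
    return sum(weights[name] * post_metrics.get(name, 0.0) for name in weights)

weights = {
    "liked": 1.0,             # user hit "like"
    "dwell_seconds": 0.05,    # time spent looking at the post in the feed
    "read_seconds": 0.02,     # time spent on the clicked-through story
    "liked_after_read": 2.0,  # a "like" after reading, treated as a stronger signal
}

posts = [
    {"id": "a", "liked": 1, "dwell_seconds": 4,  "read_seconds": 0,   "liked_after_read": 0},
    {"id": "b", "liked": 0, "dwell_seconds": 12, "read_seconds": 90,  "liked_after_read": 0},
    {"id": "c", "liked": 1, "dwell_seconds": 9,  "read_seconds": 120, "liked_after_read": 1},
]

# Rank the candidate posts by the combined proxy score.
for post in sorted(posts, key=lambda p: rank_score(p, weights), reverse=True):
    print(post["id"], round(rank_score(post, weights), 2))
```

Each new proxy added to such a model shifts the ranking a little, which is exactly why the mix keeps growing even though no single signal is trusted on its own.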

One downside of the moar data approach is that it’s hard and expensive. Another is that the more variables are added to your model, the more complex, opaque, and unintelligible its methodology becomes. This is part of the problem Frank Pasquale articulated in The Black Box Society. Even the most sophisticated algorithm, drawing on the best data sets, can go awry—and when it does, diagnosing the problem can be nigh-impossible. There are also the perils of “overfitting” and false confidence: The more sophisticated your model becomes, the more perfectly it seems to match up with all your past observations, and the more faith you place in it, the greater the danger that it will eventually fail you in a dramatic way. (Think mortgage crisis, election prediction models, and Zynga.)
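Overfitting is easy to demonstrate in miniature. The sketch below is a generic illustration rather than anything specific to the models discussed above: it fits a low-degree and a high-degree polynomial to the same noisy data, and the flexible model matches the past observations far more closely yet predicts new data worse.

```python
# Minimal overfitting demo: a more flexible model fits past data better
# but predicts new data worse. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def noisy_samples(n):
    x = rng.uniform(-3, 3, n)
    y = np.sin(x) + rng.normal(0, 0.3, n)   # true signal plus noise
    return x, y

x_train, y_train = noisy_samples(20)   # the "past observations"
x_test, y_test = noisy_samples(200)    # the future the model will face

for degree in (3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The degree-15 fit looks better on the training data, which is precisely the false confidence the paragraph above warns about.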

Another possible response to the problems that arise from biases in big data sets is what some have taken to calling “small data.” Small data refers to data sets that are simple enough to be analyzed and interpreted directly by humans, without recourse to supercomputers or Hadoop jobs. Like “slow food,” the term arose as a conscious reaction to the prevalence of its opposite….(More)”

 

Priceless? A new framework for estimating the cost of open government reforms


New paper by Praneetha Vissapragada and Naomi Joswiak: “The Open Government Costing initiative, seeded with funding from the World Bank, was undertaken to develop a practical and actionable approach to pinpointing the full economic costs of various open government programs. The methodology developed through this initiative represents an important step towards conducting more sophisticated cost-benefit analyses – and ultimately understanding the true value – of open government reforms intended to increase citizen engagement, promote transparency and accountability, and combat corruption, insights that have been sorely lacking in the open government community to date. The Open Government Costing Framework and Methods section (Section 2 of this report) outlines the critical components needed to conduct cost analysis of open government programs, with the ultimate objective of putting a price tag on key open government reform programs in various countries at a particular point in time. This framework introduces a costing process that employs six essential steps for conducting a cost study, including (1) defining the scope of the program, (2) identifying types of costs to assess, (3) developing a framework for costing, (4) identifying key components, (5) conducting data collection and (6) conducting data analysis. While the costing methods are built on related approaches used for analysis in other sectors such as health and nutrition, this framework and methodology was specifically adapted for open government programs and thus addresses the unique challenges associated with these types of initiatives. Using the methods outlined in this document, we conducted a cost analysis of two case studies: (1) ProZorro, an e-procurement program in Ukraine; and (2) Sierra Leone’s Open Data Program….(More)”

When Cartography Meets Disaster Relief


Mimi Kirk at CityLab: “Almost three weeks after Hurricane Maria hit Puerto Rico, the island is in a grim state. Fewer than 15 percent of residents have power, and much of the island has no clean drinking water. Delivery of food and other necessities, especially to remote areas, has been hampered by a variety of ills, including a lack of cellular service, washed-out roads, additional rainfall, and what analysts and Puerto Ricans say is a slow and insufficient response from the U.S. government.

Another issue slowing recovery? Maps—or lack of them. While pre-Maria maps of Puerto Rico were fairly complete, their level of detail was nowhere near that of other parts of the United States. Platforms such as Google Maps are more comprehensive on the mainland than on the island, explains Juan Saldarriaga, a research scholar at the Center for Spatial Research at Columbia University. This is because companies like Google often create maps for financial reasons, selling them to advertisers or as navigation devices, so areas that have less economic activity are given less attention.

This lack of detail impedes recovery efforts: Without basic information on the location of buildings, for instance, rescue workers don’t know how many people were living in an area before the hurricane struck—and thus how much aid is needed.

Crowdsourced mapping can help. Saldarriaga recently organized a “mapathon” at Columbia, in which volunteers examined satellite imagery of Puerto Rico and added missing buildings, roads, bridges, and other landmarks in the open-source platform OpenStreetMap. While some universities and other groups are hosting similar events, anyone with an internet connection and computer can participate.

Saldarriaga and his co-organizers collaborated with Humanitarian OpenStreetMap Team (HOT), a nonprofit that works to create crowdsourced maps for aid and development work. Volunteers like Saldarriaga largely drive HOT’s “crisis mapping” projects, the first of which occurred in 2010 after Haiti’s earthquake…(More)”.

A Better Way to Trace Scattered Refugees


Tina Rosenberg in The New York Times: “…No one knew where his family had gone. Then an African refugee in Ottawa told him about Refunite. He went on its website and opened an account. He gave his name, phone number and place of origin, and listed family members he was searching for.

Three-quarters of a century ago, while World War II still raged, the Allies created the International Tracing Service to help the millions who had fled their homes. Its central name index grew to 50 million cards, with information on 17.5 million individuals. The index still exists — and still gets queries — today.

Index cards have become digital databases, of course. And some agencies have brought tracing into the digital age in other ways. Unicef, for example, equips staff during humanitarian emergencies with software called Primero, which helps them get children food, medical care and other help — and register information about unaccompanied children. A parent searching for a child can register as well. An algorithm makes the connection — “like a date-finder or matchmaker,” said Robert MacTavish, who leads the Primero project.
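The article does not describe how Primero’s matching works internally, so the sketch below is only a generic illustration of the “matchmaker” idea: compare a search request against registered records with simple string similarity and surface the closest candidates. The names, fields, weights, and threshold are all hypothetical.

```python
# Hypothetical sketch of name-based matching for family tracing.
# This is NOT Primero's actual algorithm; fields, weights, and the
# threshold are invented for illustration.
from difflib import SequenceMatcher

registered_children = [
    {"name": "amina yusuf", "origin": "mogadishu"},
    {"name": "samuel okello", "origin": "gulu"},
    {"name": "amine yousef", "origin": "mogadisho"},
]

def similarity(a: str, b: str) -> float:
    """Rough string similarity between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()

def match(request: dict, records: list, threshold: float = 0.75) -> list:
    """Return candidate records whose name and origin resemble the request."""
    candidates = []
    for rec in records:
        score = 0.7 * similarity(request["name"], rec["name"]) \
              + 0.3 * similarity(request["origin"], rec["origin"])
        if score >= threshold:
            candidates.append((round(score, 2), rec))
    return sorted(candidates, key=lambda c: c[0], reverse=True)

# A parent's search request, matched against the registered records.
print(match({"name": "amina yusef", "origin": "mogadishu"}, registered_children))
```

In practice such systems surface candidates for a caseworker to verify rather than declaring a match automatically.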

Most United Nations agencies rely for family tracing on the International Committee of the Red Cross and the global network of national Red Cross and Red Crescent societies. Florence Anselmo, who directs the I.C.R.C.’s Central Tracing Agency, said that the I.C.R.C. and United Nations agencies can’t look in one another’s databases. That’s necessary for privacy reasons, but it’s an obstacle to family tracing.

Another problem: Online databases allow the displaced to do their own searches. But the I.C.R.C. has these for only a few emergency situations. Anselmo said that most tracing is done by the staff of national Red Cross societies, who respond to requests from other countries. But there is no global database, so people looking for loved ones must guess which countries to search.

The organization is working on developing an algorithm for matching, but for now, the search engines are human. “When we talk about tracing, it’s not only about data matching,” Anselmo said. “There’s a whole part about accompanying families: the human aspect, professionals as well as volunteers who are able to look for people — even go house to house if needed.”

This is the mom-and-pop general store model of tracing: The customer makes a request at the counter, then a shopkeeper with knowledge of her goods and a kind smile goes to the back and brings it out, throwing in a lollipop. But the world has 65 million forcibly displaced people, a record number. Personalized help to choose from limited stock is appropriate in many cases. But it cannot possibly be enough.

Refunite seeks to become the eBay of family tracing….(More)”

Co-creating an Open Government Data Driven Public Service: The Case of Chicago’s Food Inspection Forecasting Model


Conference paper by Keegan McBride et al: “Large amounts of Open Government Data (OGD) have become available and co-created public services have started to emerge, but there is only limited empirical material available on co-created OGD-driven public services. The authors have built a conceptual model around an innovation process based on the ideas of co-production and agile development for co-created OGD-driven public service. An exploratory case study on Chicago’s use of OGD in a predictive analytics model that forecasts critical safety violations at food serving establishments was carried out to expose the intricate process of how co-creation occurs and what factors allow for it to take place. Six factors were identified as playing a key role in allowing the co-creation of an OGD-driven public service to take place: external funding, motivated stakeholders, innovative leaders, proper communication channels, an existing OGD portal, and agile development practices. The conceptual model was generally validated, but further propositions on co-created OGD-driven public services emerged. These propositions state that the availability of OGD and tools for data analytics has the potential to enable the co-creation of OGD-driven public services; that governments releasing OGD act as a platform from which new and innovative OGD-driven public services may be co-created; and that Government as a Platform (GaaP) is an idea that allows the topics of co-creation and OGD to be merged together….(More)”.
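The paper focuses on the co-creation process rather than the model internals, but a minimal sketch of what an open-data-driven inspection forecast can look like, a classifier trained on past inspection records that ranks establishments by predicted risk, is shown below. The features, training data, and model choice are assumptions for illustration, not Chicago’s actual implementation.

```python
# Hypothetical sketch: ranking food establishments by predicted risk of a
# critical violation, trained on past open inspection records.
# Feature names, values, and the model choice are illustrative, not Chicago's system.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: [days_since_last_inspection, past_critical_violations,
#                     nearby_sanitation_complaints, has_tobacco_license]
X_train = np.array([
    [30, 0, 1, 0],
    [200, 2, 4, 1],
    [90, 1, 0, 0],
    [365, 3, 6, 1],
    [45, 0, 2, 0],
    [300, 2, 5, 1],
])
y_train = np.array([0, 1, 0, 1, 0, 1])  # 1 = critical violation found at inspection

model = LogisticRegression().fit(X_train, y_train)

# Score establishments awaiting inspection and visit the riskiest first.
pending = np.array([[250, 1, 3, 1], [20, 0, 0, 0]])
risk = model.predict_proba(pending)[:, 1]
for features, p in sorted(zip(pending.tolist(), risk), key=lambda t: -t[1]):
    print(features, f"risk={p:.2f}")
```

The point of such a model is operational: inspectors with limited time are routed to the establishments most likely to have critical violations.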

From Katrina To Harvey: How Disaster Relief Is Evolving With Technology


Cale Guthrie Weissman at Fast Company: “Open data may sound like a nerdy thing, but this weekend has proven it’s also a lifesaver in more ways than one.

As Hurricane Harvey pelted the southern coast of Texas, a local open-data resource helped provide accurate and up-to-date information to the state’s residents. Inside Harris County’s intricate bayou system–intended to both collect water and effectively drain it–gauges were installed to sense when water is overflowing. The sensors transmit the data to a website, which has become a vital go-to for Houston residents….
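The article does not specify the data format behind the gauge website, but the basic pattern, sensors reporting water levels that software compares against flood stages, can be sketched as follows. The gauge names, readings, and thresholds are invented for illustration.

```python
# Hypothetical sketch of how overflow flags could be derived from open gauge
# readings. Gauge IDs, locations, readings, and thresholds are made up.
import csv
import io

# In practice these rows would come from the county's open-data feed;
# they are inlined here so the sketch runs on its own.
raw = """gauge_id,location,water_level_ft,flood_stage_ft
520,Brays Bayou at Main St,38.2,36.0
540,Buffalo Bayou at Shepherd Dr,24.1,28.0
585,Greens Bayou at Mt Houston Rd,44.7,42.5
"""

for row in csv.DictReader(io.StringIO(raw)):
    level = float(row["water_level_ft"])
    stage = float(row["flood_stage_ft"])
    status = "OVERFLOWING" if level >= stage else "ok"
    print(f"{row['location']}: {level:.1f} ft (flood stage {stage:.1f} ft) -> {status}")
```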

This open access to flood gauges is just one of the many ways new tech-driven projects have helped improve responses to disasters over the years. “There’s no question that technology has played a much more significant role,” says Lemaitre, “since even Hurricane Sandy.”

While the response to Sandy in 2012 was notable for how it connected people through Twitter hashtags and other relatively nascent social apps like Instagram, the last few years have brought a paradigm shift in how emergency relief organizations integrate technology into their responses….

Social media isn’t just for the residents. Local and national agencies–including FEMA–rely on this information and are using it to help create faster and more effective disaster responses. Following the disaster of Hurricane Katrina, FEMA has worked over the last decade to revamp its culture and methods for reacting to these sorts of situations. “You’re seeing the federal government adapt pretty quickly,” says Lemaitre.

There are a few examples of this. For instance, FEMA now has an app to push necessary information about disaster preparedness. The agency also employs people to cull the open web for information that would help make its efforts better and more effective. These “social listeners” look at all the available Facebook, Snapchat, and other social media posts in aggregate. Crews are brought on during disasters to gather intelligence, and then report about areas that need relief efforts–getting “the right information to the right people,” says Lemaitre.

There’s also been a change in how this information is used. Often, when disasters are predicted, people send supplies to the affected areas as a way to try and help out. Yet they don’t know exactly where they should send it, and local organizations sometimes become inundated. This creates a huge logistical nightmare for relief organizations that are sitting on thousands of blankets and tarps in one place when they should be actively dispersing them across hundreds of miles.

“Before, you would just have a deluge of things dropped on top of a disaster that weren’t particularly helpful at times,” says Lemaitre. Now people are using sites like Facebook to ask where they should direct the supplies. For example, after a bad flood in Louisiana last year, a woman announced she had food and other necessities on Facebook and was able to direct the supplies to an area in need. This, says Lemaitre, is “the most effective way.”

Put together, Lemaitre has seen agencies evolve with technology to help create better systems for quicker disaster relief. This has also created a culture of learning updates and reacting in real time. Meanwhile, more data is becoming open, which is helping both people and agencies alike. (The National Weather Service, which has long trumpeted its open data for all, has become a revered stalwart for such information, and has already proven indispensable in Houston.)

Most important, the pace of technology has caused organizations to change their own procedures. Twelve years ago, during Katrina, the protocol was to wait for an assessment before deploying any assistance. Now organizations like FEMA know that just doesn’t work. “You can’t afford to lose time,” says Lemaitre. “Deploy as much as you can and be fast about it–you can always scale back.”

It’s important to note that, even with rapid technological improvements, there’s no way to compare one disaster response to another–it’s simply not apples to apples. All the same, organizations are still learning about where they should be looking and how to react, connecting people to their local communities when they need them most….(More)”.

Inside the Lab That’s Quantifying Happiness


Rowan Jacobsen at Outside: “In Mississippi, people tweet about cake and cookies an awful lot; in Colorado, it’s noodles. In Mississippi, the most-tweeted activity is eating; in Colorado, it’s running, skiing, hiking, snowboarding, and biking, in that order. In other words, the two states fall on opposite ends of the behavior spectrum. If you were to assign a caloric value to every food mentioned in every tweet by the citizens of the United States and a calories-burned value to every activity, and then total them up, you would find that Colorado tweets the best caloric ratio in the country and Mississippi the worst.
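The thought experiment above is straightforward to express in code. The sketch below uses tiny, made-up lexicons of foods and activities (the real analysis relies on far larger lexicons and real calorie estimates) to total calories in and calories out across a batch of tweets and take the ratio.

```python
# Toy version of the caloric-ratio calculation: calories-in from food words,
# calories-out from activity words, per batch of tweets. Lexicon values are illustrative.
FOOD_CALORIES = {"cake": 350, "cookies": 200, "noodles": 220}
ACTIVITY_BURN = {"eating": 90, "running": 600, "skiing": 500,
                 "hiking": 430, "snowboarding": 450, "biking": 550}

def caloric_ratio(tweets):
    calories_in = calories_out = 0
    for tweet in tweets:
        for word in tweet.lower().split():
            word = word.strip(".,!?")            # drop trailing punctuation
            calories_in += FOOD_CALORIES.get(word, 0)
            calories_out += ACTIVITY_BURN.get(word, 0)
    return calories_in / calories_out if calories_out else float("inf")

mississippi = ["love eating cake with cookies", "more cake please"]
colorado = ["running then skiing all day", "hiking and biking, maybe noodles after"]

print("Mississippi ratio:", round(caloric_ratio(mississippi), 2))  # higher = more in than out
print("Colorado ratio:", round(caloric_ratio(colorado), 2))
```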

Sure, you’d be forgiven for doubting people’s honesty on Twitter. On those rare occasions when I destroy an entire pint of Ben and Jerry’s, I most assuredly do not tweet about it. Likewise, I don’t reach for my phone every time I strap on a pair of skis.

And yet there’s this: Mississippi has the worst rate of diabetes and heart disease in the country and Colorado has the best. Mississippi has the second-highest percentage of obesity; Colorado has the lowest. Mississippi has the worst life expectancy in the country; Colorado is near the top. Perhaps we are being more honest on social media than we think. And perhaps social media has more to tell us about the state of the country than we realize.

That’s the proposition of Peter Dodds and Chris Danforth, who co-direct the University of Vermont’s Computational Story Lab, a warren of whiteboards and grad students in a handsome brick building near the shores of Lake Champlain. Dodds and Danforth are applied mathematicians, but they would make a pretty good comedy duo. When I stopped by the lab recently, both were in running clothes and cracking jokes. They have an abundance of curls between them and the wiry energy of chronic thinkers. They came to UVM in 2006 to start the Vermont Complex Systems Center, which crunches big numbers from big systems and looks for patterns. Out of that, they hatched the Computational Story Lab, which sifts through some of that public data to discern the stories we’re telling ourselves. “It took us a while to come up with the name,” Dodds told me as we shotgunned espresso and gazed into his MacBook. “We were going to be the Department of Recreational Truth.”

This year, they teamed up with their PhD student Andy Reagan to launch the Lexicocalorimeter, an online tool that uses tweets to compute the calories in and calories out for every state. It’s no mere party trick; the Story Labbers believe the Lexicocalorimeter has important advantages over slower, more traditional methods of gathering health data….(More)”.

The Tech Revolution That’s Changing How We Measure Poverty


Alvin Etang Ndip at the World Bank: “The world has an ambitious goal to end extreme poverty by 2030. But, without good poverty data, it is impossible to know whether we are making progress, or whether programs and policies are reaching those who are the most in need.

Countries, often in partnership with the World Bank Group and other agencies, measure poverty and wellbeing using household surveys that help give policymakers a sense of who the poor are, where they live, and what is holding back their progress. Household data collection was once a paper-and-pencil exercise, but technology is beginning to revolutionize the field, and the World Bank is tapping into this potential to produce more and better poverty data….

“Technology can be harnessed in three different ways,” says Utz Pape, an economist with the World Bank. “It can help improve data quality of existing surveys, it can help to increase the frequency of data collection to complement traditional household surveys, and can also open up new avenues of data collection methods to improve our understanding of people’s behaviors.”

As technology is changing the field of data collection, researchers are continuing to find new ways to build on the power of mobile phones and tablets.

The World Bank’s Pulse of South Sudan initiative, for example, takes tablet-based data collection a step further. In addition to conducting the household survey, the enumerators also record a short, personalized testimonial with the people they are interviewing, revealing a first-person account of the situation on the ground. Such testimonials allow users to put a human face on data and statistics, giving a fuller picture of the country’s experience.

Real-time data through mobile phones

At the same time, more and more countries are generating real-time data through high-frequency surveys, capitalizing on the proliferation of mobile phones around the world. The World Bank’s Listening to Africa (L2A) initiative has piloted the use of mobile phones to regularly collect information on living conditions. The approach combines face-to-face surveys with follow-up mobile phone interviews to collect data that make it possible to monitor well-being.

The initiative hands out mobile phones and solar chargers to all respondents. To minimize the risk of people dropping out, the respondents are given credit top-ups to stay in the program. From monitoring health care facilities in Tanzania to collecting data on the frequency of power outages in Togo, the initiative has been rolled out in six countries and has been used to collect data on a wide range of topics. …

Technology-driven data collection efforts haven’t been restricted to the Africa region. In fact, the approach was piloted early on in Peru and Honduras with the Listening 2 LAC program. In Europe and Central Asia, the World Bank has rolled out the Listening to Tajikistan program, which was designed to monitor the impact of the Russian economic slowdown in 2014 and 2015. Initially a six-month pilot, the initiative has now been in operation for 29 months, and a partnership with UNICEF and JICA has ensured that data collection can continue for the next 12 months. Given the volume of data, the team is currently working to create a multidimensional fragility index, where one can monitor a set of well-being indicators – ranging from food security to quality jobs and public services – on a monthly basis…
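A composite index of this kind is typically built by normalizing each indicator to a common scale and taking a weighted average. The sketch below shows one minimal way to do that; the indicator names, bounds, and weights are invented for illustration and are not the actual index under development.

```python
# Hypothetical sketch of a multidimensional fragility index: normalize each
# monthly indicator to 0-1, orient so higher means worse, then take a weighted average.
# Indicator names, bounds, and weights are illustrative assumptions.

INDICATORS = {
    # name: (worst, best, weight) -- bounds used for min-max normalization
    "food_insecure_share": (1.0, 0.0, 0.4),   # share of households; lower is better
    "employed_share":      (0.0, 1.0, 0.3),   # higher is better
    "water_access_share":  (0.0, 1.0, 0.3),   # higher is better
}

def fragility(month: dict) -> float:
    """0 = best possible month, 1 = worst, given the bounds above."""
    score = 0.0
    for name, (worst, best, weight) in INDICATORS.items():
        normalized = (month[name] - best) / (worst - best)   # 0 good, 1 bad
        score += weight * min(max(normalized, 0.0), 1.0)
    return round(score, 3)

# Two hypothetical months of survey results for comparison.
print(fragility({"food_insecure_share": 0.35, "employed_share": 0.55, "water_access_share": 0.7}))
print(fragility({"food_insecure_share": 0.6,  "employed_share": 0.4,  "water_access_share": 0.5}))
```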

There are other initiatives as well: in Mexico, the World Bank and its partners are using satellite imagery and survey data to estimate how many people live below the poverty line down to the municipal level, and in Somalia, satellite images are guiding data collectors in picking a representative sample for the Somali High Frequency Survey. However, despite the innovation, these initiatives are not intended to replace traditional household surveys, which still form the backbone of measuring poverty. When better integrated, they can prove to be a formidable set of tools for data collection, providing the best evidence possible to policymakers….(More)”
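The general recipe behind such estimates is to train a model on areas where both survey outcomes and satellite-derived features exist, then apply it to areas that have only the satellite features. The sketch below illustrates the idea; the features, values, and model choice are assumptions, not the World Bank’s actual methodology.

```python
# Hypothetical sketch of survey + satellite poverty estimation: fit a model on
# municipalities covered by the household survey, then predict the rest from
# satellite-derived features. Features and values are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

# Satellite-derived features: [night_light_intensity, share_built_up_area, road_density]
X_surveyed = np.array([
    [0.8, 0.30, 0.9],
    [0.2, 0.05, 0.2],
    [0.5, 0.15, 0.6],
    [0.1, 0.02, 0.1],
    [0.9, 0.40, 1.0],
])
poverty_rate_surveyed = np.array([0.12, 0.55, 0.30, 0.62, 0.08])  # from household surveys

model = LinearRegression().fit(X_surveyed, poverty_rate_surveyed)

# Municipalities with no recent survey, but with satellite features available.
X_unsurveyed = np.array([[0.3, 0.10, 0.4], [0.7, 0.25, 0.8]])
for features, estimate in zip(X_unsurveyed.tolist(), model.predict(X_unsurveyed)):
    print(features, f"estimated poverty rate ~ {estimate:.2f}")
```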

Scientists Use Google Earth and Crowdsourcing to Map Uncharted Forests


Katie Fletcher, Tesfay Woldemariam and Fred Stolle at EcoWatch: “No single person could ever hope to count the world’s trees. But a crowd of them just counted the world’s drylands forests—and, in the process, charted forests never before mapped, cumulatively adding up to an area equivalent in size to the Amazon rainforest.

Current technology enables computers to automatically detect forest area through satellite data in order to adequately map most of the world’s forests. But drylands, where trees are fewer and farther apart, stymied these modern methods. To measure the extent of forests in drylands, which make up more than 40 percent of land surface on Earth, researchers from the UN Food and Agriculture Organization, the World Resources Institute, and several universities and organizations had to come up with unconventional techniques. Foremost among these was turning to residents, who contributed their expertise through local map-a-thons….

Google Earth collects satellite data from several satellites with a variety of resolutions and technical capacities. The dryland satellite imagery collection compiled by Google from various providers, including DigitalGlobe, is of particularly high quality, as desert areas have little cloud cover to obstruct the views. So while it is difficult for algorithms to detect non-dominant land cover, the human eye has no problem distinguishing trees in these landscapes. Using this advantage, the scientists decided to visually count trees in hundreds of thousands of high-resolution images to determine overall dryland tree cover….
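Scaling a visual count of sample plots up to an overall estimate is, at its core, a stratified proportion estimate: the share of sampled plots with tree cover in each region is multiplied by the region’s area. The sketch below illustrates the arithmetic with made-up regions, counts, and areas.

```python
# Hypothetical sketch: estimate regional tree-covered area from visually
# interpreted sample plots. Region names, plot counts, and areas are made up.

REGIONS = [
    # (region, plots_sampled, plots_with_tree_cover, total_area_km2)
    ("Sahel", 5000, 1400, 3_000_000),
    ("Horn of Africa", 3000, 600, 1_500_000),
    ("Central Asia drylands", 4000, 480, 2_200_000),
]

total_forest_km2 = 0.0
for region, sampled, with_trees, area_km2 in REGIONS:
    share = with_trees / sampled        # estimated fraction of plots with tree cover
    forest_km2 = share * area_km2       # scale that share up to the region's area
    total_forest_km2 += forest_km2
    print(f"{region}: ~{share:.1%} tree cover, ~{forest_km2:,.0f} km2")

print(f"Estimated dryland forest across regions: ~{total_forest_km2:,.0f} km2")
```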

Armed with the quality images from Google that allowed researchers to see objects as small as half a meter (about 20 inches) across, the team divided the global dryland images into 12 regions, each with a regional partner to lead the counting assessment. The regional partners in turn recruited local residents with practical knowledge of the landscape to identify content in the sample imagery. These volunteers would come together in participatory mapping workshops, known colloquially as “map-a-thons.”…

Utilizing local landscape knowledge not only improved the map quality but also created a sense of ownership within each region. The map-a-thon participants have access to the open source tools and can now use these data and results to better engage around land use changes in their communities. Local experts, including forestry offices, can also use this easily accessible application to continue monitoring in the future.

Global Forest Watch uses medium-resolution satellite imagery (30 meters, or about 98 feet, per pixel) and sophisticated algorithms to detect near-real-time deforestation in densely forested areas. The dryland tree cover maps complement Global Forest Watch by providing the capability to monitor non-dominant tree cover and small-scale, slower-moving events like degradation and restoration. Mapping forest change at this level of detail is critical both for guiding land decisions and enabling government and business actors to demonstrate their pledges are being fulfilled, even over short periods of time.

The data documented by local participants will enable scientists to do many more analyses on both natural and man-made land changes including settlements, erosion features and roads. Mapping the tree cover in drylands is just the beginning….(More)”.

Design Thinking for the Greater Good


New Book by Jeanne Liedtka, Randy Salzman, and Daisy Azer:  “Facing especially wicked problems, social sector organizations are searching for powerful new methods to understand and address them. Design Thinking for the Greater Good goes in depth on both the how of using new tools and the why. As a way to reframe problems, ideate solutions, and iterate toward better answers, design thinking is already well established in the commercial world. Through ten stories of struggles and successes in fields such as health care, education, agriculture, transportation, social services, and security, the authors show how collaborative creativity can shake up even the most entrenched bureaucracies—and provide a practical roadmap for readers to implement these tools.

The design thinkers Jeanne Liedtka, Randy Salzman, and Daisy Azer explore how major agencies like the Department of Health and Human Services and the Transportation Security Administration in the United States, as well as organizations in Canada, Australia, and the United Kingdom, have instituted principles of design thinking. In each case, these groups have used the tools of design thinking to reduce risk, manage change, use resources more effectively, bridge the communication gap between parties, and manage the competing demands of diverse stakeholders. Along the way, they have improved the quality of their products and enhanced the experiences of those they serve. These strategies are accessible to analytical and creative types alike, and their benefits extend throughout an organization. This book will help today’s leaders and thinkers implement these practices in their own pursuit of creative solutions that are both innovative and achievable….(More)”.