Data collection is the ultimate public good


Lawrence H. Summers in the Washington Post: “I spoke at a World Bank conference on price statistics. … I am convinced that data is the ultimate public good and that we will soon have much more data than we do today. I made four primary observations.

First, scientific progress is driven more by new tools and new observations than by hypothesis construction and testing. I cited a number of examples: the observation that Jupiter was orbited by several moons clinched the case against the Ptolemaic system, the belief that all celestial objects circle around the Earth. We learned of cells by seeing them when the microscope was constructed. Accelerators made the basic structure of atoms obvious.

Second, if mathematics is the queen of the hard sciences then statistics is the queen of the social sciences. I gave examples of the power of very simple data analysis. We first learned that exercise is good for health from the observation that, in the 1940s, London bus conductors had much lower death rates than bus drivers. Similarly, data demonstrated that smoking was a major killer decades before the biological processes were understood. At a more trivial level, “Moneyball” shows how data-based statistics can revolutionize a major sport.

Third, I urged that what “you count counts” and argued that we needed much more timely and complete data. I noted the centrality of timely statistics to meaningful progress toward Sustainable Development Goals. In comparison to the nearly six-year lag in poverty statistics, it took the United States only about 3½ years to win World War II.

Fourth, I envisioned what might be possible in a world where there will soon be as many smartphones as adults. With the ubiquitous ability to collect data and nearly unlimited ability to process it will come more capacity to discover previously unknown relationships. We will improve our ability to predict disasters like famines, storms and revolutions. Communication technologies will allow us to better hold policymakers to account with reliable and rapid performance measures. And if history is any guide, we will gain capacities on dimensions we cannot now imagine but will come to regard as indispensable.

This is the work of both governments and the private sector. It is fantasy to suppose data, as the ultimate public good, will come into being without government effort. Equally, we will sell ourselves short if we stick with traditional collection methods and ignore innovative providers and methods such as the use of smartphones, drones, satellites and supercomputers. That is why something like the Billion Prices Project at MIT, which can provide daily price information, is so important. That is why I am excited to be a director and involved with Premise — a data company that analyzes information people collect on their smartphones about everyday life, like the price of local foods — in its capacity to mobilize these technologies as widely as possible. That is why Planet Labs, with its capacity to scan and monitor environmental conditions, represents such a profound innovation….(More)”

Organizational Routines: How They Are Created, Maintained, and Changed


Book edited by Jennifer Howard-Grenville, Claus Rerup, Ann Langley, and Haridimos Tsoukas: “Over the past 15 years, organizational routines have been increasingly investigated from a process perspective to challenge the idea that routines are stable entities that are mindlessly enacted.

A process perspective explores how routines are performed by specific people in specific settings. It shows how action, improvisation, and novelty are part of routine performances. It also departs from a view of routines as “black boxes” that transform inputs into organizational outputs and places attention on the actual actions and patterns that comprise routines. Routines are both effortful accomplishments, in that it takes effort to perform, sustain, or change them, and emergent accomplishments, because sometimes the effort to perform routines leads to unforeseen change.

While a process perspective has enabled scholars to open up the “black box” of routines and explore their actions and patterns in fine-grained, dynamic ways, there is much more work to be done. Chapters in this volume make considerable progress, through the three main themes expressed across these chapters. These are: Zooming out to understand routines in larger contexts; Zooming in to reveal actor dispositions and skill; and Innovation, creativity and routines in ambiguous contexts….(More)”

What Should We Do About Big Data Leaks?


Paul Ford at the New Republic: “I have a great fondness for government data, and the government has a great fondness for making more of it. Federal elections financial data, for example, with every contribution identified, connected to a name and address. Or the results of the census. I don’t know if you’ve ever had the experience of downloading census data but it’s pretty exciting. You can hold America on your hard drive! Meditate on the miracles of zip codes, the way the country is held together and addressable by arbitrary sets of digits.

You can download whole books, in PDF format, about the foreign policy of the Reagan Administration as it related to Russia. Negotiations over which door the Soviet ambassador would use to enter a building. Gigabytes and gigabytes of pure joy for the ephemeralist. The government is the greatest creator of ephemera ever.

Consider the Financial Crisis Inquiry Commission, or FCIC, created in 2009 to figure out exactly how the global economic pooch was screwed. The FCIC has made so much data, and has done an admirable job (caveats noted below) of arranging it. So much stuff. There are reams of treasure on a single FCIC web site, hosted at Stanford Law School: Hundreds of MP3 files, for example, with interviews with Jamie Dimon of JPMorgan Chase and Lloyd Blankfein of Goldman Sachs. I am desperate to find time to write some code that automatically extracts random audio snippets from each and puts them on top of a slow ambient drone with plenty of reverb, so that I can relax to the dulcet tones of the financial industry explaining away its failings. (There’s a Paul Krugman interview that I assume is more critical.)

The recordings are just the beginning. They’ve released so many documents, and with the documents, a finding aid that you can download in handy PDF format, which will tell you where to, well, find things, pointing to thousands of documents. That aid alone is 1,439 pages.

Look, it is excellent that this exists, in public, on the web. But it also presents a very contemporary problem: What is transparency in the age of massive database drops? The data is available, but locked in MP3s and PDFs and other documents; it’s not searchable in the way a web page is searchable, not easy to comment on or share.

Consider the WikiLeaks release of State Department cables. They were exhausting, there were so many of them, they were in all caps. Or the trove of data Edward Snowden gathered on a USB drive, or Chelsea Manning on CD. And the Ashley Madison leak, spread across database files and logs of credit card receipts. The massive and sprawling Sony leak, complete with whole email inboxes. And with the just-released Panama Papers, we see two exciting new developments: First, the consortium of media organizations that managed the leak actually came together and collectively, well, branded the papers, down to a hashtag (#panamapapers), informational website, etc. Second, the size of the leak itself—2.5 terabytes!—became a talking point, even though that exact description of what was contained within those terabytes was harder to understand. This, said the consortium of journalists that notably did not include The New York Times, The Washington Post, etc., is the big one. Stay tuned. And we are. But the fact remains: These artifacts are not accessible to any but the most assiduous amateur conspiracist; they’re the domain of professionals with the time and money to deal with them. Who else could be bothered?

If you watched the movie Spotlight, you saw journalists at work, pawing through reams of documents, going through, essentially, phone books. I am an inveterate downloader of such things. I love what they represent. And I’m also comfortable with many-gigabyte corpora spread across web sites. I know how to fetch data, how to consolidate it, and how to search it. I share this skill set with many data journalists, and these capacities have, in some ways, become the sole province of the media. Organs of journalism are among the only remaining cultural institutions that can fund investigations of this size and tease the data apart, identifying linkages and thus constructing informational webs that can, with great effort, be turned into narratives, yielding something like what we call “a story” or “the truth.” 

Spotlight was set around 2001, and it features a lot of people looking at things on paper. The problem has changed greatly since then: The data is everywhere. The media has been forced into a new cultural role, that of the arbiter of the giant and semi-legal database. ProPublica, a nonprofit that does a great deal of data gathering and data journalism and then shares its findings with other media outlets, is one example; it funded a project called DocumentCloud with other media organizations that simplifies the process of searching through giant piles of PDFs (e.g., court records, or the results of Freedom of Information Act requests).

At some level the sheer boredom and drudgery of managing these large data leaks make them immune to casual interest; even the Ashley Madison leak, which I downloaded, was basically an opaque pile of data and really quite boring unless you had some motive to poke around.

If this is the age of the citizen journalist, or at least the citizen opinion columnist, it’s also the age of the data journalist, with the news media acting as product managers of data leaks, making the information usable, browsable, attractive. There is an uneasy partnership between leakers and the media, just as there is an uneasy partnership between the press and the government, which would like some credit for its efforts, thank you very much, and wouldn’t mind if you gave it some points for transparency while you’re at it.

Pause for a second. There’s a glut of data, but most of it comes to us in ugly formats. What would happen if the things released in the interest of transparency were released in actual transparent formats?…(More)”

Can Data Literacy Protect Us from Misleading Political Ads?


Walter Frick at Harvard Business Review: “It’s campaign season in the U.S., and politicians have no compunction about twisting facts and figures, as a quick skim of the fact-checking website Politifact illustrates.

Can data literacy guard against the worst of these offenses? Maybe, according to research.

There is substantial evidence that numeracy can aid critical thinking, and some reason to think it can help in the political realm, within limits. But there is also evidence that numbers can mislead even data-savvy people when it’s in service of those people’s politics.

In a study published at the end of last year, Vittorio Merola of Ohio State University and Matthew Hitt of Louisiana State examined how numeracy might guard against partisan messaging. They showed participants information comparing the costs of probation and prison, and then asked whether participants agreed with the statement, “Probation should be used as an alternative form of punishment, instead of prison, for felons.”

Some of the participants were shown highly relevant numeric information arguing for the benefits of probation: that it costs less and has a better cost-benefit ratio, and that the cost of U.S. prisons has been rising. Another group was shown weaker, less-relevant numeric information. This message didn’t contain anything about the costs or benefits of parole, and instead compared prison costs to transportation spending, with no mention of why these might be at all related. The experiment also varied whether the information was supposedly from a study commissioned by Democrats or Republicans.

The researchers scored participants’ numeracy by asking questions like, “The chance of getting a viral infection is 0.0005. Out of 10,000 people, about how many of them are expected to get infected?”
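As a worked illustration of the sample question above (not part of the study itself), the expected number of infections is the probability multiplied by the population size. The function name `expected_cases` is an assumption made for this sketch, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def expected_cases(probability: Fraction, population: int) -> Fraction:
    """Expected number of cases: probability of infection times population."""
    return probability * population


# The quoted question: a 0.0005 chance of infection among 10,000 people.
chance = Fraction(5, 10_000)   # 0.0005 written as an exact fraction
people = 10_000

print(expected_cases(chance, people))  # prints 5
```

So the answer a numerate participant would be expected to give is about five people.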

For participants who scored low in numeracy, their support depended more on the political party making the argument than on the strength of the data. When the information came from those participants’ own party, they were more likely to agree with it, no matter whether it was weak or strong.

By contrast, participants who scored higher in numeracy were persuaded by the stronger numeric information, even when it came from the other party. The results held up even after accounting for participants’ education, among other variables….

In 2013, Dan Kahan of Yale and several colleagues conducted a study in which they asked participants to draw conclusions from data. In one group, the data was about a treatment for skin rashes, a nonpolitical topic. Another group was asked to evaluate data on gun control, comparing crime rates for cities that have banned concealed weapons to cities that haven’t.

Additionally, in the skin rash group some participants were shown data indicating that the use of skin cream correlated with rashes getting better, while some were shown the opposite. Similarly, some in the gun control group were shown less crime in cities that have banned concealed weapons, while some were shown the reverse…. They found that highly numerate people did better than less-numerate ones in drawing the correct inference in the skin rash case. But comfort with numbers didn’t seem to help when it came to gun control. In fact, highly numerate participants were more polarized over the gun control data than less-numerate ones. The reason seemed to be that the numerate participants used their skill with data selectively, employing it only when doing so helped them reach a conclusion that fit with their political ideology.
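The excerpt does not give the figures participants saw, so the following is only a sketch of why this kind of inference takes some care. The counts are invented for illustration and the names (`improvement_rate`, `with_cream`, `without_cream`) are hypothetical. It shows how comparing raw counts can point one way while comparing rates points the other.

```python
def improvement_rate(improved: int, worsened: int) -> float:
    """Share of cases in a group whose rash improved."""
    return improved / (improved + worsened)


# Hypothetical counts, invented for this sketch (not the study's data).
with_cream = {"improved": 200, "worsened": 80}
without_cream = {"improved": 100, "worsened": 20}

rate_with = improvement_rate(**with_cream)        # 200 / 280
rate_without = improvement_rate(**without_cream)  # 100 / 120

# Raw counts suggest the cream helps: more treated patients improved.
print(with_cream["improved"] > without_cream["improved"])  # prints True

# Rates suggest the opposite: a smaller share of treated patients improved.
print(rate_with < rate_without)  # prints True
```

With these made-up numbers, a reader who looks only at the larger count of improved patients reaches the wrong conclusion; the correct inference requires comparing proportions across the two groups.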

Two other lines of research are relevant here.

First, work by Philip Tetlock and Barbara Mellers of the University of Pennsylvania suggests that numerate people tend to make better forecasts, including about geopolitical events. They’ve also documented that even very basic training in probabilistic thinking can improve one’s forecasting accuracy. And this approach works best, Tetlock argues, when it’s part of a whole style of thinking that emphasizes multiple points of view.

Second, two papers, one from the University of Texas at Austin and one from Princeton, found that partisan bias can be diminished with incentives: People are more likely to report factually correct beliefs about the economy when money is on the line….(More)”

Social app for refugees and locals translates in real-time


Springwise: “Europe is in the middle of a major refugee crisis, with more than one million migrants arriving in 2015 alone. Now, developers in Stockholm are coming up with new ways for arrivals to integrate into their new homes.

Welcome! is an app based in Sweden, a country that has operated a broadly open policy to immigration in recent years. The developers say the app aims to break down social and language barriers between Swedes and refugees. Welcome! is translated into Arabic, Persian, Swedish and English, and it enables users to create, host and join activities, as well as ask questions of locals, chat with new contacts, and browse events that are nearby.

The idea is to solve one of the major difficulties for immigrants arriving in Europe by encouraging the new arrivals and locals to interact and connect, helping the refugees to settle in. The app offers real-time auto-translation through its four languages, and can be downloaded for iOS and Android…. We have already seen an initiative in Finland helping to set up startups with refugees…(More)”

Crowdsourcing a Collective Sense of Place


Jenkins A., Croitoru A., Crooks A.T., Stefanidis A. in PLOS: “Place can be generally defined as a location that has been assigned meaning through human experience, and as such it is of multidisciplinary scientific interest. Up to this point place has been studied primarily within the context of social sciences as a theoretical construct. The availability of large amounts of user-generated content, e.g. in the form of social media feeds or Wikipedia contributions, allows us for the first time to computationally analyze and quantify the shared meaning of place. By aggregating references to human activities within urban spaces we can observe the emergence of unique themes that characterize different locations, thus identifying places through their discernible sociocultural signatures. In this paper we present results from a novel quantitative approach to derive such sociocultural signatures from Twitter contributions and also from corresponding Wikipedia entries. By contrasting the two we show how particular thematic characteristics of places (referred to herein as platial themes) are emerging from such crowd-contributed content, allowing us to observe the meaning that the general public, either individually or collectively, is assigning to specific locations. Our approach leverages probabilistic topic modelling, semantic association, and spatial clustering to find locations that are conveying a collective sense of place. Deriving and quantifying such meaning allows us to observe how people transform a location to a place and shape its characteristics….(More)”

Hermeneutica: Computer-Assisted Interpretation in the Humanities


Book by Geoffrey Rockwell and Stéfan Sinclair: “The image of the scholar as a solitary thinker dates back at least to Descartes’ Discourse on Method. But scholarly practices in the humanities are changing as older forms of communal inquiry are combined with modern research methods enabled by the Internet, accessible computing, data availability, and new media. Hermeneutica introduces text analysis using computer-assisted interpretive practices. It offers theoretical chapters about text analysis, presents a set of analytical tools (called Voyant) that instantiate the theory, and provides example essays that illustrate the use of these tools. Voyant allows users to integrate interpretation into texts by creating hermeneutica—small embeddable “toys” that can be woven into essays published online or into such online writing environments as blogs or wikis. The book’s companion website, Hermeneutic.ca, offers the example essays with both text and embedded interactive panels. The panels show results and allow readers to experiment with the toys themselves.

The use of these analytical tools results in a hybrid essay: an interpretive work embedded with hermeneutical toys that can be explored for technique. The hermeneutica draw on and develop such common interactive analytics as word clouds and complex data journalism interactives. Embedded in scholarly texts, they create a more engaging argument. Moving between tool and text becomes another thread in a dynamic dialogue….(More)”

Technology for Transparency: Cases from Sub-Saharan Africa


at Harvard Political Review: “Over the last decade, Africa has experienced previously unseen levels of economic growth and market vibrancy. Developing countries can only achieve equitable growth and reduce poverty rates, however, if they are able to make the most of their available resources. To do this, they must maximize the impact of aid from donor governments and NGOs and ensure that domestic markets continue to diversify, add jobs, and generate tax revenues. Yet, in most developing countries, there is a dearth of information available about industry profits, government spending, and policy outcomes that prevents efficient action.

ONE, an international advocacy organization, has estimated that $68.6 billion was lost in sub-Saharan Africa in 2012 due to a lack of transparency in government budgeting….

The Importance of Technology

Increased visibility of problems exerts pressure on politicians and other public sector actors to adjust their actions. This process is known as social monitoring, and it relies on citizens or public agencies using digital tools, such as mobile phones, Facebook, and other social media sites to spot public problems. In sub-Saharan Africa, however, traditional media companies and governments have not shown consistency in reporting on transparency issues.

New technologies offer a solution to this problem. Philip Thigo, the creator of an online and SMS platform that monitors government spending, said in an interview with Technology for Transparency, “All we are trying to do is enhance the work that [governments] do. We thought that if we could create a clear channel where communities could actually access data, then the work of government would be easier.” Networked citizen media platforms that rely on the volunteer contributions of citizens have become increasingly popular. Given that in most African countries less than 10 percent of the population has Internet access, mobile-device-based programs have proven the logical solution. About 30 percent of the population continent-wide has access to cell phones.

Lova Rakotomalala, a co-founder of an NGO in Madagascar that promotes online exposure of social grassroots projects, told the HPR, “most Malagasies will have a mobile phone and an FM radio because it helps them in their daily lives.” Rakotomalala works to provide workshops and IT training to people in regions of Madagascar where Internet access has been recently introduced. According to him, “the amount of data that we can collect from social monitoring and transparency projects will only grow in the near future. There is much room for improvement.”

Kenyan Budget Tracking Tool

The Kenyan Budget Tracking Tool is a prominent example of how social media technology can help obviate traditional transparency issues. Despite increased development assistance and foreign aid, the number of Kenyans classified as poor grew from 29 percent in the 1970s to almost 60 percent in 2000. Noticing this trend, Philip Thigo created an online and SMS platform called the Kenyan Budget Tracking Tool. The platform specifically focuses on the Constituencies Development Fund, through which members of the Kenyan parliament are able to allocate resources towards various projects, such as physical infrastructure, government offices, or new schools.

This social monitoring technology has exposed real government abuses. …

Another mobile tool, Question Box, allows Ugandans to call or message operators who have access to a database full of information on health, agriculture, and education.

But tools like Medic Mobile and the Kenyan Budget Tracking Tool are only the first steps in solving the problems that plague corrupt governments and underdeveloped communities. Improved access to information is no substitute for good leadership. However, as Rakotomalala argued, it is an important stepping-stone. “While legally binding actions are the hammer to the nail, you need to put the proverbial nail in the right place first. That nail is transparency.”…(More)”

Automating power: Social bot interference in global politics


Samuel C. Woolley at First Monday: “Over the last several years political actors worldwide have begun harnessing the digital power of social bots — software programs designed to mimic human social media users on platforms like Facebook, Twitter, and Reddit. Increasingly, politicians, militaries, and government-contracted firms use these automated actors in online attempts to manipulate public opinion and disrupt organizational communication. Politicized social bots — here ‘political bots’ — are used to massively boost politicians’ follower levels on social media sites in attempts to generate false impressions of popularity. They are programmed to actively and automatically flood news streams with spam during political crises, elections, and conflicts in order to interrupt the efforts of activists and political dissidents who publicize and organize online. They are used by regimes to send out sophisticated computational propaganda. This paper conducts a content analysis of available media articles on political bots in order to build an event dataset of global political bot deployment that codes for usage, capability, and history. This information is then analyzed, generating a global outline of this phenomenon. This outline seeks to explain the variety of political bot-oriented strategies and presents details crucial to building understandings of these automated software actors in the humanities, social and computer sciences….(More)”

Drones Marshaled to Drop Lifesaving Supplies Over Rwandan Terrain


From a bluff overlooking the Pacific Ocean, a loud pop signals the catapult launch of a small fixed-wing drone that is designed to carry medical supplies to remote locations almost 40 miles away.

The drones are the brainchild of a small group of engineers at a Silicon Valley start-up called Zipline, which plans to begin operating a service with them for the government of Rwanda in July. The fleet of robot planes will initially cover more than half the tiny African nation, creating a highly automated network to shuttle blood and pharmaceuticals to remote locations in hours rather than weeks or months.

Rwanda, one of the world’s poorest nations, was ranked 170th by gross domestic product in 2014 by the International Monetary Fund. And so it is striking that the country will be the first, company executives said, to establish a commercial drone delivery network — putting it ahead of places like the United States, where there have been heavily ballyhooed futuristic drone delivery systems promising urban and suburban package delivery from tech giants such as Amazon and Google….

That Rwanda is set to become the first country with a drone delivery network illustrates the often uneven nature of the adoption of new technology. In the United States, drones have run into a wall of regulation and conflicting rules. But in Rwanda, the country’s master development plan has placed a priority on the use of the machines, first for medicine and then more broadly for economic development….

The new drone system will initially be capable of making 50 to 150 daily deliveries of blood and emergency medicine to Rwanda’s 21 transfusing facilities, mostly in hospitals and clinics in the western half of the nation.

The drone system is based on a fleet of 15 small aircraft, each with twin electric motors, a 3.5-pound payload and an almost eight-foot wingspan. The system’s speed makes it possible to maintain a “cold chain” — essentially a temperature-controlled supply chain needed to provide blood and vaccines — which is often not practical to establish in developing countries.

The Zipline drones will use GPS receivers to navigate and communicate via the Rwandan cellular network. They will be able to fly in rough weather conditions, enduring winds up to 30 miles per hour….(More)”