Building Data Responsibility into Humanitarian Action


Stefaan Verhulst at The GovLab: “Next Monday, May 23rd, governments, non-profit organizations and citizen groups will gather in Istanbul at the first World Humanitarian Summit. A range of important issues will be on the agenda, not least among them the refugee crisis confronting the Middle East and Europe. Also on the agenda will be an issue of growing importance and relevance, even if it does not generate front-page headlines: the increasing potential (and use) of data in the humanitarian context.

To explore this topic, a new paper, “Building Data Responsibility into Humanitarian Action,” is being released today and will be presented tomorrow at the Understanding Risk Forum. This paper is the result of a collaboration between the United Nations Office for the Coordination of Humanitarian Affairs (OCHA), The GovLab (NYU Tandon School of Engineering), the Harvard Humanitarian Initiative, and the Leiden University Centre for Innovation. It seeks to identify the potential benefits and risks of using data in the humanitarian context, and begins to outline an initial framework for the responsible use of data in humanitarian settings.

Both anecdotal and more rigorously researched evidence points to the growing use of data to address a variety of humanitarian crises. The paper discusses a number of data risk case studies, including the use of call data to fight malaria in Africa; satellite imagery to identify security threats on the border between Sudan and South Sudan; and transaction data to increase the efficiency of food delivery in Lebanon. These early examples (along with a few others discussed in the paper) have begun to show the opportunities offered by data and information. More importantly, they also help us better understand the risks, including, and especially, those posed to privacy and security.

One of the broader goals of the paper is to integrate the specific and the theoretical, in the process building a bridge between the deep, contextual knowledge offered by initiatives like those discussed above and the broader needs of the humanitarian community. To that end, the paper builds on its discussion of case studies to begin establishing a framework for the responsible use of data in humanitarian contexts. It identifies four “Minimum Humanitarian standards for the Responsible use of Data” and four “Characteristics of Humanitarian Organizations that use Data Responsibly.” Together, these eight attributes can serve as a roadmap or blueprint for humanitarian groups seeking to use data. The paper also provides a four-step practical guide for a data responsibility framework (see also earlier blog)….(More)” Full Paper: Building Data Responsibility into Humanitarian Action

Outstanding Challenges in Recent Open Government Data Initiatives


Paper by Usamah A. Algemili: “In recent years, we have witnessed increasing interest in government data. Many governments around the world have sensed the value of their passive data sets. These governments have launched Open Data policies, yet many countries are still in the process of converting raw data into useful representations. This paper surveys previous Open Data initiatives. It discusses the various challenges that open data projects may encounter during the transformation from passive data sets towards an Open Data culture. It also reaches out to project teams to gather their practical assessments: an online form was distributed among project teams, with a questionnaire developed in alignment with previous literature on data integration challenges. A total of 138 eligible professionals participated, and their responses were analyzed by the researcher. The results section identifies the most critical challenges from the project teams’ point of view, and the findings show four obstacles that stand out as critical challenges facing project teams. This paper casts light on these challenges and attempts to identify the gap between current guidelines and practical experience. Accordingly, this paper presents the current infrastructure of the Open Data framework, followed by additional recommendations that may lead to successful implementation of Open Data development….(More)”

We know where you live


MIT News Office: “From location data alone, even low-tech snoopers can identify Twitter users’ homes, workplaces….Researchers at MIT and Oxford University have shown that the location stamps on just a handful of Twitter posts — as few as eight over the course of a single day — can be enough to disclose the addresses of the poster’s home and workplace to a relatively low-tech snooper.

The tweets themselves might be otherwise innocuous — links to funny videos, say, or comments on the news. The location information comes from geographic coordinates automatically associated with the tweets.

Twitter’s location-reporting service is off by default, but many Twitter users choose to activate it. The new study is part of a more general project at MIT’s Internet Policy Research Initiative to help raise awareness about just how much privacy people may be giving up when they use social media.

The researchers describe their research in a paper presented last week at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems, where it received an honorable mention in the best-paper competition, a distinction reserved for only 4 percent of papers accepted to the conference.

“Many people have this idea that only machine-learning techniques can discover interesting patterns in location data,” says Ilaria Liccardi, a research scientist at MIT’s Internet Policy Research Initiative and first author on the paper. “And they feel secure that not everyone has the technical knowledge to do that. With this study, what we wanted to show is that when you send location data as a secondary piece of information, it is extremely simple for people with very little technical knowledge to find out where you work or live.”

Conclusions from clustering

In their study, Liccardi and her colleagues — Alfie Abdul-Rahman and Min Chen of Oxford’s e-Research Centre in the U.K. — used real tweets from Twitter users in the Boston area. The users consented to the use of their data, and they also confirmed their home and work addresses, their commuting routes, and the locations of various leisure destinations from which they had tweeted.

The time and location data associated with the tweets were then presented to a group of 45 study participants, who were asked to try to deduce whether the tweets had originated at the Twitter users’ homes, their workplaces, leisure destinations, or locations along their commutes. The participants were not recruited on the basis of any particular expertise in urban studies or the social sciences; they just drew what conclusions they could from location clustering….

Predictably, participants fared better with map-based representations, correctly identifying Twitter users’ homes roughly 65 percent of the time and their workplaces at closer to 70 percent. Even the tabular representation was informative, however, with accuracy rates of just under 50 percent for homes and a surprisingly high 70 percent for workplaces….(More; Full paper)”
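The study's point is that no machine learning is needed to turn a handful of geotagged posts into a home and a workplace. A crude time-of-day heuristic over clustered coordinates is enough to illustrate why. The sketch below is hypothetical and is not the researchers' method (their participants worked by eye from maps and tables): it assumes posts arrive as (timestamp, latitude, longitude) tuples, snaps coordinates to a grid of roughly 100 m by rounding to three decimal places, and treats the busiest late-night cell as "home" and the busiest weekday office-hours cell as "work". The hour windows, grid size, and sample coordinates are all illustrative assumptions.

```python
from collections import Counter
from datetime import datetime


def guess_home_and_work(posts, precision=3):
    """Guess home and work grid cells from geotagged posts.

    posts: iterable of (timestamp: datetime, lat: float, lon: float).
    Returns (home_cell, work_cell); either is None if no post falls
    in the corresponding time window. Hypothetical heuristic only.
    """
    night = Counter()   # posts between 22:00 and 07:00 (assumed "at home")
    office = Counter()  # weekday posts between 09:00 and 17:00 (assumed "at work")
    for ts, lat, lon in posts:
        # Rounding to 3 decimals groups points within roughly 100 m.
        cell = (round(lat, precision), round(lon, precision))
        if ts.hour >= 22 or ts.hour < 7:
            night[cell] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:
            office[cell] += 1
        # Commute and leisure posts fall in neither window and are ignored.
    home = night.most_common(1)[0][0] if night else None
    work = office.most_common(1)[0][0] if office else None
    return home, work


# Eight hypothetical posts from one Monday in the Boston area.
posts = [
    (datetime(2016, 5, 16, 6, 30), 42.3601, -71.0589),   # early morning
    (datetime(2016, 5, 16, 8, 15), 42.3650, -71.0800),   # commute
    (datetime(2016, 5, 16, 9, 40), 42.3737, -71.1097),   # office hours
    (datetime(2016, 5, 16, 12, 5), 42.3738, -71.1098),   # office hours
    (datetime(2016, 5, 16, 15, 20), 42.3739, -71.1099),  # office hours
    (datetime(2016, 5, 16, 18, 45), 42.3500, -71.0700),  # leisure
    (datetime(2016, 5, 16, 22, 10), 42.3602, -71.0590),  # late evening
    (datetime(2016, 5, 16, 23, 30), 42.3603, -71.0588),  # late evening
]
home, work = guess_home_and_work(posts)
print("home cell:", home, "work cell:", work)
```

Even this blunt rule recovers both locations from the sample day, which is the kind of low-tech inference the researchers warn about; real data would be noisier, and the fixed hour windows are the weakest assumption here.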

Where are Human Subjects in Big Data Research? The Emerging Ethics Divide


Paper by Jacob Metcalf and Kate Crawford: “There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue that data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce….(More)”

Insights On Collective Problem-Solving: Complexity, Categorization And Lessons From Academia


Part 3 of an interview series by Henry Farrell for the MacArthur Research Network on Opening Governance: “…Complexity theorists have devoted enormous energy and attention to thinking about how complex problems, in which different factors interact in ways that are hard to predict, can best be solved. One key challenge is categorizing problems, so as to understand which approaches are best suited to addressing them.

Scott Page is the Leonid Hurwicz Collegiate Professor of Complex Systems at the University of Michigan, Ann Arbor, and one of the world’s foremost experts on diversity and problem-solving. I asked him a series of questions about how we might use insights from academic research to think better about how problem solving works.

Henry: One of the key issues of collective problem-solving is what you call the ‘problem of problems’ – the question of identifying which problems we need to solve. This is often politically controversial – e.g., it may be hard to get agreement that global warming, or inequality, or long prison sentences are a problem. How do we best go about identifying problems, given that people may disagree?

Scott: In a recent big-think paper in Scientific American on the potential of diversity for collective problem solving, Katherine Phillips writes that group members must feel validated, that they must share a commitment to the group, and that they must have a common goal if they are going to contribute. This implies that you won’t succeed in getting people to collaborate by setting an agenda from on high and then seeking to attract diverse people to further that agenda.

One way of starting to tackle the problem of problems is to steal a rule of thumb from Getting to Yes: get people to think about their broad interests rather than the positions they’re starting from. People often agree on their fundamental desires but disagree on how they can be achieved. For example, nearly everyone wants less crime, but they may disagree over whether they think the solution to crime involves tackling poverty or imposing longer prison sentences. If you can get them to focus on their common interest in reducing crime rather than their disagreements, you’re more likely to get them to collaborate usefully.

Segregation amplifies the problem of problems. We live in towns and neighborhoods segregated by race, income, ideology, and human capital. Democrats live near Democrats and Republicans near Republicans. Consensus requires integration. We must work across ideologies. Relatedly, opportunity requires more than access. Many people grow up not knowing any engineers, dentists, doctors, lawyers, or statisticians. This isolation narrows the set of careers they consider, and it reduces the diversity of many professions. We cannot imagine lives we do not know.

Henry: Once you get past the problem of problems, you still need to identify which kind of problem you are dealing with. You identify three standard types of problems: solution problems, selection problems and optimization problems. What – very briefly – are the key differences between these kinds of problems?

Scott: I’m constantly pondering the potential set of categories in which collective intelligence can emerge. I’m teaching a course on collective intelligence this semester and the undergraduates and I developed an acronym SCARCE PIGS to describe the different types of domains. Here’s the brief summary:

  • Predict: when individuals combine information, models, or measurements to estimate a future event, guess an answer, or classify an event. Examples might involve betting markets, or combined efforts to guess a quantity, such as Francis Galton’s example of people at a fair trying to guess the weight of a steer.
  • Identify: when individuals have local, partial, or possibly erroneous knowledge and collectively can find an object. Here, an example is DARPA’s Red Balloon project.
  • Solve: when individuals apply and possibly combine higher order cognitive processes and analytic tools for the purpose of finding or improving a solution to a task. InnoCentive and similar organizations provide examples of this.
  • Generate: when individuals apply diverse representations, heuristics, and knowledge to produce something new. An everyday example is creating a new building.
  • Coordinate: when individuals adopt similar actions, behaviors, beliefs, or mental frameworks by learning through local interactions. Ordinary social conventions such as people greeting each other are good examples.
  • Cooperate: when individuals take actions, not necessarily in their self-interest, that collectively produce a desirable outcome. Here, think of managing common pool resources (e.g., fishing boats not overfishing an area that they collectively control).
  • Arrange: when individuals manipulate items in a physical or virtual environment for their own purposes resulting in an organization of that environment. As an example, imagine a student co-op which keeps twenty types of hot sauce in its pantry. If each student puts whichever hot sauce she uses in the front of the pantry, then on average, the hot sauces will be arranged according to popularity, with the most favored hot sauces in the front and the least favored lost in the back.
  • Respond: when individuals react to external or internal stimuli, creating collective responses that maintain system-level functioning. For example, when yellow jackets attack a predator to maintain the colony, they are displaying this kind of problem solving.
  • Emerge: when individual parts create a whole that has categorically distinct and new functionalities. The most obvious example of this is the human brain….(More)”
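The hot-sauce pantry in the “Arrange” category behaves like the move-to-front heuristic for self-organizing lists: each use moves one item to the front, and popularity ordering emerges without anyone sorting the shelf. The minimal simulation below assumes hypothetical popularity weights (sauce 0 most popular, sauce 19 least) and an arbitrary number of uses and random seed; none of these come from the interview. It starts with the least popular sauces in front so that the ordering visibly has to emerge.

```python
import random


def simulate_pantry(weights, uses, seed=0):
    """Simulate a move-to-front pantry shelf.

    weights[i] is the (hypothetical) relative popularity of sauce i.
    Each use draws one sauce in proportion to its weight and puts it
    back at the front of the shelf. Returns each sauce's average shelf
    position (0 = front) measured after every use.
    """
    rng = random.Random(seed)
    sauces = range(len(weights))
    shelf = list(reversed(sauces))  # least popular start at the front
    position_totals = [0] * len(weights)
    for _ in range(uses):
        sauce = rng.choices(sauces, weights=weights)[0]
        shelf.remove(sauce)
        shelf.insert(0, sauce)  # the only rule: put it back in front
        for pos, s in enumerate(shelf):
            position_totals[s] += pos
    return [total / uses for total in position_totals]


weights = list(range(20, 0, -1))  # sauce 0 most popular, sauce 19 least
avg = simulate_pantry(weights, uses=20000)
print("most popular, avg position:", round(avg[0], 1))
print("least popular, avg position:", round(avg[-1], 1))
```

Each student acts only for her own convenience, yet on average the shelf ends up ordered by popularity, which is the collective arrangement Page describes. The ordering is statistical rather than exact: sauces with similar popularity swap places often.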

Can An Online Game Help Create A Better Test For TB?


Esther Landhuis at NPR: “Though it’s the world’s top infectious killer, tuberculosis is surprisingly tricky to diagnose. Scientists think that video gamers can help them create a better diagnostic test.

An online puzzle released Monday will see whether the researchers are right. Players of a Web-based game called EteRNA will try to design a sensor molecule that could potentially make diagnosing TB as easy as taking a home pregnancy test. The TB puzzle marks the launch of “EteRNA Medicine.”

The idea of rallying gamers to fight TB arose as two young Stanford University professors chatted over dinner at a conference last May. Rhiju Das, a biochemist who helped create EteRNA, told bioinformatician Purvesh Khatri about the game, which challenges nonexperts to design RNA molecules that fold into target shapes.

RNA molecules play key roles in biology and disease. Some brain disorders can be traced to problems with RNA folding. Viruses such as H1N1 flu and HIV depend on RNA elements to replicate and infect cells.

Das wants to “fight fire with fire” — that is, to disrupt the RNA involved in a disease or virus by crafting new tools that are themselves made of RNA molecules. EteRNA players learn RNA design principles with each puzzle they solve.

Khatri was intrigued by the notion of engaging the public to solve problems. His lab develops novel diagnostics using publicly available data sets. The team had just published a paper on a set of genes that could help diagnose sepsis and had other papers under review on influenza and TB.

In an “Aha!” moment during their dinner chat, Khatri says, he and Das realized “how awesome it would be to sequentially merge our two approaches — to use public data to find a diagnostic marker for a disease, and then use the public’s help to develop the test.”

TB seemed opportune as it has a simple diagnostic signature — a set of three human genes that turn up or down predictably after TB infection. When checked across gene data on thousands of blood samples from 14 groups of people around the globe, the behavior of the three-gene set readily identified people with active TB, distinguishing them from individuals who had latent TB or other diseases.

Those findings, published in February, have gotten serious attention — not only from curious patients and doctors but also from humanitarian groups eager to help bring a better TB test to market. It can currently take several tests to tell whether a person has active TB, including a chest X-ray and sputum test. The Bill & Melinda Gates Foundation has started sending data to help the Stanford team validate a test based on the newly identified TB gene signature, says study leader Khatri, who works at the university’s Center for Biomedical Informatics Research….(More)”
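A gene-expression signature like the one described above can be thought of as a single score: average the expression of the genes expected to rise after infection and subtract the average of those expected to fall. The sketch below is purely illustrative. The gene names, the two-up/one-down split, the expression values, and the zero threshold are hypothetical placeholders; the Stanford team's actual genes, formula, and cut-off are not given in this article, and a real test would be calibrated on validation cohorts like the ones it mentions.

```python
from statistics import mean

# Hypothetical placeholders, not the published TB signature.
UP_GENES = ["GENE_UP_1", "GENE_UP_2"]  # assumed to rise with active TB
DOWN_GENES = ["GENE_DOWN_1"]           # assumed to fall with active TB
THRESHOLD = 0.0                        # assumed cut-off; would need calibration


def signature_score(expression, up_genes, down_genes):
    """Mean expression of 'up' genes minus mean expression of 'down' genes."""
    return (mean(expression[g] for g in up_genes)
            - mean(expression[g] for g in down_genes))


def looks_like_active_tb(expression):
    """Flag a sample whose signature score exceeds the assumed threshold."""
    return signature_score(expression, UP_GENES, DOWN_GENES) > THRESHOLD


# Made-up log-scale expression values for two samples.
sample_a = {"GENE_UP_1": 8.2, "GENE_UP_2": 7.9, "GENE_DOWN_1": 5.1}
sample_b = {"GENE_UP_1": 5.0, "GENE_UP_2": 5.4, "GENE_DOWN_1": 7.6}
print(looks_like_active_tb(sample_a))  # True
print(looks_like_active_tb(sample_b))  # False
```

Collapsing a few genes into one number is what makes a simple readout plausible; the hard part, which the validation work addresses, is choosing a threshold that separates active TB from latent TB and other diseases across populations.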

Beyond the Digital Divide: Towards a Situated Approach to Open Data


Paper by L. Bezuidenhout, B. Rappert, A. Kelly and S. Leonelli: “Poor provision of information and communication technologies in low/middle-income countries represents a concern for promoting Open Data. This is often framed as a “digital divide” and addressed through initiatives that increase the availability of information and communication technologies to researchers based in low-resourced environments, as well as the amount of resources freely accessible online, including data themselves. Using empirical data from a qualitative study of lab-based research in Africa, we highlight the limitations of such framing and emphasize the range of additional factors necessary to effectively utilize data available online. We adopt the ‘Capabilities Approach’ proposed by Sen to highlight the distinction between simply making resources available, and doing so while fostering researchers’ ability to use them. This provides an alternative orientation that highlights the persistence of deep inequalities within the seemingly egalitarian-inspired Open Data landscape. The extent and manner of future data sharing, we propose, will hinge on the ability to respond to the heterogeneity of research environments…(More)”

Citizens breaking out of filter bubbles: Urban screens as civic media


Conference paper by Christine Satchell et al.: “Social media platforms risk polarising public opinions by employing proprietary algorithms that produce filter bubbles and echo chambers. As a result, the ability of citizens and communities to engage in robust debate in the public sphere is diminished. In response, this paper highlights the capacity of urban interfaces, such as pervasive displays, to counteract this trend by exposing citizens to the socio-cultural diversity of the city. Engagement with different ideas, networks and communities is crucial to both innovation and the functioning of democracy. We discuss examples of urban interfaces designed to play a key role in fostering this engagement. Based on an analysis of works empirically grounded in field observations and design research, we call for a theoretical framework that positions pervasive displays and other urban interfaces as civic media. We argue that when designed for more than wayfinding, advertisement or television broadcasts, urban screens as civic media can rectify some of the pitfalls of social media by allowing the polarised user to break out of their filter bubble and embrace the cultural diversity and richness of the city….(More)”

A Political Economy Framework for the Urban Data Revolution


Research Report by Ben Edwards, Solomon Greene and G. Thomas Kingsley: “With cities growing rapidly throughout much of the developing world, the global development community increasingly recognizes the need to build the capacities of local leaders to analyze and apply data to improve urban policymaking and service delivery. Civil society leaders, development advocates, and local governments are calling for an “urban data revolution” to accompany the new UN Sustainable Development Goals (SDGs), a revolution that would provide city leaders new tools and resources for data-driven governance. The need for improved data and analytic capacity in rapidly growing cities is clear, as is the exponential increase in the volume and types of data available for policymaking. However, the institutional arrangements that will allow city leaders to use data effectively remain incompletely theorized and poorly articulated.

This paper begins to fill that gap with a political economy framework that introduces three new concepts: permission, incentive, and institutionalization. We argue that without addressing the permission constraints and competing incentives that local government officials face in using data, investments in improved data collection at the local level will fail to achieve smarter urban policies. Granting permission and aligning incentives are also necessary to institutionalize data-driven governance at the local level and create a culture of evidence-based decisionmaking that outlives individual political administrations. Lastly, we suggest how the SDGs could support a truly transformative urban data revolution in which city leaders are empowered and incentivized to use data to drive decisionmaking for sustainable development…(More)”

Crowdsourcing global governance: sustainable development goals, civil society, and the pursuit of democratic legitimacy


Paper by Joshua C. Gellers in International Environmental Agreements: Politics, Law and Economics: “To what extent can crowdsourcing help members of civil society overcome the democratic deficit in global environmental governance? In this paper, I evaluate the utility of crowdsourcing as a tool for participatory agenda-setting in the realm of post-2015 sustainable development policy. In particular, I analyze the descriptive representativeness (i.e., the degree to which participation mirrors the demographic attributes of non-state actors comprising global civil society) of participants in two United Nations-orchestrated crowdsourcing processes—the MY World survey and e-discussions regarding environmental sustainability. I find that there exists a perceptible demographic imbalance among contributors to the MY World survey and considerable dissonance between the characteristics of participants in the e-discussions and those whose voices were included in the resulting summary report. The results suggest that although crowdsourcing may present an attractive technological approach to expand participation in global governance, ultimately the representativeness of that participation and the legitimacy of policy outputs depend on the manner in which contributions are solicited and filtered by international institutions….(More)”