Mapping and Comparing Responsible Data Approaches


New report by Jos Berens, Ulrich Mans and Stefaan Verhulst: “Recent years have witnessed something of a sea-change in the way humanitarian organizations consider and use data. Growing awareness of the potential of data has led to new enthusiasm and new, innovative applications that seek to respond to and mitigate crises in fresh ways. At the same time, it has become apparent that the potential benefits are accompanied by risks. A new framework is needed that can help balance the benefits and risks, and that can aid humanitarian organizations and others (e.g., policymakers) develop a more responsible approach to data collection and use in their efforts to combat natural and man-made crises around the world. …

The report we are releasing today, “Mapping and Comparing Responsible Data Approaches”, attempts to guide the first steps toward such a framework by learning from current approaches and principles. It is the outcome of a joint research project commissioned by UNOCHA and conducted in collaboration between the GovLab at NYU and Leiden University. In an effort to better understand the landscape, we have considered existing data use policies and principles from 17 organizations. These include 7 UN agencies, 7 International Organizations, 2 government agencies and 1 research institute. Our study of these organizations’ policies allowed us to extract a number of key takeaways that, together, amount to something like a roadmap for responsible data use for any humanitarian organization considering using data in new ways.

We began our research by closely mapping the existing responsible data use policies. To do this, we developed a template with eight broad themes that determines the key ingredients of a responsible data framework. This use of a consistent template across organizations permits us to study and compare the 17 data use policies in a structured and systematic manner. Based on this template, we were able to extract 7 key takeaways for what works best when using data in a humanitarian context – presented in the conclusion to the paper being released today. They are designed to be broad enough to be broadly applicable, yet specific enough to be operational and actually usable….(More)”

Due Diligence? We need an app for that


Ken Banks at kiwanja.net: “The ubiquity of mobile phones, the reach of the Internet, the sheer number of problems facing the planet, competitions and challenges galore, pots of money and strong media interest in tech-for-good projects has today created the perfect storm. Not a day goes by without the release of an app hoping to solve something, and the fact so many people are building so many apps to fix so many problems can only be a good thing. Right?

The only problem is this. It’s become impossible to tell good from bad, even real from fake. It’s something of a Wild West out there. So it was no surprise to see this happening recently. Quoting The Guardian:

An app which purported to offer aid to refugees lost in the Mediterranean has been pulled from Apple’s App Store after it was revealed as a fake. The I Sea app, which also won a Bronze medal at the Cannes Lions conference on Monday night, presented itself as a tool to help report refugees lost at sea, using real-time satellite footage to identify boats in trouble and highlighting their location to the Malta-based Migrant Offshore Aid Station (Moas), which would provide help.

In fact, the app did nothing of the sort. Rather than presenting real-time satellite footage – a difficult and expensive task – it instead simply shows a portion of a static, unchanging image. And while it claims to show the weather in the southern Mediterranean, that too isn’t that accurate: it’s for Western Libya.

The worry isn’t only that someone would decide to build a fake app which ‘tackles’ such an emotive subject, but the fact that this particular app won an award and received favourable press. Wired, Mashable, the Evening Standard and Reuters all spoke positively about it. Did no-one check that it did what it said it did?

This whole episode reminds me of something Joel Selanikio wrote in his contributing chapter to two books I’ve recently edited and published. In his chapters, which touch on his work on the Magpi data collection tool in addition to some of the challenges facing the tech-for-development community, Joel wrote:

In going over our user activity logs for the online Magpi app, I quickly realised that no-one from any of our funding organisations was listed. Apparently no-one who was paying us had ever seen our working software! This didn’t seem to make sense. Who would pay for software without ever looking at it? And if our funders hadn’t seen the software, what information were they using when they decided whether to fund us each year?

…The sheer number of apps available that claim to solve all manner of problems may seem encouraging on the surface – 1,500 (and counting) to help refugees might be a case in point – but how many are useful? How many are being used? How many solve a problem? And how many are real?

Due diligence? Maybe it’s time we had an app for that…(More)”

What determines social behavior? Investigating the role of emotions, self-centered motives, and social norms.


Special issue of Frontiers in Human Neuroscience edited by Corrado Corradi-Dell’Acqua, Leonie Koban, Susanne Leiberg and Patrik Vuilleumier: “In the last decade, a growing research effort in behavioral sciences, especially psychology and neuroscience, has been invested in the study of the cognitive, biological, and evolutionary foundations of social behavior. Differently from the case of sociology, which studies social behavior also at the group level in terms of organizations and structures, psychology and neuroscience often define “social” as a feature of the individual brain that allows an efficient interaction with conspecifics, and thus constitutes a possible evolutionary advantage (Matusall, 2013). …

In the last decades, psychologists and neuroscientists invested a considerable amount of research to investigate the ability to act “socially”, which is considered an evolutionary advantage of many species (Matusall, 2013). The present Research Topic is a collection of a large number (38) of original contributions from an interdisciplinary community which together highlight that determinants of individual social behavior should be best understood along at least two different dimensions. This general perspective represents the backbone for a comprehensive and articulated model of how people and their brains interact with each other in social contexts. However, despite its appeal, it remains unclear how the model put forward in this editorial relates to particular paradigms with high ecological value, where it is more difficult to neatly disentangle the relative contribution of personal/environmental or stable/transient determinants. This is for instance the case of Preston et al. (2013) who investigated hospitalized terminal patients, measuring the emotional reactions elicited in observers and whether they were related to the frequency with which aid was delivered. In this perspective, a great challenge for future research in social psychology and neuroscience will indeed be to develop more accurate predictive models of social behavior and to make them applicable to ecologically valid settings….(More)”

Better research through video games


Simon Parkin at the New Yorker: “… it occurred to Szantner and Revaz that the tremendous amount of time and energy that people put into games could be co-opted in the name of human progress. That year, they founded Massively Multiplayer Online Science, a company that pairs game makers with scientists.

This past March, the first fruits of their conversation in Geneva appeared in EVE Online, a complex science-fiction game set in a galaxy composed of tens of thousands of stars and planets, and inhabited by half a million or so people from across the Internet, who explore and do battle daily. EVE was launched in 2003 by C.C.P., a studio based in Reykjavík, but players have only recently begun to contribute to scientific research. Their task is to assist with the Human Protein Atlas (H.P.A.), a Swedish-run effort to catalogue proteins and the genes that encode them, in both normal tissue and cancerous tumors. “Humans are, by evolution, very good at quickly recognizing patterns,” Emma Lundberg, the director of the H.P.A.’s Subcellular Atlas, a database of high-resolution images of fluorescently dyed cells, told me. “This is what we exploit in the game.”

The work, dubbed Project Discovery, fits snugly into EVE Online’s universe. At any point, players can take a break from their dogfighting, trading, and political machinations to play a simple game within the game, finding commonalities and differences between some thirteen million microscope images. In each one, the cell’s innards have been color-coded—blue for the nucleus (the cell’s brain), red for microtubules (the cell’s scaffolding), and green for anywhere that a protein has been detected. After completing a tutorial, players tag the image using a list of twenty-nine options, including “nucleus,” “cytoplasm,” and “mitochondria.” When enough players reach a consensus on a single image, it is marked as “solved” and handed off to the scientists at the H.P.A. “In terms of the pattern recognition and classification, it resembles what we are doing as researchers,” Lundberg said. “But the game interface is, of course, much cooler than our laboratory information-management system. I would love to work in-game only.”

Rather than presenting the project as a worthy extracurricular activity, EVE Online’s designers have cast it as an extension of the game’s broader fiction. Players work for the Sisters of EVE, a religious humanitarian-aid organization, which rewards their efforts with virtual currency. This can be used to purchase items in the game, including a unique set of armor designed by one of the C.C.P.’s artists, Andrei Cristea. (The armor is available only to players who participate in Project Discovery, and therefore, like a rare Coco Chanel frock, is desirable as much for its scarcity as for its design.) Insuring that the mini-game be thought of as more than a short-term novelty or diversion was an issue that Linzi Campbell, Project Discovery’s lead designer, considered carefully. “The hardest challenge has been turning the image-analysis process into a game that is strong enough to motivate the player to continue playing,” Campbell told me. “The fun comes from the feeling of mastery.”

Evidently, her efforts were successful. On the game’s first day of release, there were four hundred thousand submissions from players. According to C.C.P., some people have been so caught up in the task that they have played for fifteen hours without interruption. “EVE players turned out to be a perfect crowd for this type of citizen science,” Lundberg said. She anticipates that the first phase of the project will be completed this summer. If the work meets this target, players will be presented with more advanced images and tasks, such as the classification of protein patterns in complex tumor-tissue samples. Eventually, their efforts could aid in the development of new cancer drugs….(More)”
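
The consensus step in the excerpt above (an image is marked “solved” once enough players agree on its tags) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the minimum number of votes, and the agreement share are assumptions, not details of Project Discovery’s actual rules.

```python
from collections import Counter


def consensus_label(tags, min_votes=5, min_share=0.6):
    """Return the agreed tag for one image, or None if there is no consensus yet.

    min_votes and min_share are hypothetical thresholds chosen for illustration.
    """
    if len(tags) < min_votes:
        return None  # too few submissions to decide
    label, count = Counter(tags).most_common(1)[0]
    # Accept the most common tag only if a large enough share of players chose it.
    return label if count / len(tags) >= min_share else None


# Hypothetical submissions for a single microscope image.
votes = ["nucleus", "nucleus", "cytoplasm", "nucleus", "nucleus"]
result = consensus_label(votes)
print(result)
```

An image for which `consensus_label` returns a tag would count as “solved”; one that returns `None` would stay in the queue for more players.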

Nudging for Success


Press Release: “A groundbreaking report published today by ideas42 reveals several innovations that college administrators and policymakers can leverage to significantly improve college graduation rates at a time where completion is more out of reach than ever for millions of students.

The student path through college to graduation day is strewn with subtle, often invisible barriers that, over time, hinder students’ progress and cause some of them to drop out entirely. In Nudging for Success: Using Behavioral Science to Improve the Postsecondary Student Journey, ideas42 focuses on simple, low-cost ways to combat these unintentional obstacles and support student persistence and success at every stage in the college experience, from pre-admission to post-graduation. Teams worked with students, faculty and administrators at colleges around the country.

Even for students whose tuition is covered by financial aid, whose academic preparation is exemplary, and who are able to commit themselves full-time to their education, the subtle logistical and psychological sticking points can have a huge impact on their ability to persist and fully reap the benefits of a higher education.

Less than 60% of full-time students graduate from four-year colleges within six years, and less than 30% graduate from community colleges within three years. There are a myriad of factors often cited as deterrents to finishing school, such as the cost of tuition or the need to juggle family and work obligations, but behavioral science and the results of this report demonstrate that lesser-known dynamics like self-perception are also at play.

From increasing financial aid filing to fostering positive friend groups and a sense of belonging on campus, the 16 behavioral solutions outlined in Nudging for Success represent the potential for significant impact on the student experience and persistence. At Arizona State University, sending behaviorally-designed email reminders to students and parents about the Free Application for Federal Student Aid (FAFSA) priority deadline increased submissions by 72% and led to an increase in grant awards. Freshman retention among low-income, first generation, under-represented or other students most at risk of dropping out increased by 10% at San Francisco State University with the use of a testimonial video, self-affirming exercises, and monthly messaging aimed at first-time students.

“This evidence demonstrates how behavioral science can be the key to uplifting millions of Americans through education,” said Alissa Fishbane, Managing Director at ideas42. “By approaching the completion crisis from the whole experience of students themselves, administrators and policymakers have the opportunity to reduce the number of students who start, but do not finish, college—students who take on the financial burden of tuition but miss out on the substantial benefits of earning a degree.”

The results of this work drive home the importance of examining the college experience from the student perspective and through the lens of human behavior. College administrators and policymakers can replicate these gains at institutions across the country to make it simpler for students to complete the degree they started in ways that are often easier and less expensive to implement than existing alternatives—paving the way to stronger economic futures for millions of Americans….(More)”

IRS Unleashes Flood of Searchable Charity Data


Peter Olsen-Phillips in the Chronicle of Philanthropy: “The Internal Revenue Service opened a gusher of information on nonprofits Wednesday by making electronically filed Form 990s available in bulk and in a machine-friendly format.

The material will be available through the Public Data Sets area of Amazon Web Services. It will also include information from digital versions of the 990-EZ form filed by smaller nonprofits and form 990-PFs filed by private foundations.

The change means the public will have quicker and more in-depth access to the 990, the primary disclosure document for and main source of information on tax-exempt organizations. The form includes data on groups’ finances, board members, executive pay, fundraising expenses, and other aspects of their operations.

The filings were previously made public as PDF documents, requiring costly manual entry or imprecise character-recognition technology to extract the data in bulk and make it searchable. Now the information can be downloaded and parsed for free by anyone with a computer.

“With e-file data, you can easily and precisely extract individual items on the form,” Carl Malamud, an open-government advocate and the president of Public.Resource.org, wrote in an email. The nonprofit works to make public information more accessible, and Mr. Malamud has been at the forefront of efforts to liberate data on the nonprofit sector.

…Despite the enhanced transparency, much nonprofit data will remain hard to find and laborious to analyze, as nearly a third of all 990s were filed on paper in 2015.

Mr. McLean said it’s mainly smaller nonprofits that opt to file a paper 990 with the IRS, adding that he hopes recent legislative efforts to require mandatory electronic filing gain traction….(More)”.

Democracy in Decline: Rebuilding its Future


Book by Philip Kotler: “An examination by the ‘father of modern marketing’ into how well a long cherished product (democracy) is satisfying the needs of its consumers (citizens), bringing conversation and solutions on how we can all do our bit to bring about positive change.

At a time where voting systems are flawed, fewer vote, major corporations fund campaigns and political parties battle it out, democracies are being seriously challenged and with that the prospects of a better world for all.

Philip Kotler identifies 14 shortcomings of today’s democracy and proposes potential remedies whilst encouraging readers to join the conversation, exercise their free speech and get on top of the issues that affect their lives regardless of nationality or political persuasion.

An accompanying website (www.democracyindecline.com) invites those interested to help find and publish thoughtful articles that aid our understanding of what is happening and what can be done to improve democracies around the world….(More)”

Selected Readings on Data Collaboratives


By Neil Britto, David Sangokoya, Iryna Susha, Stefaan Verhulst and Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data collaboratives was originally published in 2017.

The term data collaborative refers to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (including private companies, research institutions, and government agencies) can exchange data to help solve public problems. Several of society’s greatest challenges, from addressing climate change to public health to job creation to improving the lives of children, require greater access to data, more collaboration between public- and private-sector entities, and an increased ability to analyze datasets. In the coming months and years, data collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.

Annotated Selected Readings List (in alphabetical order)

Agaba, G., Akindès, F., Bengtsson, L., Cowls, J., Ganesh, M., Hoffman, N., . . . Meissner, F. “Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers.” 2014. http://bit.ly/25RRC6N.

  • This white paper, produced by “a group of activists, researchers and data experts” explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
    • Advocating and facilitating by “open[ing] up new public spaces for discussion and awareness building”;
    • Describing and predicting through the detection of “new correlations and the surfac[ing] of new questions”;
    • Facilitating information exchange through “multiple feedback loops which feed into both research and action”; and
    • Promoting accountability and transparency, especially as a byproduct of crowdsourcing efforts aimed at “aggregat[ing] and analyz[ing] information in real time.”
  • The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
  • They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private sector entities, such as:
    • Lack of general data literacy;
    • Lack of open learning environments and repositories;
    • Lack of resources, capacity and access;
    • Challenges of sensitivity and risk perception with regard to using data;
    • Storage and computing capacity; and
    • Externally validating data sources for comparison and verification.

Ansell, C. and Gash, A. “Collaborative Governance in Theory and Practice.” Journal of Public Administration Research and Theory 18 (4), 2008. http://bit.ly/1RZgsI5.

  • This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
  • The article suggests factors significant to successful partnering processes and outcomes include:
    • Shared understanding of challenges,
    • Trust building processes,
    • The importance of recognizing seemingly modest progress, and
    • Strong indicators of commitment to the partnership’s aspirations and process.
  • The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
    • Time (e.g., decision making time afforded to the collaboration);
    • Interdependence (e.g., a high degree of interdependence can mitigate negative effects of low trust); and
    • Trust (e.g., a higher level of trust indicates a higher probability of success).

Ballivian A, Hoffman W. “Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation.” World Bank, 2015. Available from: http://bit.ly/1ENvmRJ

  • This World Bank report provides a background document on forming public-private partnerships for data in order to inform the UN’s Independent Expert Advisory Group (IEAG) on sustaining a “data revolution” in sustainable development.
  • The report highlights the critical position of private companies within the data value chain and reflects on key elements of a sustainable data PPP: “common objectives across all impacted stakeholders, alignment of incentives, and sharing of risks.” In addition, the report describes the risks and incentives of public and private actors, and the principles needed to “build the legal, cultural, technological and economic infrastructures to enable the balancing of competing interests.” These principles include understanding; experimentation; adaptability; balance; persuasion and compulsion; risk management; and governance.
  • Examples of data collaboratives cited in the report include HP Earth Insights, Orange Data for Development Challenges, Amazon Web Services, IBM Smart Cities Initiative, and the Governance Lab’s Open Data 500.

Brack, Matthew, and Tito Castillo. “Data Sharing for Public Health: Key Lessons from Other Sectors.” Chatham House, Centre on Global Health Security. April 2015. Available from: http://bit.ly/1DHFGVl

  • The Chatham House report provides an overview on public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity in adapting technical solutions from other sectors for public health.
  • The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
  • Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s care.data programme.

de Montjoye, Yves-Alexandre, Jake Kendall, and Cameron F. Kerry. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, Issues in Technology Innovation. November 2014. Available from: http://brook.gs/1JxVpxp

  • Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies into micro-target outreach and driving uptake of health-seeking behavior.”
  • The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”

Eckartz, Silja M., Hofman, Wout J., Van Veenstra, Anne Fleur. “A decision model for data sharing.” Vol. 8653 LNCS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. http://bit.ly/21cGWfw.

  • This paper proposes a decision model for data sharing of public and private data based on literature review and three case studies in the logistics sector.
  • The authors identify five categories of barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
    • Ownership. Possible interventions likely require improving trust among those who own the data through, for example, involvement and support from higher management.
    • Privacy. Interventions include “anonymization by filtering of sensitive information and aggregation of data,” and access control mechanisms built around identity management and regulated access.
    • Economic. Interventions include a model where data is shared only with a few trusted organizations, and yield management mechanisms to ensure negative financial consequences are avoided.
    • Data quality. Interventions include identifying additional data sources that could improve the completeness of datasets, and efforts to improve metadata.
    • Technical. Interventions include making data available in structured formats and publishing data according to widely agreed upon data standards.
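
The privacy intervention quoted above, “anonymization by filtering of sensitive information and aggregation of data,” can be illustrated with a minimal Python sketch. The record layout, field names, and sample values below are hypothetical and chosen only to echo the paper’s logistics setting; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical set of fields treated as sensitive for this illustration.
SENSITIVE_FIELDS = {"driver_name", "license_plate"}


def filter_record(record):
    """Filtering: drop sensitive fields from a single record."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}


def aggregate_by(records, key):
    """Aggregation: share only counts per group, not individual rows."""
    return Counter(r[key] for r in records)


# Made-up sample shipments.
shipments = [
    {"driver_name": "A", "license_plate": "X1", "region": "north"},
    {"driver_name": "B", "license_plate": "X2", "region": "north"},
    {"driver_name": "C", "license_plate": "X3", "region": "south"},
]

shared = [filter_record(r) for r in shipments]
counts = aggregate_by(shared, "region")
print(dict(counts))
```

Filtering and aggregation reduce what is disclosed, but small groups can still be identifying, which is one reason the authors pair them with access control.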

Hoffman, Sharona and Podgurski, Andy. “The Use and Misuse of Biomedical Data: Is Bigger Really Better?” American Journal of Law & Medicine 497, 2013. http://bit.ly/1syMS7J.

  • This journal article explores the benefits and, in particular, the risks related to large-scale biomedical databases bringing together health information from a diversity of sources across sectors. Some data collaboratives examined in the piece include:
    • MedMining – a company that extracts EHR data, de-identifies it, and offers it to researchers. The data sets that MedMining delivers to its customers include ‘lab results, vital signs, medications, procedures, diagnoses, lifestyle data, and detailed costs’ from inpatient and outpatient facilities.
    • Explorys has formed a large healthcare database derived from financial, administrative, and medical records. It has partnered with major healthcare organizations such as the Cleveland Clinic Foundation and Summa Health System to aggregate and standardize health information from ten million patients and over thirty billion clinical events.
  • Hoffman and Podgurski note that biomedical databases have many potential uses, with those likely to benefit including: “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
  • They argue, however, that risks arise because:
    • The data contained in biomedical databases is surprisingly likely to be incorrect or incomplete;
    • Systemic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions; and
    • Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

Krumholz, Harlan M., et al. “Sea Change in Open Science and Data Sharing Leadership by Industry.” Circulation: Cardiovascular Quality and Outcomes 7.4. 2014. 499-504. http://1.usa.gov/1J6q7KJ

  • This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
  • The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.

Mann, Gideon. “Private Data and the Public Good.” Medium. May 17, 2016. http://bit.ly/1OgOY68.

  • This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares his prepared remarks given at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and also to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.
  • Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit:
    • Public health – including suicide prevention, addiction counseling and mental health monitoring.
    • Legal and ethical questions – especially as they relate to “the role algorithms have in decisions about our lives,” such as credit checks and resume screening.
  • Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.
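
To make the middle ground between “complete privacy or complete disclosure” concrete, here is a minimal sketch of the Laplace mechanism, a standard building block of differential privacy, applied to a single count. It is not drawn from Mann’s post: the function name, the `epsilon` value, and the example count are assumptions for illustration.

```python
import random


def private_count(true_count, epsilon, rng):
    """Return a count with Laplace noise added (epsilon-differential privacy).

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)  # seeded only so the sketch is repeatable
released = private_count(100, epsilon=1.0, rng=rng)
print(released)
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives more accurate releases. That tunable trade-off is the kind of flexibility the post points to.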

Pastor Escuredo, D., Morales-Guzmán, A. et al, “Flooding through the Lens of Mobile Phone Activity.” IEEE Global Humanitarian Technology Conference, GHTC 2014. Available from: http://bit.ly/1OzK2bK

  • This report describes the use of mobile data to understand the impact of disasters and improve disaster management. The study was conducted in the Mexican state of Tabasco in 2009 by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, Technical University of Madrid (UPM), Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
  • Telefonica Research, a division of the major Latin American telecommunications company, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The results demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).”
  • In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”

Perkmann, M. and Schildt, H. “Open data partnerships between firms and universities: The role of boundary organizations.” Research Policy, 44(5), 2015. http://bit.ly/25RRJ2c

  • This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
  • The authors identify two especially important challenges facing private firms that open their data or participate in data collaboratives with the academic research community, both of which could be addressed through more involvement from boundary organizations:
    • The first is the challenge of maintaining competitive advantage. The authors note that “the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D.”
    • The second involves the misalignment of incentives between the private and academic fields. Perkmann and Schildt argue that a firm seeking to build collaborations around its opened data “will have to provide suitable incentives that are aligned with academic scientists’ desire to be rewarded for their work within their respective communities.”

Robin, N., Klein, T., & Jütting, J. “Public-Private Partnerships for Statistics: Lessons Learned, Future Steps.” OECD. 2016. http://bit.ly/24FLYlD.

  • This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensors and geospatial data) can address data gaps relevant to National Statistical Offices (NSOs).
  • Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for an NSO, transfer of data sets to NSOs from private entities, transfer of data to a third-party provider to manage the NSO and private entity data, and the outsourcing of NSO functions.
  • The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation), suggests that deliberate and highly structured approaches to such partnerships require enforceable contracts, and emphasizes both the trade-off between data specificity and data accessibility and the importance of pricing mechanisms that reflect the capacity and capability of NSOs.
  • Case studies referenced in the paper include:
    • A mobile network operator’s (MNO Telefonica) in-house analysis of call detail records;
    • A third-party data provider and steward of travel statistics (Positium);
    • The Data for Development (D4D) challenge organized by MNO Orange; and
    • Statistics Netherlands’ use of social media to predict consumer confidence.

Stuart, Elizabeth, Samman, Emma, Avis, William, Berliner, Tom. “The data revolution: finding the missing millions.” Overseas Development Institute, 2015. Available from: http://bit.ly/1bPKOjw

  • The authors of this report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
  • The solutions highlighted in this report from the Overseas Development Institute focus on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and building strong data management systems.

Taylor, L., & Schroeder, R. “Is bigger better? The emergence of big data as a tool for international development policy.” GeoJournal, 80(4). 2015. 503-518. http://bit.ly/1RZgSy4.

  • This journal article describes how privately held data – namely “digital traces” of consumer activity – “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
  • They focus especially on three categories of data collaborative use cases:
    • Mobile data as a predictive tool for issues such as human mobility and economic activity;
    • Use of mobile data to inform humanitarian response to crises; and
    • Use of born-digital web data as a tool for predicting economic trends, and the implications these have for LMICs.
  • They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
    • Access to private data sources often must be negotiated or bought, “which potentially means substituting negotiations with corporations for those with national statistical offices;”
    • The meaning of such data is not always simple or stable, and local knowledge is needed to understand how people are using the technologies in question;
    • Bias in proprietary data can be hard to understand and quantify;
    • Lack of privacy frameworks; and
    • Power asymmetries, wherein “LMIC citizens are unwittingly placed in a panopticon staffed by international researchers, with no way out and no legal recourse.”

van Panhuis, Willem G., Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke. “A systematic review of barriers to data sharing in public health.” BMC public health 14, no. 1 (2014): 1144. Available from: http://bit.ly/1JOBruO

  • The authors of this report provide a “systematic literature review of potential barriers to public health data sharing.” These twenty potential barriers are classified in six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
  • The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.

Verhulst, Stefaan and Sangokoya, David. “Mapping the Next Frontier of Open Data: Corporate Data Sharing.” In: Gasser, Urs and Zittrain, Jonathan and Faris, Robert and Heacock Jones, Rebekah, “Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse (December 15, 2014).” Berkman Center Research Publication No. 2014-17. http://bit.ly/1GC12a2

  • This essay describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
  • Examples of data collaboratives include: Yelp Dataset Challenge, the Digital Ecologies Research Partnership, BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership and the White House’s Climate Data Partnerships.
  • The authors highlight important questions to consider towards a more comprehensive mapping of these activities.

Verhulst, Stefaan and Sangokoya, David, 2015. “Data Collaboratives: Exchanging Data to Improve People’s Lives.” Medium. Available from: http://bit.ly/1JOBDdy

  • The essay refers to data collaboratives as a new form of collaboration involving participants from different sectors exchanging data to help solve public problems. These forms of collaborations can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
  • The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; experimenting and scaling existing initiatives.
  • Examples of data collaboratives include NIH’s Precision Medicine Initiative; the Mobile Data, Environmental Extremes and Population (MDEEP) Project; and Twitter-MIT’s Laboratory for Social Machines.

Verhulst, Stefaan, Susha, Iryna, Kostura, Alexander. “Data Collaboratives: matching Supply of (Corporate) Data to Solve Public Problems.” Medium. February 24, 2016. http://bit.ly/1ZEp2Sr.

  • This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
  • The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
    • Data for early warning systems to help mitigate the effects of natural disasters;
    • Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
    • Data to monitor compliance with weapons treaties;
    • Data to more accurately measure progress related to the UN Sustainable Development Goals.
  • To the end of identifying and expanding on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
    • Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
    • Prizes and Challenges: Orange Telecom, which has been a leader in this type of Data Collaboration, provided several examples of the company’s initiatives, such as the use of call data records to track the spread of malaria as well as their experience with Challenge 4 Development.
    • Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
    • Sharing intelligence products: JPMorgan Chase shares macroeconomic insights gained by leveraging its data through the newly established JPMorgan Chase Institute.
  • In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
    • A responsible data framework;
    • Increased insight into different business models that may facilitate the sharing of data;
    • Capacity to tap into the potential value of data;
    • Transparent stock of available data supply; and
    • Mapping emerging practices and models of sharing.

Vogel, N., Theisen, C., Leidig, J. P., Scripps, J., Graham, D. H., & Wolffe, G. “Mining mobile datasets to enable the fine-grained stochastic simulation of Ebola diffusion.” Paper presented at the Procedia Computer Science. 2015. http://bit.ly/1TZDroF.

  • The paper presents a research study based on mobile call records that the mobile operator Orange shared with researchers in the framework of the Data for Development (D4D) Challenge.
  • The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
  • The authors argue that the use of their population, mobility, and simulation models provides more accurate simulation details in comparison to high-level analytical predictions and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard-to-reach locations.

Welle Donker, F., van Loenen, B., & Bregt, A. K. “Open Data and Beyond.” ISPRS International Journal of Geo-Information, 5(4). 2016. http://bit.ly/22YtugY.

  • This research develops a monitoring framework to assess the effects of open (private) data, using a case study of the Dutch energy network administrator Liander.
  • Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
    • Continuously optimize performance on services, security of supply, and costs;
    • Improve management of energy flows and insight into energy consumption;
    • Help customers save energy and switch over to renewable energy sources.
  • The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
    • Develop a performance framework to describe what the program is about, including a description of the organization’s mission and strategic goals;
    • Identify the most important elements, or key performance areas which are most critical to understanding and assessing your program’s success;
    • Select the most appropriate performance measures;
    • Determine the gaps between what information you need and what is available;
    • Develop and implement a measurement strategy to address the gaps;
    • Develop a performance report which highlights what you have accomplished and what you have learned;
    • Learn from your experiences and refine your approach as required.
  • While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that, “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”

World Economic Forum, 2015. “Data-driven development: pathways for progress.” Geneva: World Economic Forum. http://bit.ly/1JOBS8u

  • This report captures an overview of the existing data deficit and the value and impact of big data for sustainable development.
  • The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.

White House Challenges Artificial Intelligence Experts to Reduce Incarceration Rates


Jason Shueh at GovTech: “The U.S. spends $270 billion on incarceration each year, has a prison population of about 2.2 million and an incarceration rate that’s spiked 220 percent since the 1980s. But with the advent of data science, White House officials are asking experts for help.

On Tuesday, June 7, the White House Office of Science and Technology Policy’s Lynn Overmann, who also leads the White House Police Data Initiative, stressed the severity of the nation’s incarceration crisis while asking a crowd of data scientists and artificial intelligence specialists for aid.

“We have built a system that is too large, and too unfair and too costly — in every sense of the word — and we need to start to change it,” Overmann said, speaking at a Computing Community Consortium public workshop.

She argued that the U.S., a country that has the highest number of incarcerated citizens in the world, is in need of systematic reforms with both data tools to process alleged offenders and at the policy level to ensure fair and measured sentences. As a longtime counselor, advisor and analyst for the Justice Department and at the city and state levels, Overmann said she has studied and witnessed an alarming number of issues in terms of bias and unwarranted punishments.

For instance, she said that statistically, while drug use is about equal between African-Americans and Caucasians, African-Americans are more likely to be arrested and convicted. They also receive longer prison sentences compared to Caucasian inmates convicted of the same crimes….

Data and digital tools can help curb such pitfalls by increasing efficiency, transparency and accountability, she said.

“We think these types of data exchanges [between officials and technologists] can actually be hugely impactful if we can figure out how to take this information and operationalize it for the folks who run these systems,” Overmann noted.

The opportunities to apply artificial intelligence and data analytics, she said, might include using it to improve questions on parole screenings, using it to analyze police body camera footage, and applying it to criminal justice data for legislators and policy workers….

If the private sector is any indication, artificial intelligence and machine learning techniques could be used to interpret this new and vast supply of law enforcement data. In an earlier presentation by Eric Horvitz, the managing director at Microsoft Research, Horvitz showcased how the company has applied artificial intelligence to vision and language to interpret live video content for the blind. The app, titled SeeingAI, can translate live video footage, captured from an iPhone or a pair of smart glasses, into instant audio messages for the seeing impaired. Twitter’s live-streaming app Periscope has employed similar technology to guide users to the right content….(More)”

Private Data and the Public Good


Gideon Mann’s remarks on the occasion of the Robert Khan distinguished lecture at The City College of New York on 5/22/16: “… and opportunities about a specific aspect of this relationship, the broader need for computer science to engage with the real world. Right now, a key aspect of this relationship is being built around the risks and opportunities of the emerging role of data.

Ultimately, I believe that these relationships, between computer science and the real world, between data science and real problems, hold the promise to vastly increase our public welfare. And today, we, the people in this room, have a unique opportunity to debate and define a more moral data economy….

The hybrid research model proposes something different. The hybrid research model embeds, as it were, researchers as practitioners. The thought was always that you would be going about your regular run of business, would face a need to innovate to solve a crucial problem, and would do something novel. At that point, you might choose to work some extra time and publish a paper explaining your innovation. In practice, this model rarely works as expected. Tight deadlines mean the innovation that people do in their normal progress of business is incremental.

This model separated research from scientific publication, and shortens the time-window of research, to what can be realized in a few year time zone. For me, this always felt like a tremendous loss, with respect to the older so-called “ivory tower” research model. It didn’t seem at all clear how this kind of model would produce the sea change of thought engendered by Shannon’s work, nor did it seem that Claude Shannon would ever want to work there. This kind of environment would never support the freestanding wonder, like the robot mouse that Shannon worked on. Moreover, I always believed that crucial to research is publication and participation in the scientific community. Without this engagement, it feels like something different — innovation perhaps.

It is clear that the monopolistic environment that enabled AT&T to support this ivory tower research doesn’t exist anymore.

Now, the hybrid research model was one model of research at Google, but there is another model as well, the moonshot model as exemplified by Google X. Google X brought together focused research teams to drive research and development around a particular project — Google Glass and the Self-driving car being two notable examples. Here the focus isn’t research, but building a new product, with research as potentially a crucial blocking issue. Since the goal of Google X is directly to develop a new product, by definition they don’t publish papers along the way, but they’re not as tied to short-term deliverables as the rest of Google is. However, they are again decidedly un-Bell-Labs like — a secretive, tightly focused, non-publishing group. DeepMind is a similarly constituted initiative — working, for example, on a best-in-the-world Go playing algorithm, with publications happening sparingly.

Unfortunately, both of these approaches, the hybrid research model and the moonshot model stack the deck towards a particular kind of research — research that leads to relatively short term products that generate corporate revenue. While this kind of research is good for society, it isn’t the only kind of research that we need. We urgently need research that is long-term, and that is undergone even without a clear financial local impact. In some sense this is a “tragedy of the commons”, where a shared public good (the commons) is not supported because everyone can benefit from it without giving back. Academic research is thus a non-rival, non-excludible good, and thus reasonably will be underfunded. In certain cases, this takes on an ethical dimension — particularly in health care, where the choice of what diseases to study and address has a tremendous potential to affect human life. Should we research heart disease or malaria? This decision makes a huge impact on global human health, but is vastly informed by the potential profit from each of these various medicines….

Private Data means research is out of reach

The larger point that I want to make, is that in the absence of places where long-term research can be done in industry, academia has a tremendous potential opportunity. Unfortunately, it is actually quite difficult to do the work that needs to be done in academia, since many of the resources needed to push the state of the art are only found in industry: in particular data.

Of course, academia also lacks machine resources, but this is a simpler problem to fix — it’s a matter of money, resources from the government could go to enabling research groups building their own data centers or acquiring the computational resources from the market, e.g. Amazon. This is aided by the compute philanthropy that Google and Microsoft practice that grant compute cycles to academic organizations.

But the data problem is much harder to address. The data being collected and generated at private companies could enable amazing discoveries and research, but is impossible for academics to access. The lack of access to private data from companies actually has much more significant effects than inhibiting research. In particular, the consumer level data, collected by social networks and internet companies could do much more than ad targeting.

Just for public health — suicide prevention, addiction counseling, mental health monitoring — there is enormous potential in the use of our online behavior to aid the most needy, and academia and non-profits are set-up to enable this work, while companies are not.

To give one example, anorexia and eating disorders are vicious killers. 20 million women and 10 million men suffer from a clinically significant eating disorder at some time in their life, and sufferers of eating disorders have the highest mortality rate of any mental health disorder — with a jaw-dropping estimated mortality rate of 10%, both directly from injuries sustained by the disorder and by suicide resulting from the disorder.

Eating disorders are particular in that sufferers often seek out confirmatory information, blogs, images and pictures that glorify and validate what sufferers see as “lifestyle” choices. Browsing behavior that seeks out images and guidance on how to starve yourself is a key indicator that someone is suffering. Tumblr, Pinterest, Instagram are places that people host and seek out this information. Tumblr has tried to help address this severe mental health issue by banning blogs that advocate for self-harm and by adding PSA announcements to searches for terms related to anorexia. But clearly — this is not the be all and end all of work that could be done to detect and assist people at risk of dying from eating disorders. Moreover, this data could also help understand the nature of those disorders themselves….

There is probably a role for a data ombudsman within private organizations — someone to protect the interests of the public’s data inside of an organization. Like a ‘public editor’ in a newspaper according to how you’ve set it up. There to protect and articulate the interests of the public, which means probably both sides — making sure a company’s data is used for public good where appropriate, and making sure the ‘right’ to privacy of the public is appropriately safeguarded (and probably making sure the public is informed when their data is compromised).

Next, we need a platform to enable collaboration around social good between companies and between companies and academics. This platform would enable trusted users to have access to a wide variety of data, and speed the process of research.

Finally, I wonder if there is a way that government could support research sabbaticals inside of companies. Clearly, the opportunities for this research far outstrip what is currently being done…(more)”