DataCollaboratives.org – A New Resource on Creating Public Value by Exchanging Data


Recent years have seen exponential growth in the amount of data being generated and stored around the world. There is increasing recognition that this data can play a key role in solving some of the most difficult public problems we face.

However, much of the potentially useful data is currently privately held and not available for public insights. Data in the form of web clicks, social “likes,” geolocation and online purchases are typically tightly controlled, usually by entities in the private sector. Companies today generate an ever-growing stream of information from our proliferating sensors and devices. Increasingly, they—and various other actors—are asking if there is a way to make this data available for the public good. This ongoing search for new models of corporate responsibility around data in the digital era has led to the emergence of “data collaboratives.”


Today, the GovLab is excited to launch a new resource for Data Collaboratives (datacollaboratives.org). Data Collaboratives are an emerging form of public-private partnership in which participants from different sectors — including private companies, research institutions, and government agencies — exchange data to help solve public problems.

The resource grew out of separate partnerships with UNICEF (focused on creating data collaboratives to improve children’s lives) and Omidyar Network (studying new ways to match (open) data demand and supply to increase impact).

Natalia Adler, a data, research and policy planning specialist and the UNICEF Data Collaboratives Project Lead notes, “At UNICEF, we’re dealing with the world’s most complex problems affecting children. Data Collaboratives offer an exciting opportunity to tap on previously inaccessible datasets and mobilize a wide range of data expertise to advance child rights around the world. It’s all about connecting the dots.”

To better understand the potential of these Collaboratives, the GovLab collected information on dozens of examples from across the world. These diverse initiatives clearly suggest the potential of Data Collaboratives to improve people’s lives when implemented responsibly. As Stefaan Verhulst, co-founder of the GovLab, puts it: “In the coming months and years, Data Collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.”

In particular, our research to date suggests that Data Collaboratives offer a number of potential benefits, including enhanced:

  • Situational Awareness and Response: For example, Orbital Insight and the World Bank are using satellite imagery to measure and track poverty. This technology can, in some instances, “be more accurate than U.S. census data.”
  • Public Service Design and Delivery: Global mapping company Esri and Waze’s Connected Citizens program are using crowdsourced traffic information to help governments design better transportation systems.
  • Impact Assessment and Evaluation: Nielsen and the World Food Programme (WFP) have been using data collected via mobile phone surveys to better monitor food insecurity and inform the WFP’s resource allocations….(More)

Doctors take inspiration from online dating to build organ transplant AI


Ariel Bogle at Mashable: ”When Bob Jones performed one of Victoria’s first liver transplants in 1988, he could not imagine that 29 years later he’d be talking about artificial intelligence and online dating. Jones is the director of Austin Health’s Victorian liver transplant unit in Melbourne, Australia, and along with his colleague Lawrence Lau, he has helped develop an algorithm that could potentially better match organ donors with organ recipients.

Comparing it to the metrics behind dating site eHarmony, Jones said they planned to use the specially designed AI to improve the accuracy of matching liver donors and recipients, hopefully resulting in fewer graft failures and fewer patient deaths.

“It’s a specially designed machine learning algorithm using multiple donor and recipient features to predict the outcome,” he explained.

The team plugged around 25 characteristics of donors and recipients into their AI, using the data points to retrospectively predict what would happen to organ grafts.

“We used all the basic things like sex, age, underlying disease, blood type,” he said. “And then there are certain characteristics about the donor … and all the parameters that might indicate the liver might be upset.”

Using the AI to assess the retrospective results of 75 adult patients who’d had transplants, they found the method predicted graft failure 30 days post-transplant at an accuracy of 84 percent compared to 68 percent with current methods.

“It really meant for the first time we could assess an organ’s suitability in a quantitative way,” he added, “as opposed to the current method, which really comes down to the position of the doctor eyeballing all the data and making a call based on their experience.”
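The article doesn’t name the underlying model, but the workflow it describes (around 25 donor and recipient characteristics fed to a machine learning algorithm that retrospectively predicts graft outcomes) maps onto a standard supervised-classification setup. Below is a minimal sketch in Python; the file name, column names, and the choice of a random forest are illustrative assumptions, not the Austin Health team’s actual method.

```python
# Minimal sketch of the approach described above: train a classifier on
# donor/recipient features to predict graft failure within 30 days.
# All names below (CSV file, columns, model choice) are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one row per transplant, ~25 donor/recipient features
# (sex, age, underlying disease, blood type, donor liver parameters, ...)
# plus a binary label for graft failure within 30 days post-transplant.
df = pd.read_csv("transplants.csv")
X = df.drop(columns=["graft_failed_30d"])
y = df["graft_failed_30d"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Retrospective evaluation, analogous to the reported 84% vs. 68% comparison.
print(f"30-day graft failure accuracy: "
      f"{accuracy_score(y_test, model.predict(X_test)):.0%}")
```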

Improving the accuracy of organ donor matches is vital, because as Jones put it, “it’s an extraordinary, precious gift from one Australian to another.”…(More)”

Urban Exposures: How Cell Phone Data Helps Us Better Understand Human Exposure To Air Pollution


Senseable City Lab: “Global urbanization has led to one of the world’s most pressing environmental health concerns: the increasing number of people contributing to and being affected by air pollution, leading to 7 million early deaths each year. The key issue is human exposure to pollution within cities and the consequential effects on human health.

With new research conducted at MIT’s Senseable City Lab, human exposure to air pollution can now be accurately quantified at an unprecedented scale. Researchers mapped the movements of several million people using ubiquitous cell phone data, and intersected this information with neighborhood air pollution measures. Covering the expanse of New York City and its 8.5 million inhabitants, the study reveals where and when New Yorkers are most at risk of exposure to air pollution – with major implications for environment and public health policy… (More)”
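The excerpt doesn’t spell out the methodology, but the computation it describes (intersecting individual movement traces inferred from cell phone data with neighborhood-level pollution measures) amounts to a time-weighted exposure average. A minimal sketch, assuming a hypothetical trace format and invented PM2.5 values:

```python
# Sketch of time-weighted exposure: combine a person's stays (neighborhood,
# hours spent), as inferred from cell phone data, with neighborhood-level
# PM2.5 readings. Data shapes and values are hypothetical.
from typing import Dict, List, Tuple

def individual_exposure(stays: List[Tuple[str, float]],
                        pm25: Dict[str, float]) -> float:
    """Average PM2.5 concentration weighted by hours spent in each neighborhood."""
    total_hours = sum(hours for _, hours in stays)
    weighted = sum(pm25[neighborhood] * hours for neighborhood, hours in stays)
    return weighted / total_hours

# Example: a commuter who works in a high-pollution area but lives elsewhere.
pm25_by_neighborhood = {"midtown": 14.2, "astoria": 8.1}  # ug/m^3, invented
daily_trace = [("midtown", 9.0), ("astoria", 15.0)]       # hours per day
print(f"{individual_exposure(daily_trace, pm25_by_neighborhood):.1f} ug/m^3")
```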

Crowdsourcing Medical Data Through Gaming


Felix Morgan in The Austin Chronicle: “Video games have changed the way we play, but they also have the potential to change the way we research and solve problems in fields such as health care and education. One game that’s made waves in medical research is Sea Hero Quest. This smartphone game has pioneered a groundbreaking approach to data collection that could lead to earlier diagnosis of dementia. So far, 2.5 million people have played the game, providing scientists with years’ worth of data across borders and demographics.

By offering this game as a free mobile app, researchers are overcoming the ever-present problems of small sample sizes and time-consuming data gathering in empirical research. Sea Hero Quest was created by Glitchers, in partnership with University College London, the University of East Anglia, and Alzheimer’s Research UK. As players navigate mazes, shoot flares into baskets, and photograph sea creatures, they answer simple demographic questions and generate rich data sets.
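To make “rich data sets” concrete, here is a hypothetical sketch of the kind of per-session telemetry record such a game might emit, pairing a navigation trace with self-reported demographics. The schema is invented for illustration; the excerpt does not describe Sea Hero Quest’s actual data format.

```python
# Hypothetical telemetry record for a research game session: an anonymous
# identifier, self-reported demographics, and a sampled navigation path.
# Every field name here is invented for illustration.
import json
import time
import uuid

def make_session_record(age_band: str, gender: str, country: str,
                        level: int, path: list, duration_s: float) -> str:
    return json.dumps({
        "player_id": str(uuid.uuid4()),  # anonymous, per-install identifier
        "demographics": {"age_band": age_band, "gender": gender,
                         "country": country},
        "level": level,
        "path": path,                    # sampled (x, y) positions in the maze
        "duration_s": duration_s,
        "recorded_at": int(time.time()),
    })

print(make_session_record("40-49", "f", "US", level=3,
                          path=[(0, 0), (4, 1), (7, 5)], duration_s=82.4))
```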

“The idea of crowdsourced data-gathering games for research is a new and exciting method of obtaining data that would be prohibitively expensive otherwise,” says Paul Toprac, who, along with his colleague Matt O’Hair, runs the Simulation and Game Applications (SAGA) Lab at the University of Texas at Austin. Their team helps researchers across campus and in the private sector design, implement, and find funding for video game-based research.

O’Hair sees a lot of potential for Sea Hero Quest and other research-based games. “One of the greatest parts about the SAGA Lab is that we get to help researchers make strides in these kinds of fields,” he says.

The idea of using crowdsourcing for data collection is relatively new, but using gaming for research is well established. Last year at SXSW, Nolan Bushnell, the founder of Atari, argued that video games were the key to understanding and treating dementia and related issues, which certainly seems possible based on the preliminary results from Sea Hero Quest. “We have had about 35 years of research using games as a medium,” Toprac says. “However, only recently have we used games as a tool for explicit data gathering.”…(More)”

Fighting Ebola with information


Larissa Fast and Adele Waugaman at Global Innovation Exchange: “What can be learned from the use of data, information, and digital technologies, such as mobile-based systems and internet connectivity, during the Ebola outbreak response in West Africa? What worked, what didn’t, and how can we apply these lessons to improve data and information flows in the future? This report details key findings and recommendations about the collection, management, analysis, and use of paper-based and digital data and information, drawing upon the insights of more than 130 individuals and organizations who worked tirelessly to end the Ebola outbreak in West Africa in 2014 and 2015….(More)”

Crowdsourcing, Citizen Science, and Data-sharing


Sapien Labs: “The future of human neuroscience lies in crowdsourcing, citizen science and data sharing but it is not without its minefields.

A recent Scientific American article by Daniel Goodwin, “Why Neuroscience Needs Hackers,” makes the case that neuroscience, like many fields today, is drowning in data, begging for the application of advances in computer science like machine learning. Neuroscientists are able to gather reams of neural data, but often without big data mechanisms and frameworks to synthesize them.

The SA article describes the work of Sebastian Seung, a Princeton neuroscientist, who recently mapped the neural connections of the human retina from an “overwhelming mass” of electron microscopy data using state-of-the-art A.I. and massive crowdsourcing. Seung incorporated the A.I. into a game called “Eyewire,” where thousands of volunteers scored points while improving the neural map. Although the article’s title emphasizes advanced A.I., Dr. Seung’s experiment points even more to crowdsourcing and open science, avenues for improving research that have suddenly become easy and powerful with today’s internet. Eyewire perhaps epitomizes successful crowdsourcing — using an application that gathers, represents, and analyzes data uniformly according to researchers’ needs.

Crowdsourcing is seductive in its potential but risky for those who aren’t sure how to control it to get what they want. For researchers who don’t want to become hackers themselves, trying to turn the diversity of data produced by a crowd into conclusive results might seem too much of a headache to be worthwhile. This is probably why the SA article’s title says we need hackers. The crowd is there, but using it depends on innovative software engineering. Many researchers could use software designed to flexibly support diverse forms of crowdsourcing, AI to enable things like crowd validation, and big data tools.
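As a concrete illustration of crowd validation, here is a minimal sketch that aggregates redundant volunteer labels by majority vote and escalates low-agreement items for expert review. This is a toy under invented thresholds, not Eyewire’s actual pipeline:

```python
# Toy crowd-validation pass: accept an item's majority label when enough
# volunteers agree strongly; otherwise escalate it for more votes or expert
# review. The thresholds and data shapes are invented for illustration.
from collections import Counter
from typing import Dict, List, Tuple

def validate(labels_by_item: Dict[str, List[str]],
             min_votes: int = 5,
             min_agreement: float = 0.8) -> Tuple[Dict[str, str], List[str]]:
    accepted: Dict[str, str] = {}
    escalated: List[str] = []
    for item, labels in labels_by_item.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        if len(labels) >= min_votes and top_count / len(labels) >= min_agreement:
            accepted[item] = top_label
        else:
            escalated.append(item)
    return accepted, escalated

votes = {
    "segment_1": ["neuron"] * 9 + ["glia"],     # 90% agreement: accept
    "segment_2": ["neuron", "glia", "neuron"],  # too few votes: escalate
}
accepted, escalated = validate(votes)
print(accepted)   # {'segment_1': 'neuron'}
print(escalated)  # ['segment_2']
```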

The Potential

The SA article also points to OpenBCI (brain-computer interface), mentioned here in other posts, as an example of how traditional divisions between institutional and amateur (or “citizen”) science are now crumbling; OpenBCI is a community of professional and citizen scientists doing principled research with cheap, portable EEG headsets that produce research-quality data. In communities of “neuro-hackers,” like NeuroTechX, professional researchers, entrepreneurs, and citizen scientists are coming together to develop all kinds of applications, such as “telepathic” machine control, prostheses, and art. Other companies, like NeuroSky, sell EEG headsets and biosensors for bio-/neuro-feedback training and health monitoring at consumer-affordable prices. (Read more in Citizen Science and EEG)

Tan Le, whose company Emotiv Lifesciences also produces portable EEG headsets, says in an article in National Geographic that neuroscience needs “as much data as possible on as many brains as possible” to advance diagnosis of conditions such as epilepsy and Alzheimer’s. Human neuroscience studies have typically consisted of 20 to 50 participants, an incredibly small sampling of a humanity 7 billion strong. Collecting larger datasets is difficult for a single lab, and given the diversity of populations across the planet, real understanding may require data from not just thousands of brains but millions. With cheap mobile EEG headsets, open-source software, and online collaboration, the potential for anyone to participate in such data collection is immense; the potential for crowdsourcing, unprecedented. There are, however, significant hurdles to overcome….(More)”

Notable Privacy and Security Books from 2016


Daniel J. Solove at Technology, Academics, Policy: “Here are some notable books on privacy and security from 2016….

Chris Jay Hoofnagle, Federal Trade Commission Privacy Law and Policy

From my blurb: “Chris Hoofnagle has written the definitive book about the FTC’s involvement in privacy and security. This is a deep, thorough, erudite, clear, and insightful work – one of the very best books on privacy and security.”

My interview with Hoofnagle about his book: The 5 Things Every Privacy Lawyer Needs to Know about the FTC: An Interview with Chris Hoofnagle

My further thoughts on the book in my interview post above: “This is a book that all privacy and cybersecurity lawyers should have on their shelves. The book is the most comprehensive scholarly discussion of the FTC’s activities in these areas, and it also delves deep into the FTC’s history and activities in other areas to provide much-needed context to understand how it functions and reasons in privacy and security cases. There is simply no better resource on the FTC and privacy. This is a great book and a must-read. It is filled with countless fascinating things that will surprise you about the FTC, which has quite a rich and storied history. And it is an accessible and lively read too – Chris really makes the issues come alive.”

Gary T. Marx, Windows into the Soul: Surveillance and Society in an Age of High Technology

From Peter Grabosky: “The first word that came to mind while reading this book was cornucopia. After decades of research on surveillance, Gary Marx has delivered an abundant harvest indeed. The book is much more than a straightforward treatise. It borders on the encyclopedic, and is literally overflowing with ideas, observations, and analyses. Windows into the Soul commands the attention of anyone interested in surveillance, past, present, and future. The book’s website contains a rich abundance of complementary material. An additional chapter consists of an intellectual autobiography discussing the author’s interest in, and personal experience with, surveillance over the course of his career. Because of its extraordinary breadth, the book should appeal to a wide readership…. it will be of interest to scholars of deviance and social control, cultural studies, criminal justice and criminology. But the book should be read well beyond the towers of academe. The security industry, broadly defined to include private security and intelligence companies as well as state law enforcement and intelligence agencies, would benefit from the book’s insights. So too should it be read by those in the information technology industries, including the manufacturers of the devices and applications which are central to contemporary surveillance, and which are shaping our future.”

Susan C. Lawrence, Privacy and the Past: Research, Law, Archives, Ethics

From the book blurb: “When the new HIPAA privacy rules regarding the release of health information took effect, medical historians suddenly faced a raft of new ethical and legal challenges—even in cases where their subjects had died years, or even a century, earlier. In Privacy and the Past, medical historian Susan C. Lawrence explores the impact of these new privacy rules, offering insight into what historians should do when they research, write about, and name real people in their work.”

Ronald J. Krotoszynski, Privacy Revisited: A Global Perspective on the Right to Be Left Alone

From Mark Tushnet: “Professor Krotoszynski provides a valuable overview of how several constitutional systems accommodate competing interests in privacy, speech, and democracy. He shows how scholarship in comparative law can help one think about one’s own legal system while remaining sensitive to the different cultural and institutional settings of each nation’s law. A very useful contribution.”

Laura K. Donohue, The Future of Foreign Intelligence: Privacy and Surveillance in a Digital Age

Gordon Corera, Cyberspies: The Secret History of Surveillance, Hacking, and Digital Espionage

J. Macgregor Wise, Surveillance and Film…(More; See also Nonfiction Privacy + Security Books).

Cancer Research Orgs Release Big Data for Precision Medicine


At HealthITAnalytics: “The American Association for Cancer Research (AACR) is releasing more than 19,000 de-identified genomic records to further the international research community’s explorations into precision medicine.

The big data dump, which includes information on 59 major types of cancer, including breast, colorectal, and lung cancer, is a result of the AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) initiative, and includes both genomic and some clinical data on consenting patients….

“These data were generated as part of routine patient care and without AACR Project GENIE they would likely never have been shared with the global cancer research community.”

Eight cancer research institutions, including five based in the United States, have contributed to the first phase of the GENIE project. Dana-Farber Cancer Institute in Boston, Memorial Sloan Kettering Cancer Center in New York City, and the University of Texas MD Anderson Cancer Center in Houston are among the collaborators.

Alongside institutions in Paris, the Netherlands, Toronto, Nashville, and Baltimore, these organizations aim to expand the research community’s knowledge of cancer and its potential treatments by continuing to make the exchange of high-grade clinical data a top priority.

“We are committed to sharing not only the real-world data within the AACR Project GENIE registry but also our best practices, from tips about assembling an international consortium to the best variant analysis pipeline, because only by working together will information flow freely and patients benefit rapidly,” Sawyers added…

Large-scale initiatives like the AACR Project GENIE, alongside separate data collection efforts like the VA’s Million Veterans Project, the CancerLinQ platform, Geisinger Health System’s MyCode databank, and the nascent PMI Cohort, will continue to make critical genomic and clinical data available to investigators across the country and around the world…(More)”.

Beyond IRBs: Designing Ethical Review Processes for Big Data Research


Conference Proceedings by the Future of Privacy Forum: “The ethical framework applying to human subject research in the biomedical and behavioral research fields dates back to the Belmont Report. Drafted in 1976 and adopted by the United States government in 1991 as the Common Rule, the Belmont principles were geared towards a paradigmatic controlled scientific experiment with a limited population of human subjects interacting directly with researchers and manifesting their informed consent. These days, researchers in academic institutions, as well as private sector businesses not subject to the Common Rule, conduct analysis of a wide array of data sources, from massive commercial or government databases to individual tweets or Facebook postings publicly available online, with little or no opportunity to directly engage human subjects to obtain their consent or even inform them of research activities.

Data analysis is now used in multiple contexts, such as combatting fraud in the payment card industry, reducing the time commuters spend on the road, detecting harmful drug interactions, improving marketing mechanisms, personalizing the delivery of education in K-12 schools, encouraging exercise and weight loss, and much more. And companies deploy data research not only to maximize economic gain but also to test new products and services to ensure they are safe and effective. These data uses promise tremendous societal benefits but at the same time create new risks to privacy, fairness, due process and other civil liberties.

Increasingly, corporate officers find themselves struggling to navigate unsettled social norms and make ethical choices that are more befitting of philosophers than business managers or even lawyers. The ethical dilemmas arising from data analysis transcend privacy and trigger concerns about stigmatization, discrimination, human subject research, algorithmic decision making and filter bubbles.

The challenge of fitting the round peg of data-focused research into the square hole of existing ethical and legal frameworks will determine whether society can reap the tremendous opportunities hidden in the data exhaust of governments and cities, health care institutions and schools, social networks and search engines, while at the same time protecting privacy, fairness, equality and the integrity of the scientific process. One commentator called this “the biggest civil rights issue of our time.”…(More)”

‘We the People’: Five Years of Online Petitions


Paul Hitlin at Pew Research Center: “Americans are most likely to petition the White House on health care, veterans’ issues, illnesses, immigration, animal rights, holidays and criminal investigations, but the actual impact of petitions has been modest and varied…

During President Obama’s first full day in office on Jan. 21, 2009, he issued a statement committing his administration to pursue “an unprecedented level of openness in Government.” His goal was to make the federal government more transparent, participatory and collaborative through the use of new technologies. The broader effort was called the Open Government Initiative, and a key part of it took effect more than two years later when the administration created an online petitioning system called “We the People” in September 2011. The White House promised to use the site to engage with the public and to issue responses to all petitions that reached a given number of signatures within 30 days of creation. The original threshold was set at 5,000 signatures but was increased to 100,000 in later years. As Obama prepares to leave office in early 2017, the site has been active for more than five years and is one of the most prominent legacies of the open government initiative….(More)”