Responsible Data reflection stories


Responsible Data Forum: “Through the various Responsible Data Forum events over the past couple of years, we’ve heard many anecdotes of responsible data challenges faced by people or organizations. These include potentially harmful data management practices, situations where people have experienced gut feelings that there is potential for harm, or workarounds that people have created to avoid those situations.

But we feel that trading in these “war stories” isn’t the most useful way for us to learn from these experiences as a community. Instead, we have worked with our communities to build a set of Reflection Stories: a structured, well-researched knowledge base on the unforeseen challenges and (sometimes) negative consequences of using technology and data for social change.

We hope that this can offer opportunities for reflection and learning, as well as helping to develop innovative strategies for engaging with technology and data in new and responsible ways….

What we learned from the stories

New spaces, new challenges

Moving into new digital spaces is bringing new challenges, and social media is one such space where these challenges are proving very difficult to navigate. This seems to stem from a number of key points:

  • organisations with low levels of technical literacy and experience in tech- or data-driven projects, deciding to engage suddenly with a certain tool or technology without realising what this entails. For some, this seems to stem from funders being more willing to support ‘innovative’ tech projects.
  • organisations wishing to engage more with social media while not being aware of more nuanced understandings of public/private spaces online, and how different communities engage with social media. (see story #2)
  • unpredictability and different levels of visibility: due to how privacy settings on Twitter are currently set, the visibility of users can be increased hugely by the actions of others – and once that happens, a user actually has very little agency to change or reverse that. Sadly, being more visible on, for example, Twitter disproportionately affects women and minority groups in a negative way – so while ‘signal boosting’ to raise someone’s profile might be well-meant, the consequences are hard to predict, and almost impossible to reverse manually. (see story #4)
  • consent: related to the above point, “giving consent” can mean many different things when it comes to digital spaces, especially if the person in question has little experience or understanding of using the technology in question (see stories #4 and #5).

Grey areas of responsible data

In almost all of the cases we looked at, very few decisions were concretely “right” or “wrong”. There are many, many grey areas here, which need to be addressed on a case-by-case basis. In some cases, the people involved really did think through their actions and approached their problems thoughtfully and responsibly – but consequences they had not imagined still arose (see story #8).

Additionally, given the quickly moving nature of the space, challenges can arise that simply would not have been possible at the start.

….Despite the widely varying settings of the stories collected, the shared mitigation strategies indicate that there are indeed a few key principles that can be kept in mind throughout the development of a new tech- or data-driven project.

The most stark of these – and one key aspect underlying many of these challenges – is a fundamental lack of technical literacy among advocacy organisations. This affects the way they interact with technical partners, the decisions they make around the project, the level to which they can have meaningful input, and more. Perhaps more crucially, it also affects their ability to know what to ask for help about – i.e., to ‘know the unknowns’.

Building an organisation’s technical literacy might not mean being able to answer all technical questions in-house, but rather knowing what to ask of others and what to expect in an answer. For advocacy organisations who don’t (yet) have this, it becomes all too easy to outsource not just the actual technical work but the contextual decisions too, which should be a collaborative process benefiting from both sets of expertise.

There seems to be a lot of scope to expand this set of stories, both by collecting more from other advocacy organisations and by extending into other sectors, too. Ultimately, we hope that sharing our collective intelligence around lessons learned from responsible data challenges faced in projects will contribute to a greater understanding for all of us…. Read all the stories here.

Do Universities, Research Institutions Hold the Key to Open Data’s Next Chapter?


Ben Miller at Government Technology: “Government produces a lot of data — reams of it, roomfuls of it, rivers of it. It comes in from citizen-submitted forms, fleet vehicles, roadway sensors and traffic lights. It comes from utilities, body cameras and smartphones. It fills up servers and spills into the cloud. It’s everywhere.

And often, all that data sits there not doing much. A governing entity might have robust data collection and it might have an open data policy, but that doesn’t mean it has the computing power, expertise or human capital to turn those efforts into value.

The amount of data available to government and the computing public promises to continue to multiply — the growing smart cities trend, for example, installs networks of sensors on everything from utility poles to garbage bins.

As all this happens, a movement — a new spin on an old concept — has begun to take root: partnerships between government and research institutes. Usually housed within universities and laboratories, these partnerships aim to match strength with strength. Where government has raw data, professors and researchers have expertise and analytics programs.

Several leaders in such partnerships, spanning some of the most tech-savvy cities in the country, see increasing momentum toward the concept. For instance, the John D. and Catherine T. MacArthur Foundation in September helped launch the MetroLab Network, an organization of more than 20 cities that have partnered with local universities and research institutes for smart-city-oriented projects….

Two recurring themes in projects that universities and research organizations take on in cooperation with government are project evaluation and impact analysis. That’s at least partially driven by the very nature of the open data movement: One reason to open data is to get a better idea of how well the government is operating….

Open data may have been part of the impetus for city-university partnerships, in that the availability of more data lured researchers wanting to work with it and extract value. But those partnerships have, in turn, led to government officials opening more data than ever before for useful applications.

Sort of.

“I think what you’re seeing is not just open data, but kind of shades of open — the desire to make the data open to university researchers, but not necessarily the broader public,” said Beth Noveck, co-founder of New York University’s GovLab.



GOVLAB: DOCKER FOR DATA 

Much of what GovLab does is about opening up access to data, and that is the whole point of Docker for Data. The project aims to simplify and quicken the process of extracting and loading large data sets so they will respond to Structured Query Language (SQL) commands, by moving the computing power of that process to the cloud. The Docker container can be installed with a single line of code, and the project’s website hosts already-extracted data sets. Since its inception, the website has grown to include more than 100 gigabytes of data from more than 8,000 data sets. From Baltimore, for example, one can easily find information on public health, water sampling, arrests, senior centers and more.
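
Docker for Data itself ships as a container, but the underlying workflow it automates (extract a published data set once, load it into a relational store, then query it with ordinary SQL) can be illustrated with a minimal Python sketch. The file name, table and columns below are hypothetical placeholders for illustration, not the actual GovLab tooling or Baltimore data.

```python
import csv
import sqlite3

# Hypothetical extract of an open data set, e.g. a city water-sampling
# CSV; the file name and columns are placeholders for illustration only.
CSV_PATH = "water_sampling.csv"

conn = sqlite3.connect("open_data.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS water_sampling ("
    "site TEXT, sample_date TEXT, lead_ppb REAL)"
)

# Load the extracted rows once...
with open(CSV_PATH, newline="") as f:
    rows = [(r["site"], r["sample_date"], float(r["lead_ppb"]))
            for r in csv.DictReader(f)]
conn.executemany("INSERT INTO water_sampling VALUES (?, ?, ?)", rows)
conn.commit()

# ...and the data set now answers ordinary SQL queries.
for site, avg_lead in conn.execute(
        "SELECT site, AVG(lead_ppb) FROM water_sampling GROUP BY site"):
    print(site, round(avg_lead, 2))
```

Docker for Data does the equivalent at scale, hosting already-loaded data sets in the cloud so that users can skip the extract-and-load step entirely.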


That’s partially because researchers are a controlled group who can be forced to sign memorandums of understanding and trained to protect privacy and prevent security breaches when government hands over sensitive data. That’s a top concern of agencies that manage data, and it shows in the GovLab’s work.

It was something Noveck found to be very clear when she started working on a project she simply calls “Arnold” because of project support from the Laura and John Arnold Foundation. The project involves building a better understanding of how different criminal justice jurisdictions collect, store and share data. The motivation is to help bridge the gaps between people who manage the data and people who should have easy access to it. When Noveck’s center conducted a survey among criminal justice record-keepers, the researchers found big differences between participants.

“There’s an incredible disparity of practices that range from some jurisdictions that have a very well established, formalized [memorandum of understanding] process for getting access to data, to just — you send an email to a guy and you hope that he responds, and there’s no organized way to gain access to data, not just between [researchers] and government entities, but between government entities,” she said….(More)

Ebola: A Big Data Disaster


Study by Sean Martin McDonald: “…undertaken with support from the Open Society Foundation, Ford Foundation, and Media Democracy Fund, explores the use of Big Data in the form of Call Detail Record (CDR) data in humanitarian crises.

It discusses the challenges of digital humanitarian coordination in health emergencies like the Ebola outbreak in West Africa, and the marked tension in the debate around experimentation with humanitarian technologies and the impact on privacy. McDonald’s research focuses on the two primary legal and human rights frameworks, privacy and property, to question the impact of unregulated use of CDRs on human rights. It also highlights how the diffusion of data science to the realm of international development constitutes a genuine opportunity to bring powerful new tools to fight crises and emergencies.

Analysing the risks of using CDRs to perform migration analysis and contact tracing without user consent, as well as the application of big data to disease surveillance, is an important entry point into the debate around the use of Big Data for development and humanitarian aid. The paper also raises crucial questions of legal significance about access to information, the limits of data sharing, and the concept of proportionality when privacy is invaded in the name of the public good. These issues hold great relevance today, as big data’s emerging role in development, involving its actual and potential uses as well as its harms, is under consideration across the world.

The paper highlights the absence of a dialogue around the significant legal risks posed by the collection, use, and international transfer of personally identifiable data and humanitarian information, and the grey areas around assumptions of public good. It calls for a critical discussion of the experimental nature of data modelling in emergency response, since the mismanagement of information can undermine the very human rights such interventions are meant to protect….

See Sean Martin McDonald – “Ebola: A Big Data Disaster” (PDF).

 

Meet your Matchmaker: New crowdsourced sites for rare diseases


Carina Storrs at CNN: “Angela’s son Jacob was born with a number of concerning traits. He had an extra finger, and a foot and hip that were abnormally shaped. The doctors called in geneticists to try to diagnose his unusual condition. “That started our long, 12-year journey,” said Angela, who lives in the Baltimore area.

As geneticists do, they studied Jacob’s genes, looking for mutations in specific regions of the genome that could point to a problem. But there were no leads.

In the meantime, Jacob developed just about every kind of health problem there is. He has cognitive delays, digestive problems, muscle weakness, osteoporosis and other ailments.

“It was extremely frustrating, it was like being on a roller coaster. You wait six to eight weeks for the (gene) test and then it comes back as showing nothing,” recalled Angela, who asked that their last name not be used to protect her son’s privacy. “How do we go about treating until we get at what it is?”

Finally, a test last year, which was able to take a broad look at all of Jacob’s genes, revealed a possible genetic culprit, but it still did not shed any light on his condition. “Nothing was known about the gene,” said Dr. Antonie Kline, director of pediatric genetics at the Greater Baltimore Medical Center, who had been following Jacob since birth.

Fortunately, Kline knew about an online program called GeneMatcher, which launched in December 2013. It would allow her to enter the new mystery gene into a database and search for other clinicians in the world who work with patients who have mutations in the same gene….

The search for “someone else on the planet” can be hard, Hamosh said. The diseases in GeneMatcher are rare, affecting fewer than 200,000 people in the United States, and it can be difficult for clinicians with similar patients to find each other just through word of mouth and professional connections. Au, the Canadian researcher with a patient similar to Jacob, is actually a friend of Kline’s, but the two had never realized their patients’ similarities.

It was not just Hamosh and her colleagues who were struck by the need for something like GeneMatcher. At the same time they were developing their program, researchers in Canada and the UK were creating PhenomeCentral and Decipher, respectively.

The three are collectively known as matchmaker programs. They connect patients with rare diseases that clinicians may never have seen before. In the case of PhenomeCentral, however, clinicians do not need to have identified a genetic culprit; they can search solely for other patients with similar traits or symptoms.

In the summer of 2015, it got much easier for clinicians all over the world to use these programs, when a clearinghouse site called Matchmaker Exchange was launched. They can now enter the patient information one time and search all three databases….(More)
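
The matchmaking these services perform (submit a patient record once, then look for overlaps in candidate genes or in phenotypes across several registries) can be sketched in a few lines of Python. The registry contents, field names and gene symbols below are invented for illustration; the real services expose far richer APIs and access controls.

```python
from dataclasses import dataclass, field

@dataclass
class PatientRecord:
    patient_id: str
    genes: set = field(default_factory=set)       # candidate genes
    phenotypes: set = field(default_factory=set)  # observed traits

# Hypothetical stand-ins for GeneMatcher, PhenomeCentral and Decipher;
# their contents here are invented for illustration.
REGISTRIES = {
    "GeneMatcher":    [PatientRecord("GM-1", {"GENE_X"}, {"polydactyly"})],
    "PhenomeCentral": [PatientRecord("PC-7", set(), {"polydactyly", "hip dysplasia"})],
    "Decipher":       [PatientRecord("DE-3", {"GENE_X", "GENE_Y"}, set())],
}

def find_matches(query: PatientRecord):
    """Search every registry once; match on shared candidate genes,
    or on shared phenotypes when no gene candidate exists."""
    for name, records in REGISTRIES.items():
        for rec in records:
            if query.genes & rec.genes or query.phenotypes & rec.phenotypes:
                yield name, rec.patient_id

query = PatientRecord("local-1", {"GENE_X"}, {"polydactyly"})
for registry, pid in find_matches(query):
    print(f"possible match in {registry}: {pid}")
```

Matchmaker Exchange plays roughly the role of find_matches here: one query, entered once, fanned out across the member databases.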

New #ODimpact Release: How is Open Data Creating Economic Opportunities and Solving Public Problems?


Andrew Young at The GovLab: “Last month, the GovLab and Omidyar Network launched Open Data’s Impact (odimpact.org), a custom-built repository offering a range of in-depth case studies on global open data projects. The initial launch of the project featured the release of 13 open data impact case studies – ten undertaken by the GovLab, as well as three case studies from Becky Hogge (@barefoot_techie), an independent researcher collaborating with Omidyar Network. Today, we are releasing a second batch of 12 case studies – nine case studies from the GovLab and three from Hogge…

The case studies being released today examine two additional dimensions of impact. They find that:

  • Open data is creating new opportunities for citizens and organizations, by fostering innovation and promoting economic growth and job creation.
  • Open data is playing a role in solving public problems, primarily by allowing citizens and policymakers access to new forms of data-driven assessment of the problems at hand. It also enables data-driven engagement, producing more targeted interventions and enhanced collaboration.

The specific impacts revealed by today’s release of case studies are wide-ranging, and include both positive and negative transformations. We have found that open data has enabled:

  • The creation of new industries built on open weather data released by the United States National Oceanic and Atmospheric Administration (NOAA).
  • The generation of billions of dollars of economic activity as a result of the Global Positioning System (GPS) being opened to the global public in the 1980s, and the United Kingdom’s Ordnance Survey geospatial offerings.
  • A more level playing field for small businesses in New York City seeking market research data.
  • The coordinated sharing of data among government and international actors during the response to the Ebola outbreak in Sierra Leone.
  • The identification of discriminatory water access decisions in the case Kennedy v the City of Zanesville, resulting in a $10.9 million settlement for the African-American plaintiffs.
  • Increased awareness among Singaporeans about the location of hotspots for dengue fever transmission.
  • Improved, data-driven emergency response following earthquakes in Christchurch, New Zealand.
  • Troubling privacy violations on Eightmaps related to Californians’ political donation activity….(More)”

All case studies available at odimpact.org.

 

Privacy as a Public Good


Joshua A.T. Fairfield & Christoph Engel in Duke Law Journal: “Privacy is commonly studied as a private good: my personal data is mine to protect and control, and yours is yours. This conception of privacy misses an important component of the policy problem. An individual who is careless with data exposes not only extensive information about herself, but about others as well. The negative externalities imposed on nonconsenting outsiders by such carelessness can be productively studied in terms of welfare economics. If all relevant individuals maximize private benefit, and expect all other relevant individuals to do the same, neoclassical economic theory predicts that society will achieve a suboptimal level of privacy. This prediction holds even if all individuals cherish privacy with the same intensity. As the theoretical literature would have it, the struggle for privacy is destined to become a tragedy.

But according to the experimental public-goods literature, there is hope. Like in real life, people in experiments cooperate in groups at rates well above those predicted by neoclassical theory. Groups can be aided in their struggle to produce public goods by institutions, such as communication, framing, or sanction. With these institutions, communities can manage public goods without heavy-handed government intervention. Legal scholarship has not fully engaged this problem in these terms. In this Article, we explain why privacy has aspects of a public good, and we draw lessons from both the theoretical and the empirical literature on public goods to inform the policy discourse on privacy…(More)”
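
The free-rider logic the authors invoke can be made concrete with a standard linear public-goods game, in which each player keeps whatever she does not contribute while contributions are multiplied and shared equally. This is a minimal sketch of that textbook model, with parameter values chosen purely for illustration.

```python
# Linear public-goods game: n players each hold an endowment e and
# choose a contribution c_i. Each receives e - c_i + m * sum(c) / n,
# where 1 < m < n, so contributing is socially optimal but
# individually unprofitable (marginal private return m/n < 1).
n, endowment, m = 4, 10.0, 1.6  # illustrative values only

def payoff(contributions, i):
    shared = m * sum(contributions) / n
    return endowment - contributions[i] + shared

selfish = [0.0] * n            # neoclassical prediction: free-ride
cooperative = [endowment] * n  # everyone contributes fully

print("payoff if all free-ride:     ", payoff(selfish, 0))      # 10.0
print("payoff if all contribute:    ", payoff(cooperative, 0))  # 16.0
# Yet a lone contributor in a selfish group earns less than 10,
# so zero contribution is the dominant strategy: the 'tragedy'.
lone = [endowment] + [0.0] * (n - 1)
print("payoff of a lone contributor:", payoff(lone, 0))         # 4.0
```

Reading “contribution” as care taken with data that also exposes others, the model captures the authors’ point: the private return to careful data practices falls below the social return, so individually rational actors underinvest in privacy.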

See also:

Privacy, Public Goods, and the Tragedy of the Trust Commons: A Response to Professors Fairfield and Engel, Dennis D. Hirsch

Response to Privacy as a Public Good, Priscilla M. Regan

Data Collaboratives: Matching Demand with Supply of (Corporate) Data to solve Public Problems


Blog by Stefaan G. Verhulst, Iryna Susha and Alexander Kostura: “Data Collaboratives refer to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (private companies, research institutions, and government agencies) share data to help solve public problems. Several of society’s greatest challenges — from climate change to poverty — require greater access to big (but not always open) data sets, more cross-sector collaboration, and increased capacity for data analysis. Participants at the workshop and breakout session explored the various ways in which data collaboratives can help meet these needs.

Matching supply and demand of data emerged as one of the most important and overarching issues facing the big and open data communities. Participants agreed that more experimentation is needed so that new, innovative and more successful models of data sharing can be identified.

How to discover and enable such models? When asked how the international community might foster greater experimentation, participants indicated the need to develop the following:

· A responsible data framework that serves to build trust in sharing data. It would be based upon existing frameworks, but would also accommodate emerging technologies and practices, and would need to be sensitive to public opinion and perception.

· Increased insight into different business models that may facilitate the sharing of data. As experimentation continues, the data community should map emerging practices and models of sharing so that successful cases can be replicated.

· Capacity to tap into the potential value of data. On the demand side, capacity refers to the ability to pose good questions, understand current data limitations, and seek new data sets responsibly. On the supply side, this means seeking shared value in collaboration, thinking creatively about public use of private data, and establishing norms of responsibility around security, privacy, and anonymity.

· A transparent stock of the available data supply, including an inventory of what corporate data exists that can match multiple demands, shared through established networks and new collaborative institutional structures.

· Mapping emerging practices and models of sharing. Corporate data offers value not only for humanitarian action (which was a particular focus at the conference) but also for a variety of other domains, including science, agriculture, health care, urban development, environment, media and arts, and others. Gaining insight into the practices that emerge across sectors could broaden the spectrum of what is feasible and how.

In general, it was felt that understanding the business models underlying data collaboratives is of utmost importance in order to achieve win-win outcomes for both private and public sector players. Moreover, issues of public perception and trust were raised as important concerns of government organizations participating in data collaboratives….(More)”

The Digital Equilibrium Project


Press Release by The Digital Equilibrium Project: “Cybersecurity, government and privacy experts are banding together as part of The ‘Digital Equilibrium Project’ to foster a new, productive dialogue on balancing security and privacy in the connected world. The project aims to address the underlying issues fueling acrimonious debates like the one over the contentious court order between Apple and the U.S. Government.

  • The diverse group includes current and former leaders of some of the world’s largest cybersecurity firms and organizations, former officials in the NSA and national law enforcement, and leaders of some of the nation’s most influential privacy organizations. These individuals believe new thinking and collaboration are needed to avert potential catastrophes as the digital and physical worlds become more interdependent.
  • The group will release its foundational paper ‘Balancing Security and Privacy in the Connected World’ on Tuesday, March 1st at the RSA Conference – the world’s largest cybersecurity conference.
  • This project and related paper, months in the making, seek to end the kinds of standoffs we are seeing between Apple and the U.S. Government, addressing the underlying lack of social norms and legal constructs for the digital world.
  • They will convene a mid-year summit to craft a framework or ‘constitution’ for the digital world. The intent of this constitution is to help guide policy creation, broker compromise and serve as the foundation for decision making around cybersecurity issues. Senior executives from the Justice Department, Apple and other technology firms will be invited to participate…..

Next week the group will publish its foundational paper, crafted over extensive meetings, interviews and working sessions. The paper is meant to foster a new, collaborative discussion on the most pressing questions that could determine the future safety and social value of the Internet and the digital technologies that depend on it. In addition to releasing the paper at the RSA Conference, members of the group will discuss the paper and related issues during a main-stage panel session moderated by Art Coviello, former Executive Chairman of RSA Security, and James Kaplan, a McKinsey partner, on Thursday, March 3rd. Panel members will include: Michael Chertoff, Executive Chairman of The Chertoff Group and former Secretary of Homeland Security; Trevor Hughes, President and CEO of the International Association of Privacy Professionals; Mike McConnell, former Director of the NSA and Director, National Intelligence; and Nuala O’Connor, President and CEO, Center for Democracy & Technology.

The paper urges governments, corporations and privacy advocates to put aside the polarizing arguments that have cast security and privacy as opposing forces, and calls for a mid-year summit meeting between these parties to formulate a new structure for advancement of these pressing issues. It poses four fundamental questions that must be addressed to ensure the digital world can evolve in ways that ensure individual privacy while enabling the productivity and commercial gains that can improve quality of life around the globe. The four questions are:

  • What practices should organizations adopt to achieve their goals while protecting the privacy of their customers and other stakeholders?
  • How can organizations continue to improve the protection of their digital infrastructures and adopt privacy management practices that protect their employees?
  • What privacy management practices should governments adopt to maintain civil liberties and expectations of privacy, while ensuring the safety and security of their citizens, organizations, and critical infrastructure?
  • What norms should countries adopt to protect their sovereignty while enabling global commerce and collaboration against criminal and terrorist threats?

The Digital Equilibrium Project’s foundational paper will be available for download on March 1st at www.digitalequilibriumproject.com

Give Up Your Data to Cure Disease


David B. Agus in The New York Times: “How far would you go to protect your health records? Your privacy matters, of course, but consider this: Mass data can inform medicine like nothing else and save countless lives, including, perhaps, your own.

Over the past several years, using some $30 billion in federal stimulus money, doctors and hospitals have been installing electronic health record systems…. Yet neither doctors nor patients are happy. Doctors complain about the time it takes to update digital records, while patients worry about confidentiality…

We need to get over it. These digital databases offer an incredible opportunity to examine trends that will fundamentally change how doctors treat patients. They will help develop cures, discover new uses for drugs and better track the spread of scary new illnesses like the Zika virus….

Case in point: Last year, a team led by researchers at the MD Anderson Cancer Center and Washington University found that a common class of heart drugs called beta blockers, which block the effects of adrenaline, may prolong ovarian cancer patients’ survival. This discovery came after the researchers reviewed more than 1,400 patient records, and identified an obvious pattern among those with ovarian cancer who were using beta blockers, most often to control their blood pressure. Women taking earlier versions of this class of drug typically lived for almost eight years after their cancer diagnosis, compared with just three and a half years for the women not taking any beta blocker….
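
The kind of retrospective record review described here, grouping patients by a drug exposure and comparing survival after diagnosis, is straightforward to sketch with pandas. The records and column names below are invented for illustration; the actual study reviewed more than 1,400 records and used proper survival analysis rather than a raw group median.

```python
import pandas as pd

# Hypothetical extract of de-identified patient records; the column
# names and values are invented for illustration, not study data.
records = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "on_beta_blocker": [True, True, True, False, False, False],
    "survival_years": [7.9, 8.2, 7.5, 3.2, 3.6, 3.8],
})

# Group by exposure and compare survival after diagnosis.
summary = records.groupby("on_beta_blocker")["survival_years"].median()
print(summary)
# A real analysis would adjust for confounders (age, stage, blood
# pressure) rather than reading a pattern off a raw median.
```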

We need to move past that. For one thing, more debate over data sharing is already leading to more data security. Last month a bill was signed into law calling for the Department of Health and Human Services to create a health care industry cybersecurity task force, whose members would hammer out new voluntary standards.

New technologies — and opportunities — come with unprecedented risks and the need for new policies and strategies. We must continue to improve our encryption capabilities and other methods of data security and, most important, mandate that they are used. The hack of the Anthem database last year, for instance, which allowed 80 million personal records to be accessed, was shocking not only for the break-in, but for the lack of encryption….
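
Encrypting records at rest, the gap the Anthem breach exposed, is mechanically simple. Here is a minimal sketch using the Fernet recipe (symmetric, authenticated encryption) from the Python cryptography package; a real deployment would add key management, rotation, and access controls rather than generating a key inline.

```python
from cryptography.fernet import Fernet

# In production the key would come from a key-management system;
# generating it inline here is for illustration only.
key = Fernet.generate_key()
fernet = Fernet(key)

record = b'{"patient_id": 12345, "dx": "hypertension"}'
token = fernet.encrypt(record)  # this ciphertext is what the database stores

# Only a holder of the key can recover the plaintext, so a stolen
# database dump without the key is unreadable.
assert fernet.decrypt(token) == record
```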

Medical research is making progress every day, but the next step depends less on scientists and doctors than it does on the public. Each of us has the potential to be part of tomorrow’s cures. (More)”

What a Million Syllabuses Can Teach Us


College course syllabuses are curious documents. They represent the best efforts by faculty and instructors to distill human knowledge on a given subject into 14-week chunks. They structure the main activity of colleges and universities. And then, for the most part, they disappear….

Until now. Over the past two years, we and our partners at the Open Syllabus Project (based at the American Assembly at Columbia) have collected more than a million syllabuses from university websites. We have also begun to extract some of their key components — their metadata — starting with their dates, their schools, their fields of study and the texts that they assign.

This past week, we made available online a beta version of our Syllabus Explorer, which allows this database to be searched. Our hope and expectation is that this tool will enable people to learn new things about teaching, publishing and intellectual history.

At present, the Syllabus Explorer is mostly a tool for counting how often texts are assigned over the past decade. There is something for everyone here. The traditional Western canon dominates the top 100, with Plato’s “Republic” at No. 2, “The Communist Manifesto” at No. 3, and “Frankenstein” at No. 5, followed by Aristotle’s “Ethics,” Hobbes’s “Leviathan,” Machiavelli’s “The Prince,” “Oedipus” and “Hamlet.”….

Top articles? Garrett Hardin’s “The Tragedy of the Commons” and Francis Fukuyama’s “The End of History.” And so on. Altogether, the Syllabus Explorer tracks about 933,000 works. Nearly half of these are assigned only once.

Such data has many uses. For academics, for example, it offers a window onto something they generally know very little about: how widely their work is read.

It also allows us to introduce a new publication metric based on the frequency with which works are taught, which we call the “teaching score.” The score is derived from the ranking order of the text, not the raw number of citations, such that a book or article that is used in four or five classes gets a score of 1, while “The Republic,” which is assigned 3,500 times, gets a score of 100….
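
The article does not spell out the exact formula, but a natural reading is a percentile rank over assignment counts, rescaled to run from 1 to 100. The sketch below implements that reading on a small synthetic corpus; all counts apart from “The Republic” at 3,500 are invented.

```python
# Synthetic corpus with the skew the article reports: nearly half of
# all works are assigned only once. Counts other than "The Republic"
# are invented for illustration.
counts = {f"rare_text_{i}": 1 for i in range(9996)}
counts.update({"The Republic": 3500, "Leviathan": 1800,
               "The Prince": 1500, "Hamlet": 1400})

def teaching_scores(assignment_counts):
    """Score each text by its percentile in the assignment ranking,
    mapped onto 1..100 (rank-based, not raw-count-based)."""
    ranked = sorted(assignment_counts, key=assignment_counts.get)
    n = len(ranked)
    return {text: max(1, round(100 * (i + 1) / n))
            for i, text in enumerate(ranked)}

scores = teaching_scores(counts)
print(scores["The Republic"])  # 100: top of the ranking
print(scores["rare_text_0"])   # 1: assigned in only a handful of classes
```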

Because of a complex mix of privacy and copyright issues concerning syllabuses, the Open Syllabus Project publishes only metadata, not the underlying documents or any personally identifying material (even though these documents can be viewed on university websites). But we think that it is important for schools to move toward a more open approach to curriculums. As universities face growing pressure to justify their teaching and research missions, we doubt that curricular obscurity is helpful.

We think that the Syllabus Explorer demonstrates how more open strategies can support teaching, diversify evaluation practices and offer new perspectives on publishing, scholarship and intellectual traditions. But as with any newly published work, that judgment now passes out of our hands and into yours…(More)”