Citizen Science: The Law and Ethics of Public Access to Medical Big Data


New Paper by Sharona Hoffman: “Patient-related medical information is becoming increasingly available on the Internet, spurred by government open data policies and private sector data sharing initiatives. Websites such as HealthData.gov, GenBank, and PatientsLikeMe allow members of the public to access a wealth of health information. As the medical information terrain quickly changes, the legal system must not lag behind. This Article provides a base on which to build a coherent data policy. It canvasses emergent data troves and wrestles with their legal and ethical ramifications.
Publicly accessible medical data have the potential to yield numerous benefits, including scientific discoveries, cost savings, the development of patient support tools, healthcare quality improvement, greater government transparency, public education, and positive changes in healthcare policy. At the same time, the availability of electronic personal health information that can be mined by any Internet user raises concerns related to privacy, discrimination, erroneous research findings, and litigation. This Article analyzes the benefits and risks of health data sharing and proposes balanced legislative, regulatory, and policy modifications to guide data disclosure and use.”

The Crypto-democracy and the Trustworthy


New Paper by Sebastien Gambs, Samuel Ranellucci, and Alain Tapp: “In the current architecture of the Internet, there is a strong asymmetry in terms of power between the entities that gather and process personal data (e.g., major Internet companies, telecom operators, cloud providers, …) and the individuals from which this personal data is issued. In particular, individuals have no choice but to blindly trust that these entities will respect their privacy and protect their personal data. In this position paper, we address this issue by proposing an utopian crypto-democracy model based on existing scientific achievements from the field of cryptography. More precisely, our main objective is to show that cryptographic primitives, including in particular secure multiparty computation, offer a practical solution to protect privacy while minimizing the trust assumptions. In the crypto-democracy envisioned, individuals do not have to trust a single physical entity with their personal data but rather their data is distributed among several institutions. Together these institutions form a virtual entity called the Trustworthy that is responsible for the storage of this data but which can also compute on it (provided first that all the institutions agree on this). Finally, we also propose a realistic proof-of-concept of the Trustworthy, in which the roles of institutions are played by universities. This proof-of-concept would have an important impact in demonstrating the possibilities offered by the crypto-democracy paradigm.”
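To make the idea of distributing data among several institutions concrete, here is a minimal sketch of additive secret sharing, one basic building block used in secure multiparty computation. It is an illustration only and not the protocol proposed in the paper: the modulus `PRIME`, the function names, the choice of three institutions and the sample values are all assumptions made for this example.

```python
import secrets

# Hypothetical modulus chosen for this sketch; values must be smaller than it.
PRIME = 2**61 - 1


def share(value, n_institutions):
    """Split a value into n random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_institutions - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recover the value; this requires every institution's share."""
    return sum(shares) % PRIME


def add_shared(shares_a, shares_b):
    """Each institution adds the two shares it holds, seeing neither input."""
    return [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]


# Two individuals split their (made-up) values across three institutions.
alice = share(52_000, 3)
bob = share(61_000, 3)

# The institutions compute on shares; only the combined result is revealed.
total = reconstruct(add_shared(alice, bob))
print(total)  # 113000
```

In this sketch no single institution learns anything from its own share, and the sum can only be recovered if all institutions cooperate, which mirrors the abstract's condition that all institutions must agree before the data is computed on.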

“From Bitcoin to Burning Man and Beyond”


IDCubed: “From Bitcoin to Burning Man and Beyond: The Quest for Autonomy and Identity in a Digital Society explores a new generation of digital technologies that are re-imagining the very foundations of identity, governance, trust and social organization.
The fifteen essays of this book stake out the foundations of a new future – a future of open Web standards and data commons, a society of decentralized autonomous organizations, a world of trustworthy digital currencies and self-organized and expressive communities like Burning Man.
Among the contributors are Alex “Sandy” Pentland of the M.I.T. Human Dynamics Laboratory, former FCC Chairman Reed E. Hundt, long-time IBM strategist Irving Wladawsky-Berger, monetary system expert Bernard Lietaer, Silicon Valley entrepreneur Peter Hirshberg, journalist Jonathan Ledgard and H-Farm cofounder Maurizio Rossi.
From Bitcoin to Burning Man and Beyond was edited by Dr. John H. Clippinger, cofounder and executive director of ID3, and David Bollier, an Editor at ID3 who is also an author, blogger and scholar who studies the commons. The book, published by ID3 in association with Off the Common Books, reflects ID3’s vision of the huge, untapped potential for self-organized, distributed governance on open platforms.
The book is available in print and ebook formats (Kindle and epub) from Amazon.com and Off the Common Books. The book, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike license (BY-NC-SA), may also be downloaded for free as a pdf file from ID3.
One chapter that inspires the book’s title traces the 28-year history of Burning Man, the week-long encampment in the Nevada desert that has hosted remarkable experimentation in new forms of self-governance by large communities. Other chapters explore such cutting-edge concepts as

  • evolvable digital contracts that could supplant conventional legal agreements;
  • smartphone currencies that could help Africans meet their economic needs more effectively;
  • the growth of the commodity-backed Ven currency; and
  • new types of “solar currencies” that borrow techniques from Bitcoin to enable more efficient, cost-effective solar generation and sharing by homeowners.

From Bitcoin to Burning Man and Beyond also introduces the path-breaking software platform that ID3 has developed called “Open Mustard Seed,” or OMS. The just-released open source program enables the rise of new types of trusted, self-healing digital institutions on open networks, which in turn will make possible new sorts of privacy-friendly social ecosystems.
“OMS is an integrated, open source package of programs that lets people collect and share personal information in secure, transparent and accountable ways, enabling authentic, trusted social and economic relationships to flourish,” said Dr. John H. Clippinger, executive director of ID3, an acronym for the Institute for Institutional Innovation and Data-Driven Design.
“The software builds individual privacy, security and trusted exchange into the very design of the system. In effect, OMS represents a new authentication, privacy and sharing layer for the Internet,” said Clippinger “– a new way to share personal information selectively and securely, without access by unauthorized third parties.”
A two-minute video introducing the capabilities of OMS can be viewed here.”

The Changing Nature of Privacy Practice


Numerous commenters have observed that Facebook, among many marketers (including political campaigns like U.S. President Barack Obama’s), regularly conducts A-B tests and other research to measure how consumers respond to different products, messages and messengers. So what makes the Facebook-Cornell study different from what goes on all the time in an increasingly data-driven world? After all, the ability to conduct such testing continuously on a large scale is considered one of the special features of big data.
The answer calls for broader judgments than parsing the language of privacy policies or managing compliance with privacy laws and regulations. Existing legal tools such as notice-and-choice and use limitations are simply too narrow to address the array of issues presented and inform the judgment needed. Deciding whether Facebook ought to participate in research like its newsfeed study is not really about what the company can do but what it should do.
As Omer Tene and Jules Polonetsky, CIPP/US, point out in an article on Facebook’s research study, “Increasingly, corporate officers find themselves struggling to decipher subtle social norms and make ethical choices that are more befitting of philosophers than business managers or lawyers.” They add, “Going forward, companies will need to create new processes, deploying a toolbox of innovative solutions to engender trust and mitigate normative friction.” Tene and Polonetsky themselves have proposed a number of such tools. In recent comments on Consumer Privacy Bill of Rights legislation filed with the Commerce Department, the Future of Privacy Forum (FPF) endorsed the use of internal review boards along the lines of those used in academia for human-subject research. The FPF also submitted an initial framework for benefit-risk analysis in the big data context “to understand whether assuming the risk is ethical, fair, legitimate and cost-effective.” Increasingly, companies and other institutions are bringing to bear more holistic review of privacy issues. Conferences and panels on big data research ethics are proliferating.
The expanding variety and complexity of data uses also call for a broader public policy approach. The Obama administration’s Consumer Privacy Bill of Rights (of which I was an architect) adapted existing Fair Information Practice Principles to a principles-based approach that is intended not as a formalistic checklist but as a set of principles that work holistically in ways that are “flexible” and “dynamic.” In turn, much of the commentary submitted to the Commerce Department on the Consumer Privacy Bill of Rights addressed the question of the relationship between these principles and a “responsible use framework” as discussed in the White House Big Data Report….”

Not just the government’s playbook


at Radar: “Whenever I hear someone say that “government should be run like a business,” my first reaction is “do you know how badly most businesses are run?” Seriously. I do not want my government to run like a business — whether it’s like the local restaurants that pop up and die like wildflowers, or megacorporations that sell broken products, whether financial, automotive, or otherwise.
If you read some elements of the press, it’s easy to think that healthcare.gov is the first time that a website failed. And it’s easy to forget that a large non-government website was failing, in surprisingly similar ways, at roughly the same time. I’m talking about the Common App site, the site high school seniors use to apply to most colleges in the US. There were problems with pasting in essays, problems with accepting payments, problems with the app mysteriously hanging for hours, and more.
 
I don’t mean to pick on Common App; you’ve no doubt had your own experience with woefully bad online services: insurance companies, Internet providers, even online shopping. I’ve seen my doctor swear at the Epic electronic medical records application when it crashed repeatedly during an appointment. So, yes, the government builds bad software. So does private enterprise. All the time. According to TechRepublic, 68% of all software projects fail. We can debate why, and we can even debate the numbers, but there’s clearly a lot of software #fail out there — in industry, in non-profits, and yes, in government.
With that in mind, it’s worth looking at the U.S. CIO’s Digital Services Playbook. It’s not ideal, and in many respects, its flaws reveal its origins. But it’s pretty good, and should certainly serve as a model, not just for the government, but for any organization, small or large, that is building an online presence.
The playbook consists of 13 principles (called “plays”) that drive modern software development:

  • Understand what people need
  • Address the whole experience, from start to finish
  • Make it simple and intuitive
  • Build the service using agile and iterative practices
  • Structure budgets and contracts to support delivery
  • Assign one leader and hold that person accountable
  • Bring in experienced teams
  • Choose a modern technology stack
  • Deploy in a flexible hosting environment
  • Automate testing and deployments
  • Manage security and privacy through reusable processes
  • Use data to drive decisions
  • Default to open

These aren’t abstract principles: most of them should be familiar to anyone who has read about agile software development, attended one of our Velocity conferences, one of the many DevOps Days, or a similar event. All of the principles are worth reading (it’s not a long document). I’m going to call out two for special attention….”

Reddit, Imgur and Twitch team up as 'Derp' for social data research


in The Guardian: “Academic researchers will be granted unprecedented access to the data of major social networks including Imgur, Reddit, and Twitch as part of a joint initiative: The Digital Ecologies Research Partnership (Derp).
Derp – and yes, that really is its name – will be offering data to universities including Harvard, MIT and McGill, to promote “open, publicly accessible, and ethical academic inquiry into the vibrant social dynamics of the web”.
It came about “as a result of Imgur talking with a number of other community platforms online trying to learn about how they work with academic researchers,” says Tim Hwang, the image-sharing site’s head of special initiatives.
“In most cases, the data provided through Derp will already be accessible through public APIs,” he says. “Our belief is that there are ways of doing research better, and in a way that strongly respects user privacy and responsible use of data.
“Derp is an alliance of platforms that all believe strongly in this. In working with academic researchers, we support projects that meet institutional review at their home institution, and all research supported by Derp will be released openly and made publicly available.”
Hwang points to a Stanford paper analysing the success of Reddit’s Random Acts of Pizza subforum as an example of the sort of research Derp hopes to foster. In the research, Tim Althoff, Niloufar Salehi and Tuan Nguyen found that the likelihood of getting a free pizza from the Reddit community depended on a number of factors, including how the request was phrased, how much the user posted on the site, and how many friends they had online. In the end, they were able to predict with 67% accuracy whether or not a given request would be fulfilled.
The grouping aims to solve two problems academic research faces. Researchers themselves find it hard to get data outside of the largest social media platforms, such as Twitter and Facebook. The major services at least have a vibrant community of developers and researchers working on ways to access and use data, but for smaller communities, there’s little help provided.
Yet smaller is relative: Reddit may be a shrimp compared to Facebook, but with 115 million unique visitors every month, it’s still a sizeable community. And so Derp aims to offer “a single point of contact for researchers to get in touch with relevant team members across a range of different community sites….”

Reality Mining: Using Big Data to Engineer a Better World


New book by Nathan Eagle and Kate Greene: “Big Data is made up of lots of little data: numbers entered into cell phones, addresses entered into GPS devices, visits to websites, online purchases, ATM transactions, and any other activity that leaves a digital trail. Although the abuse of Big Data—surveillance, spying, hacking—has made headlines, it shouldn’t overshadow the abundant positive applications of Big Data. In Reality Mining, Nathan Eagle and Kate Greene cut through the hype and the headlines to explore the positive potential of Big Data, showing the ways in which the analysis of Big Data (“Reality Mining”) can be used to improve human systems as varied as political polling and disease tracking, while considering user privacy.

Eagle, a recognized expert in the field, and Greene, an experienced technology journalist, describe Reality Mining at five different levels: the individual, the neighborhood and organization, the city, the nation, and the world. For each level, they first offer a nontechnical explanation of data collection methods and then describe applications and systems that have been or could be built. These include a mobile app that helps smokers quit smoking; a workplace “knowledge system”; the use of GPS, Wi-Fi, and mobile phone data to manage and predict traffic flows; and the analysis of social media to track the spread of disease. Eagle and Greene argue that Big Data, used respectfully and responsibly, can help people live better, healthier, and happier lives.”

Digital Footprints: Opportunities and Challenges for Online Social Research


Paper by Golder, Scott A. and Macy, Michael for the Annual Review of Sociology: “Online interaction is now a regular part of daily life for a demographically diverse population of hundreds of millions of people worldwide. These interactions generate fine-grained time-stamped records of human behavior and social interaction at the level of individual events, yet are global in scale, allowing researchers to address fundamental questions about social identity, status, conflict, cooperation, collective action, and diffusion, both by using observational data and by conducting in vivo field experiments. This unprecedented opportunity comes with a number of methodological challenges, including generalizing observations to the offline world, protecting individual privacy, and solving the logistical challenges posed by “big data” and web-based experiments. We review current advances in online social research and critically assess the theoretical and methodological opportunities and limitations. [J]ust as the invention of the telescope revolutionized the study of the heavens, so too by rendering the unmeasurable measurable, the technological revolution in mobile, Web, and Internet communications has the potential to revolutionize our understanding of ourselves and how we interact…. [T]hree hundred years after Alexander Pope argued that the proper study of mankind should lie not in the heavens but in ourselves, we have finally found our telescope. Let the revolution begin. —Duncan Watts”

Fifteen open data insights


Tim Davies from ODRN: “…below are the 15 points from the three-page briefing version, and you can find a full write-up of these points for download. You can also find reports from all the individual project partners, including a collection of quick-read research posters over on the Open Data Research Network website.

15 insights into open data supply, use and impacts

(1) There are many gaps to overcome before open data availability can lead to widespread effective use and impact. Open data can lead to change through a ‘domino effect’, or by creating ripples of change that gradually spread out. However, often many of the key ‘domino pieces’ are missing, and local political contexts limit the reach of ripples. Poor data quality, low connectivity, scarce technical skills, weak legal frameworks and political barriers may all prevent open data triggering sustainable change. Attentiveness to all the components of open data impact is needed when designing interventions.
(2) There is a frequent mismatch between open data supply and demand in developing countries. Counting datasets is a poor way of assessing the quality of an open data initiative. The datasets published on portals are often the datasets that are easiest to publish, not the datasets most in demand. Politically sensitive datasets are particularly unlikely to be published without civil society pressure. Sometimes the gap is on the demand side – as potential open data users often do not articulate demands for key datasets.
(3) Open data initiatives can create new spaces for civil society to pursue government accountability and effectiveness. The conversation around transparency and accountability that ideas of open data can support is as important as the datasets in some developing countries.
(4) Working on open data projects can change how government creates, prepares and uses its own data. The motivations behind an open data initiative shape how government uses the data itself. Civil society and entrepreneurs interacting with government through open data projects can help shape government data practices. This makes it important to consider which intermediaries gain insider roles shaping data supply.
(5) Intermediaries are vital to both the supply and the use of open data. Not all data needed for governance in developing countries comes from government. Intermediaries can create data, articulate demands for data, and help translate open data visions from political leaders into effective implementations. Traditional local intermediaries are an important source of information, in particular because they are trusted parties.
(6) Digital divides create data divides in both the supply and use of data. In some developing countries key data is not digitised, or a lack of technical staff has left data management patchy and inconsistent. Where Internet access is scarce, few citizens can have direct access to data or services built with it. Full access is needed for full empowerment, but offline intermediaries, including journalists and community radio stations, also play a vital role in bridging the gaps between data and citizens.
(7) Where information is already available and used, the shift to open data involves data evolution rather than data revolution. Many NGOs and intermediaries already access the information which is now becoming available as data. Capacity building should start from existing information and data practices in organisations, and should look for the step-by-step gains to be made from a data-driven approach.
(8) Officials’ fears about the integrity of data are a barrier to more machine-readable data being made available. The publication of data as PDF or in scanned copies is often down to a misunderstanding of how open data works. Only copies can be changed, and originals can be kept authoritative. Helping officials understand this may help increase the supply of data.
(9) Very few datasets are clearly openly licensed, and there is low understanding of what open licenses entail. There are mixed opinions on the importance of a focus on licensing in different contexts. Clear licenses are important to building a global commons of interoperable data, but may be less relevant to particular uses of data on the ground. In many countries wider conversations about licensing are yet to take place.
(10) Privacy issues are not on the radar of most developing country open data projects, although commercial confidentiality does arise as a reason preventing greater data transparency. Much state held data is collected either from citizens or from companies. Many countries in the ODDC study have weak or absent privacy laws and frameworks, yet participants in the studies raised few personal privacy considerations. By contrast, a lack of clarity, and officials’ concerns, about potential breaches of commercial confidentiality when sharing data gathered from firms was a barrier to opening data.
(11) There is more to open data than policies and portals. Whilst central open data portals act as a visible symbol of open data initiatives, a focus on portal building can distract attention from wider reforms. Open data elements can also be built on existing data sharing practices, and data made available through the locations where citizens, NGOs and businesses already go to access information.
(12) Open data advocacy should be aware of, and build upon, existing policy foundations in specific countries and sectors. Sectoral transparency policies for local government, budget and energy industry regulation, amongst others, could all have open data requirements and standards attached, drawing on existing mechanisms to secure sustainable supplies of relevant open data in developing countries. In addition, open data conversations could help make existing data collection and disclosure requirements fit better with the information and data demands of citizens.
(13) Open data is not just a central government issue: local government data, city data, and data from the judicial and legislative branches are all important. Many open data projects focus on the national level, and only on the executive branch. However, local government is closer to citizens, urban areas bring together many of the key ingredients for successful open data initiatives, and transparency in other branches of government is important to secure citizens’ democratic rights.
(14) Flexibility is needed in the application of definitions of open data to allow locally relevant and effective open data debates and advocacy to emerge. Open data is made up of various elements, including proactive publication, machine-readability and permissions to re-use. Countries at different stages of open data development may choose to focus on one or more of these, recognising that adopting all elements at once could hinder progress. It is important to find ways both to define open data clearly and to avoid a reductive debate that does not recognise progressive steps towards greater openness.
(15) There are many different models for an open data initiative: including top-down, bottom-up and sector-specific. Initiatives may also be state-led, civil society-led and entrepreneur-led in their goals and how they are implemented – with consequences for the resources and models required to make them sustainable. There is no one-size-fits-all approach to open data. More experimentation, evaluation and shared learning on the components, partners and processes for putting open data ideas into practice must be a priority for all who want to see a world where open-by-default data drives real social, political and economic change.
You can read more about each of these points in the full report.”

Selected Readings on Economic Impact of Open Data


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of open data was originally published in 2014.

Open data is publicly available data – often released by governments, scientists, and occasionally private companies – that is made available for anyone to use, in a machine-readable format, free of charge. Considerable attention has been devoted to the economic potential of open data for businesses and other organizations, and it is now widely accepted that open data plays an important role in spurring innovation, growth, and job creation. From new business models to innovation in local governance, open data is being quickly adopted as a valuable resource at many levels.

Measuring and analyzing the economic impact of open data in a systematic way is challenging, and governments as well as other providers of open data seek to provide access to the data in a standardized way. As governmental transparency increases and open data changes business models and activities in many economic sectors, it is important to understand best practices for releasing and using non-proprietary, public information. Costs, social challenges, and technical barriers also influence the economic impact of open data.

These selected readings are intended as a first step toward answering the question of whether, and how, we can determine if opening data spurs economic impact.

Annotated Selected Reading List (in alphabetical order)

Bonina, Carla. New Business Models and the Values of Open Data: Definitions, Challenges, and Opportunities. NEMODE 3K – Small Grants Call 2013. http://bit.ly/1xGf9oe

  • In this paper, Dr. Carla Bonina provides an introduction to open data and open data business models, evaluating their potential economic value and identifying future challenges for the effectiveness of open data, such as personal data and privacy, the emerging data divide, and the costs of collecting, producing and releasing open (government) data.

Carpenter, John and Phil Watts. Assessing the Value of OS OpenData™ to the Economy of Great Britain – Synopsis. June 2013. Accessed July 25, 2014. http://bit.ly/1rTLVUE

  • John Carpenter and Phil Watts of Ordnance Survey undertook a study to examine the economic impact of open data to the economy of Great Britain. Using a variety of methods such as case studies, interviews, download analysis, adoption rates, impact calculation, and CGE modeling, the authors estimate that the OS OpenData initiative will deliver a net increase in GDP of £13 – 28.5 million for Great Britain in 2013.

Capgemini Consulting. The Open Data Economy: Unlocking Economic Value by Opening Government and Public Data. Capgemini Consulting. Accessed July 24, 2014. http://bit.ly/1n7MR02

  • This report explores how governments are leveraging open data for economic benefits. Using a comparative approach, the authors study important open data from organizational, technological, social and political perspectives. The study highlights the potential of open data to drive profit through increasing the effectiveness of benchmarking and other data-driven business strategies.

Deloitte. Open Growth: Stimulating Demand for Open Data in the UK. Deloitte Analytics. December 2012. Accessed July 24, 2014. http://bit.ly/1oeFhks

  • This early paper on open data by Deloitte uses case studies and statistical analysis on open government data to create models of businesses using open data. They also review the market supply and demand of open government data in emerging sectors of the economy.

Gruen, Nicholas, John Houghton and Richard Tooth. Open for Business: How Open Data Can Help Achieve the G20 Growth Target. Accessed July 24, 2014. http://bit.ly/UOmBRe

  • This report highlights the potential economic value of the open data agenda in Australia and the G20. The report provides an initial literature review on the economic value of open data, as well as a set of case studies on the economic value of open data, and a set of recommendations for how open data can help the G20 and Australia achieve target objectives in the areas of trade, finance, fiscal and monetary policy, anti-corruption, employment, energy, and infrastructure.

Heusser, Felipe I. Understanding Open Government Data and Addressing Its Impact (draft version). World Wide Web Foundation. http://bit.ly/1o9Egym

  • The World Wide Web Foundation, in collaboration with IDRC has begun a research network to explore the impacts of open data in developing countries. In addition to the Web Foundation and IDRC, the network includes the Berkman Center for Internet and Society at Harvard, the Open Development Technology Alliance and Practical Participation.

Howard, Alex. San Francisco Looks to Tap Into the Open Data Economy. O’Reilly Radar: Insight, Analysis, and Reach about Emerging Technologies. October 19, 2012. Accessed July 24, 2014. http://oreil.ly/1qNRt3h

  • Alex Howard points to San Francisco as one of the first municipalities in the United States to embrace an open data platform. He outlines how open data has driven innovation in local governance. Moreover, he discusses the potential impact of open data on job creation and government technology infrastructure in the City and County of San Francisco.

Huijboom, Noor and Tijs Van den Broek. Open Data: An International Comparison of Strategies. European Journal of ePractice. March 2011. Accessed July 24, 2014. http://bit.ly/1AE24jq

  • This article examines five countries and their open data strategies, identifying key features, main barriers, and drivers of progress for open data programs. The authors outline the key challenges facing European and other national open data policies, highlighting the emerging role open data initiatives are playing in political and administrative agendas around the world.

Manyika, J., Michael Chui, Diana Farrell, Steve Van Kuiken, Peter Groves, and Elizabeth Almasi Doshi. Open Data: Unlocking Innovation and Performance with Liquid Information. McKinsey Global Institute. October 2013. Accessed July 24, 2014. http://bit.ly/1lgDX0v

  • This research focuses on quantifying the potential value of open data in seven “domains” in the global economy: education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance.

Moore, Alida. Congressional Transparency Caucus: How Open Data Creates Jobs. April 2, 2014. Accessed July 30, 2014. Socrata. http://bit.ly/1n7OJpp

  • Socrata provides a summary of the March 24th briefing of the Congressional Transparency Caucus on the need to increase government transparency through adopting open data initiatives. They include key takeaways from the panel discussion, as well as their role in making open data available for businesses.

Stott, Andrew. Open Data for Economic Growth. The World Bank. June 25, 2014. Accessed July 24, 2014. http://bit.ly/1n7PRJF

  • In this report, The World Bank examines the evidence for the economic potential of open data, holding that the economic potential is quite large, despite a variation in the published estimates, and difficulties assessing its potential methodologically. They provide five archetypes of businesses using open data, and provide recommendations for governments trying to maximize economic growth from open data.