Directory of crowdsourcing websites


Directory by Donelle McKinley: “…Here is just a selection of websites for crowdsourcing cultural heritage. Websites are actively crowdsourcing unless indicated with an asterisk…The directory is organized by the type of crowdsourcing process involved, using the typology for crowdsourcing in the humanities developed by Dunn & Hedges (2012). In their study, they explain that ‘a process is a sequence of tasks, through which an output is produced by operating on an asset’. For example, the Your Paintings Tagger website is for the process of tagging, which is an editorial task. The assets being tagged are images, and the output of the project is metadata, which makes the images easier to discover, retrieve and curate.

Transcription

Alexander Research Library, Wanganui Library* (NZ) Transcription of index cards from 1840 to 2002.

Ancient Lives*, University of Oxford (UK) Transcription of papyri from Greco-Roman Egypt.

AnnoTate, Tate Britain (UK) Transcription of artists’ diaries, letters and sketchbooks.

Decoding the Civil War, The Huntington Library, Abraham Lincoln Presidential Library and Museum & North Carolina State University (USA). Transcription and decoding of Civil War telegrams from the Thomas T. Eckert Papers.

DIY History, University of Iowa Libraries (USA) Transcription of historical documents.

Emigrant City, New York Public Library (USA) Transcription of handwritten mortgage and bond ledgers from the Emigrant Savings Bank records.

Field Notes of Laurence M. Klauber, San Diego Natural History Museum (USA) Transcription of field notes by the celebrated herpetologist.

Notes from Nature Transcription of natural history museum records.

Measuring the ANZACs, Archives New Zealand and Auckland War Memorial Museum (NZ). Transcription of first-hand accounts of NZ soldiers in WW1.

Old Weather (UK) Transcription of Royal Navy ships’ logs from the early twentieth century.

Scattered Seeds, Heritage Collections, Dunedin Public Libraries (NZ) Transcription of index cards for Dunedin newspapers from 1851 to 1993.

Shakespeare’s World, Folger Shakespeare Library (USA) & Oxford University Press (UK). Transcription of handwritten documents by Shakespeare’s contemporaries. Identification of words that have yet to be recorded in the authoritative Oxford English Dictionary.

Smithsonian Digital Volunteers Transcription Center (USA) Transcription of multiple collections.

Transcribe Bentham, University College London (UK) Transcription of historical manuscripts by philosopher and reformer Jeremy Bentham.

What’s on the menu? New York Public Library (USA) Transcription of historical restaurant menus. …

(Full Directory).

The Billions We’re Wasting in Our Jails


Stephen Goldsmith and Jane Wiseman in Governing: “By using data analytics to make decisions about pretrial detention, local governments could find substantial savings while making their communities safer….

Few areas of local government spending present better opportunities for dramatic savings than those that surround pretrial detention. Cities and counties are wasting more than $3 billion a year, and often inducing crime and job loss, by holding the wrong people while they await trial. The problem: Only 10 percent of jurisdictions use risk data analytics when deciding which defendants should be detained.

As a result, dangerous people are out in our communities, while many who could be safely in the community are behind bars. Vast numbers of people accused of petty offenses spend their pretrial detention time jailed alongside hardened convicts, learning from them how to be better criminals….

In this era of big data, analytics not only can predict and prevent crime but also can discern who should be diverted from jail to treatment for underlying mental health or substance abuse issues. Avoided costs aggregating in the billions could be better spent on detaining high-risk individuals, more mental health and substance abuse treatment, more police officers and other public safety services.

Jurisdictions that do use data to make pretrial decisions have achieved not only lower costs but also greater fairness and lower crime rates. Washington, D.C., releases 85 percent of defendants awaiting trial. Compared to the national average, those released in D.C. are two and a half times more likely to remain arrest-free and one and a half times as likely to show up for court.

Louisville, Ky., implemented risk-based decision-making using a tool developed by the Laura and John Arnold Foundation and now releases 70 percent of defendants before trial. Those released have turned out to be twice as likely to return to court and to stay arrest-free as those in other jurisdictions. Mesa County, Colo., and Allegheny County, Pa., both have achieved significant savings from reduced jail populations due to data-driven release of low-risk defendants.

Data-driven approaches are beginning to produce benefits not only in the area of pretrial detention but throughout the criminal justice process. Dashboards now in use in a handful of jurisdictions allow not only administrators but also the public to see court waiting times by offender type and to identify and address processing bottlenecks….(More)”

Nudging for Success


Press Release: “A groundbreaking report published today by ideas42 reveals several innovations that college administrators and policymakers can leverage to significantly improve college graduation rates at a time when completion is more out of reach than ever for millions of students.

The student path through college to graduation day is strewn with subtle, often invisible barriers that, over time, hinder students’ progress and cause some of them to drop out entirely. In Nudging for Success: Using Behavioral Science to Improve the Postsecondary Student Journey, ideas42 focuses on simple, low-cost ways to combat these unintentional obstacles and support student persistence and success at every stage in the college experience, from pre-admission to post-graduation. Teams worked with students, faculty and administrators at colleges around the country.

Even for students whose tuition is covered by financial aid, whose academic preparation is exemplary, and who are able to commit themselves full-time to their education, the subtle logistical and psychological sticking points can have a huge impact on their ability to persist and fully reap the benefits of a higher education.

Less than 60% of full-time students graduate from four-year colleges within six years, and less than 30% graduate from community colleges within three years. There are a myriad of factors often cited as deterrents to finishing school, such as the cost of tuition or the need to juggle family and work obligations, but behavioral science and the results of this report demonstrate that lesser-known dynamics like self-perception are also at play.

From increasing financial aid filing to fostering positive friend groups and a sense of belonging on campus, the 16 behavioral solutions outlined in Nudging for Success represent the potential for significant impact on the student experience and persistence. At Arizona State University, sending behaviorally designed email reminders to students and parents about the Free Application for Federal Student Aid (FAFSA) priority deadline increased submissions by 72% and led to an increase in grant awards. Freshman retention among low-income, first-generation, under-represented or other students most at risk of dropping out increased by 10% at San Francisco State University with the use of a testimonial video, self-affirming exercises, and monthly messaging aimed at first-time students.

“This evidence demonstrates how behavioral science can be the key to uplifting millions of Americans through education,” said Alissa Fishbane, Managing Director at ideas42. “By approaching the completion crisis from the whole experience of students themselves, administrators and policymakers have the opportunity to reduce the number of students who start, but do not finish, college—students who take on the financial burden of tuition but miss out on the substantial benefits of earning a degree.”

The results of this work drive home the importance of examining the college experience from the student perspective and through the lens of human behavior. College administrators and policymakers can replicate these gains at institutions across the country to make it simpler for students to complete the degree they started in ways that are often easier and less expensive to implement than existing alternatives—paving the way to stronger economic futures for millions of Americans….(More)”

In Your Neighborhood, Who Draws the Map?


Lizzie MacWillie at NextCity: “…By crowdsourcing neighborhood boundaries, residents can put themselves on the map in critical ways.

Why does this matter? Neighborhoods are the smallest organizing element in any city. A strong city is made up of strong neighborhoods, where the residents can effectively advocate for their needs. A neighborhood boundary marks off a particular geography and calls out important elements within that geography: architecture, street fabric, public spaces and natural resources, to name a few. Putting that line on a page lets residents begin to identify needs and set priorities. Without boundaries, there’s no way to know where to start.

Knowing a neighborhood’s boundaries and unique features allows a group to list its assets. What buildings have historic significance? What shops and restaurants exist? It also helps highlight gaps: What’s missing? What does the neighborhood need more of? What is there already too much of? Armed with this detailed inventory, residents can approach a developer, city council member or advocacy group with hard numbers on what they know their neighborhood needs.

With a precisely defined geography, residents living in a food desert can point to developable vacant land that’s ideal for a grocery store. They can also cite how many potential grocery shoppers live within the neighborhood.

In addition to being able to organize within the neighborhood, staking a claim to a neighborhood, putting it on a map and naming it, can help a neighborhood control its own narrative and tell its story — so someone else doesn’t.

Our neighborhood map project was started in part as a response to consistent misidentification of Dallas neighborhoods by local media, which appears to be particularly common in stories about majority-minority neighborhoods. This kind of oversight can contribute to a false narrative about a place, especially when the news is about crime or violence, and takes away from residents’ ability to tell their story and shape their neighborhood’s future. Even worse is when neighborhoods are completely left off of the map, as if they have no story at all to tell.

Neighborhood mapping can also counter narrative hijacking like I’ve seen in my hometown of Brooklyn, where realtor-driven neighborhood rebranding has led to areas being renamed. These places have their own unique identities and histories, yet longtime residents saw names changed so that real estate sellers could capitalize on increasing property values in adjacent trendy neighborhoods.

Cities across the country — including Dallas, Boston, New York, Chicago, Portland and Seattle — have crowdsourced mapping projects people can contribute to. For cities lacking such an effort, tools like Google Map Maker have been effective….(More)”.

Using Behavioral Science to Combat Climate Change


Cass R. Sunstein and Lucia A. Reisch in the Oxford Research Encyclopedia of Climate Science (Forthcoming): “Careful attention to choice architecture promises to open up new possibilities for reducing greenhouse gas emissions – possibilities that go well beyond, and that may supplement or complement, the standard tools of economic incentives, mandates, and bans. How, for example, do consumers choose between climate-friendly products or services and alternatives that are potentially damaging to the climate but less expensive? The answer may well depend on the default rule. Indeed, climate-friendly default rules may well be a more effective tool for altering outcomes than large economic incentives. The underlying reasons include the power of suggestion; inertia and procrastination; and loss aversion. If well-chosen, climate-friendly defaults are likely to have large effects in reducing the economic and environmental harms associated with various products and activities. In deciding whether to establish climate-friendly defaults, choice architects (subject to legal constraints) should consider both consumer welfare and a wide range of other costs and benefits. Sometimes that assessment will argue strongly in favor of climate-friendly defaults, particularly when both economic and environmental considerations point in their direction. Notably, surveys in the United States and Europe show that majorities in many nations are in favor of climate-friendly defaults….(More)”

What Can Civic Tech Learn From Social Movements?


Stacy Donohue at Omidyar Network: “…In order to spur creative thinking about how the civic tech sector could be accelerated and expanded, we looked to Purpose, a public benefit corporation that works with NGOs, philanthropies, and brands on movement building strategies. We wanted to explore what we might learn from taking the work that Purpose has done mapping the progress of 21st-century social movements and applying its methodology to civic tech.

So why consider viewing civic tech through the lens of 21st-century movements? Movements are engines of change in society that enable citizens to create new and better paths to engage with government and to seek recourse on issues that matter to millions of people. At first glance, civic tech doesn’t appear to be a movement in the purest sense of the term, but on closer inspection, it does share some fundamental characteristics. Like a movement, civic tech is mission driven, is focused on making change that benefits the public, and in most cases enables better public input into decision making.

We believe that better understanding the essential components of movements, and observing the ways in which civic tech does or does not behave like one, can yield insights on how we as a civic tech community can collectively drive the sector forward….

…report Engines of Change: What Civic Tech Can Learn From Social Movements… provides a lot of rich insight and detail, which we invite everyone to explore. Meanwhile, we have summarized five key findings:

  1. Grassroots activity is expanding across the US – Activity is no longer centralized around San Francisco and New York; it’s rapidly growing and spreading across the US – in fact, there was an 81% increase in the number of cities hosting civic tech Meetups from 2013 to 2015, and 45 of 50 states had at least one Meetup on civic tech in 2015.
  2. Talk is turning to action – We are walking the talk. One way we can see this is that growth in civic tech Twitter discussion is highly correlated with the growth in GitHub contributions to civic tech projects and related Meetup events. Between 2013 and 2015, over 8,500 people contributed code to GitHub civic tech projects and there were over 76,000 Meetups for civic tech events.
  3. There is an engaged core, but it is very small in number – As with most social movements, civic tech has a definite core of highly engaged evangelists, advocates and entrepreneurs that are driving conversations, activity, and events, and this core is growing. The number of Meetup groups holding multiple events a quarter grew by 136% between 2013 and 2015, and there was likewise a 60% growth in Engaged Tweeters during this time period. However, this level of activity is dwarfed by other movements such as climate action.
  4. Civic tech is growing but still lacking scale – There are many positive indications of growth in civic tech; for example, the combination of nonprofit and for-profit funding to the sector increased by almost 120% over the period. But while growth compares favorably to other movements, again the scale just isn’t there.
  5. Common themes, but no shared vision or identity – Purpose examined the extent to which civic tech exhibits and articulates a shared vision or identity around which members of a movement can rally. What they found is that many fewer people are discussing the same shared set of themes. Two themes – Open Data and Government Transparency – are resonating and gaining traction across the sector and could therefore form the basis of common identity for civic tech.

While each of these insights is important in its own right and requires action to move the sector forward, the main thing that strikes us is the need for a coherent and clearly articulated vision and sense of shared identity for civic tech…

Read the full report: Engines of Change: What Civic Tech Can Learn From Social Movements

Explore the data tool here….(More)”

Using Innovation and Technology to Improve City Services


IBM Center for the Business of Government: “In this report, Professor Greenberg examines a dozen cities across the United States that have award-winning reputations for using innovation and technology to improve the services they provide to their residents. She explores a variety of success factors associated with effective service delivery at the local level, including:

  • The policies, platforms, and applications that cities use for different purposes, such as public engagement, streamlining the issuance of permits, and emergency response
  • How cities can successfully partner with third parties, such as nonprofits, foundations, universities, and private businesses to improve service delivery using technology
  • The types of business cases that can be presented to mayors and city councils to support various changes proposed by innovators in city government

Professor Greenberg identifies a series of trends that drive cities to undertake innovations, such as the increased use of mobile devices by residents. Based on cities’ responses to these trends, she offers a set of findings and specific actions that city officials can act upon to create innovation agendas for their communities. Her report also presents case studies for each of the dozen cities in her review. These cases provide a real-world context, which will allow interested leaders in other cities to see how their own communities might approach similar innovation initiatives.

This report builds on two other IBM Center reports: A Guide for Making Innovation Offices Work, by Rachel Burstein and Alissa Black, and The Persistence of Innovation in Government: A Guide for Public Servants, by Sandford Borins, which examines the use of awards to stimulate innovation in government.

We hope that government leaders who are interested in innovations using technology to improve services will benefit from the governance models and tools described in this report, as they consider how best to leverage innovation and technology initiatives to serve residents more effectively and efficiently….(More)”

Estonia Is Demonstrating How Government Should Work in a Digital World


Motherboard: “In May, Manu Sporny became the 10,000th “e-Resident” of Estonia. Sporny, the founder and CEO of a digital payments and identity company located in the United States, has never set foot in Estonia. However, he heard about the country’s e-Residency program and decided it would be an obvious choice for his company’s European headquarters.

People like Sporny are why Estonia launched a digital residency program in December 2014. The program allows anyone in the world to apply for a digital identity, which will let them: establish and run a location-independent business online, get easier access to EU markets, open a bank account and conduct e-banking, use international payment service providers, declare taxes, and sign all relevant documents and contracts remotely….

One of the most essential components of a functioning digital society is a secure digital identity. The state and the private sector need to know who is accessing these online services. Likewise, users need to feel secure that their identity is protected.

Estonia found the solution to this problem. In 2002, we started issuing residents a mandatory ID-card with a chip that empowers them to categorically identify themselves and verify legal transactions and documents through a digital signature. A digital signature has been legally equivalent to a handwritten one throughout the European Union—not just in Estonia—since 1999.

With this new digital identity system, the state could serve not only areas with a low population, but also the entire Estonian diaspora. Estonians anywhere in the world could maintain a connection to their homeland via e-services, contribute to the legislative process, and even participate in elections. Once the government realized that it could scale this service worldwide, it seemed logical to offer its e-services to those without physical residency in Estonia. This meant that Estonia suddenly had value as a service in addition to being a place to live.

What does “Country as a Service” mean?

With the rise of a global internet, we’ve seen more skilled workers and businesspeople offering their services across nations, regardless of their physical location. A survey by Intuit estimates that this number will reach 40 percent in the US alone by 2020.

These entrepreneurs and skilled artisans are ultimately looking for the simplest way to create and maintain a legal, global identity as an outlet for their global offerings.

They look to other countries, not because they are looking for a tax haven, but because they have been prevented from incorporating and maintaining a business, due to barriers from their own government.

The most important thing for these entrepreneurs is that the creation and upkeep of the company is easy and hassle-free. It is also important that, despite being incorporated in a different nation, they remain honest taxpayers within their country of physical residence.

This is exactly what Estonia offers—a location-independent, hassle-free and fully-digital economic and financial environment where entrepreneurs can run their own company globally….

When an e-Resident establishes a company, it means that the company will likely start using the services offered by other Estonian companies (like creating a bank account, partnering with a payment service provider, seeking assistance from accountants, auditors and lawyers). As more clients are created for Estonian companies, their growth potential increases, along with the growth potential of the Estonian economy.

Eventually, there will be more residents outside borders than inside them

If states fail to redesign and simplify the machinery of bureaucracy and make it location-independent, there will be an opportunity for countries that can offer such services across borders.

Estonia has learned that it’s incredibly important in a small state to serve primarily small and micro businesses. In order to sustain a nation on this, we must automate and digitize processes to scale. Estonia’s model, for instance, is location-independent, making it simple to scale successfully. We hope to acquire at least 10 million digital residents (e-Residents) in a way that is mutually beneficial for the nation-states where these people are tax residents….(More)”

Connect the corporate dots to see true transparency


Gillian Tett at the Financial Times: “…In all this, a crucial point is often forgotten: simply amassing data will not solve the problem of transparency. What is also needed is a way for analysts to track the connections that exist between companies scattered across different national jurisdictions.

There are more than 45,000 companies listed on global stock exchanges and, according to Chris Taggart of OpenCorporates, an independent data company, there are between 250m and 400m unlisted groups. Many of these are listed on national registries but, since registries are extremely fragmented, it is very difficult for shareholders or regulators to form a complete picture of company activity.

It also creates financial stability risks. One reason why it is currently hard to track the scale of Chinese corporate debt, say, is that it is being issued by an opaque web of legal entities. Similarly, regulators struggled to cope with the fallout from the Lehman Brothers collapse in 2008 because the bank was operating almost 3,000 different legal entities around the world.

Is there a solution to this? A good place to start would be for governments to put their corporate registries online. Another crucial step would be for governments and companies to agree on a common standard for labelling legal entities, so that these can be tracked across borders.

Happily, work on that has begun: in 2014, the Global Legal Entity Identifier Foundation was created. It supports the implementation and use of “legal entity identifiers”, a data standard that identifies participants in financial transactions. Groups such as the Data Coalition in Washington DC are lobbying for laws that would force companies to use LEIs….However, this inter-governmental project is moving so slowly that the private sector may be a better bet. In recent years, companies such as Dun & Bradstreet have begun to amass proprietary information about complex corporate webs, and computer nerds are also starting to use the power of big data to join up the corporate dots in a public format.

OpenCorporates is a good example. Over the past five years, a dozen staff there have been painstakingly scraping national corporate registries to create a database designed to show how companies are connected around the world. This is far from complete but data from 100m entities have already been logged. And in the wake of the Panama Papers, more governments are coming on board — data from the Cayman Islands are currently being added and France is likely to collaborate soon.

Sadly, these moves will not deliver real transparency straight away. If you type “MIO” into the search box on the OpenCorporates website, you will not see a map of all of McKinsey’s activities — at least not yet.

The good news, however, is that with every data scrape, or use of an LEI, the picture of global corporate activity is becoming slightly less opaque thanks to the work of a hidden army of geeks. They deserve acclaim and support — even (or especially) from management consultants….(More)”
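The excerpt mentions legal entity identifiers without describing how one is built. As a minimal sketch, assuming the layout published in ISO 17442 (20 uppercase alphanumeric characters, the last two being check digits computed with ISO/IEC 7064 MOD 97-10, the same scheme IBANs use), the Python below computes and verifies LEI check digits. The function names and the sample base string are hypothetical illustrations, not part of GLEIF tooling or any library API.

```python
import re

# Assumed LEI shape: 18 uppercase alphanumeric characters plus 2 numeric check digits.
_LEI_PATTERN = re.compile(r"[0-9A-Z]{18}[0-9]{2}")
_BASE_PATTERN = re.compile(r"[0-9A-Z]{18}")


def _to_number(s: str) -> int:
    """Map digits to themselves and A-Z to 10-35, concatenate, and parse as one integer."""
    return int("".join(str(int(ch, 36)) for ch in s))


def lei_check_digits(base: str) -> str:
    """Compute the two MOD 97-10 check digits for a hypothetical 18-character LEI base."""
    if not _BASE_PATTERN.fullmatch(base):
        raise ValueError("base must be 18 uppercase alphanumeric characters")
    # Append "00" as placeholder check digits, then choose digits so the whole value is 1 mod 97.
    remainder = _to_number(base + "00") % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei: str) -> bool:
    """Return True if the string has LEI shape and its numeric form is congruent to 1 mod 97."""
    if not _LEI_PATTERN.fullmatch(lei):
        return False
    return _to_number(lei) % 97 == 1


if __name__ == "__main__":
    base = "HYPOTHETICAL000001"  # made-up base, not a registered LEI
    lei = base + lei_check_digits(base)
    print(lei, is_valid_lei(lei))
```

The shape check runs first, so malformed input returns False instead of raising. A passing checksum only shows the string is internally consistent (it catches typos such as a single changed digit); confirming that an entity is actually registered would still require a lookup against the GLEIF database.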

Selected Readings on Data Collaboratives


By Neil Britto, David Sangokoya, Iryna Susha, Stefaan Verhulst and Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data collaboratives was originally published in 2017.

The term data collaborative refers to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (including private companies, research institutions, and government agencies) can exchange data to help solve public problems. Several of society’s greatest challenges — from addressing climate change to public health to job creation to improving the lives of children — require greater access to data, more collaboration between public- and private-sector entities, and an increased ability to analyze datasets. In the coming months and years, data collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.

Annotated Selected Readings List (in alphabetical order)

Agaba, G., Akindès, F., Bengtsson, L., Cowls, J., Ganesh, M., Hoffman, N., . . . Meissner, F. “Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers.” 2014. http://bit.ly/25RRC6N.

  • This white paper, produced by “a group of activists, researchers and data experts,” explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
    • Advocating and facilitating by “open[ing] up new public spaces for discussion and awareness building”;
    • Describing and predicting through the detection of “new correlations and the surfac[ing] of new questions”;
    • Facilitating information exchange through “multiple feedback loops which feed into both research and action,” and
    • Promoting accountability and transparency, especially as a byproduct of crowdsourcing efforts aimed at “aggregat[ing] and analyz[ing] information in real time.”
  • The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
  • They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private-sector entities, such as:
    • Lack of general data literacy;
    • Lack of open learning environments and repositories;
    • Lack of resources, capacity and access;
    • Challenges of sensitivity and risk perception with regard to using data;
    • Storage and computing capacity; and
    • Externally validating data sources for comparison and verification.

Ansell, C. and Gash, A. “Collaborative Governance in Theory and Practice.” Journal of Public Administration Research and Theory 18 (4), 2008. http://bit.ly/1RZgsI5.

  • This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
  • The article suggests factors significant to successful partnering processes and outcomes include:
    • Shared understanding of challenges,
    • Trust building processes,
    • The importance of recognizing seemingly modest progress, and
    • Strong indicators of commitment to the partnership’s aspirations and process.
  • The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
    • Time (e.g., decision making time afforded to the collaboration);
    • Interdependence (e.g., a high degree of interdependence can mitigate negative effects of low trust); and
    • Trust (e.g., a higher level of trust indicates a higher probability of success).

Ballivian, A. and Hoffman, W. “Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation.” World Bank, 2015. Available from: http://bit.ly/1ENvmRJ

  • This World Bank report provides a background document on forming public-private partnerships for data with the private sector in order to inform the UN’s Independent Expert Advisory Group (IEAG) on sustaining a “data revolution” in sustainable development.
  • The report highlights the critical position of private companies within the data value chain and reflects on key elements of a sustainable data PPP: “common objectives across all impacted stakeholders, alignment of incentives, and sharing of risks.” In addition, the report describes the risks and incentives of public and private actors, and the principles needed for “build[ing] the legal, cultural, technological and economic infrastructures to enable the balancing of competing interests.” These principles include understanding; experimentation; adaptability; balance; persuasion and compulsion; risk management; and governance.
  • Examples of data collaboratives cited in the report include HP Earth Insights, Orange Data for Development Challenges, Amazon Web Services, IBM Smart Cities Initiative, and the Governance Lab’s Open Data 500.

Brack, Matthew, and Tito Castillo. “Data Sharing for Public Health: Key Lessons from Other Sectors.” Chatham House, Centre on Global Health Security. April 2015. Available from: http://bit.ly/1DHFGVl

  • The Chatham House report provides an overview of public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity of adapting technical solutions from other sectors to public health.
  • The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
  • Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s care.data programme.

de Montjoye, Yves-Alexandre, Jake Kendall, and Cameron F. Kerry. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, Issues in Technology Innovation. November 2014. Available from: http://brook.gs/1JxVpxp

  • Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies into micro-target outreach and driving update of health-seeking behavior.”
  • The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”

Eckartz, Silja M., Hofman, Wout J., Van Veenstra, Anne Fleur. “A decision model for data sharing.” Vol. 8653 LNCS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. http://bit.ly/21cGWfw.

  • This paper proposes a decision model for sharing public and private data, based on a literature review and three case studies in the logistics sector.
  • The authors identify five categories of barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
    • Ownership. Possible interventions likely require improving trust among those who own the data through, for example, involvement and support from higher management.
    • Privacy. Interventions include “anonymization by filtering of sensitive information and aggregation of data,” and access control mechanisms built around identity management and regulated access.  
    • Economic. Interventions include a model where data is shared only with a few trusted organizations, and yield management mechanisms to ensure negative financial consequences are avoided.
    • Data quality. Interventions include identifying additional data sources that could improve the completeness of datasets, and efforts to improve metadata.
    • Technical. Interventions include making data available in structured formats and publishing data according to widely agreed upon data standards.
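Eckartz et al. describe these interventions at the level of policy rather than code. As a minimal sketch of the privacy intervention only, assuming tabular logistics records held as Python dictionaries, the snippet below first filters out sensitive fields and then shares only aggregate counts, suppressing any group smaller than a threshold. The field names, the `min_count` threshold of 3, and the records themselves are hypothetical illustrations, not details taken from the paper.

```python
from collections import Counter

# Hypothetical field names for illustration; the paper does not define a schema.
SENSITIVE_FIELDS = {"customer_id", "customer_name"}


def filter_sensitive(records, sensitive_fields=SENSITIVE_FIELDS):
    """Return copies of the records with sensitive fields removed."""
    return [{k: v for k, v in r.items() if k not in sensitive_fields} for r in records]


def aggregate(records, group_by, min_count=3):
    """Count records per group and suppress groups smaller than min_count.

    Suppressing small groups reduces the chance that an aggregate row
    describes a single, identifiable party.
    """
    counts = Counter(tuple(r[field] for field in group_by) for r in records)
    return {key: n for key, n in counts.items() if n >= min_count}


# Toy shipment records (invented for this example).
shipments = [
    {"customer_id": "C1", "customer_name": "Acme", "origin": "Rotterdam", "destination": "Venlo"},
    {"customer_id": "C2", "customer_name": "Bolt", "origin": "Rotterdam", "destination": "Venlo"},
    {"customer_id": "C3", "customer_name": "Core", "origin": "Rotterdam", "destination": "Venlo"},
    {"customer_id": "C4", "customer_name": "Dune", "origin": "Rotterdam", "destination": "Tilburg"},
]

shared = filter_sensitive(shipments)
summary = aggregate(shared, group_by=("origin", "destination"))
print(summary)  # {('Rotterdam', 'Venlo'): 3}
```

The Rotterdam to Tilburg route is dropped because only one shipper uses it, so publishing that count would effectively reveal that shipper's activity. Filtering and aggregation reduce, but do not by themselves eliminate, re-identification risk.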

Hoffman, Sharona and Podgurski, Andy. “The Use and Misuse of Biomedical Data: Is Bigger Really Better?” American Journal of Law & Medicine 497, 2013. http://bit.ly/1syMS7J.

  • This journal article explores the benefits and, in particular, the risks of large-scale biomedical databases that bring together health information from a diversity of sources across sectors. Data collaboratives examined in the piece include:
    • MedMining – a company that extracts EHR data, de-identifies it, and offers it to researchers. The data sets that MedMining delivers to its customers include ‘lab results, vital signs, medications, procedures, diagnoses, lifestyle data, and detailed costs’ from inpatient and outpatient facilities.
    • Explorys has formed a large healthcare database derived from financial, administrative, and medical records. It has partnered with major healthcare organizations such as the Cleveland Clinic Foundation and Summa Health System to aggregate and standardize health information from ten million patients and over thirty billion clinical events.
  • Hoffman and Podgurski note that biomedical databases have many potential uses, and that those likely to benefit include “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
  • They argue, however, that risks arise because:
    • The data contained in biomedical databases is surprisingly likely to be incorrect or incomplete;
    • Systemic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions; and
    • Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

Krumholz, Harlan M., et al. “Sea Change in Open Science and Data Sharing Leadership by Industry.” Circulation: Cardiovascular Quality and Outcomes 7.4. 2014. 499-504. http://1.usa.gov/1J6q7KJ

  • This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
  • The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.

Mann, Gideon. “Private Data and the Public Good.” Medium. May 17, 2016. http://bit.ly/1OgOY68.

  • This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares the prepared remarks he gave at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.
  • Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit work on:
    • Public health – including suicide prevention, addiction counseling and mental health monitoring.
    • Legal and ethical questions – especially as they relate to “the role algorithms have in decisions about our lives,” such as credit checks and resume screening.
  • Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.
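Mann does not prescribe a particular mechanism. To illustrate the general idea only, the sketch below applies the Laplace mechanism, one standard building block of differential privacy, to a counting query: a count changes by at most one when a single person's record is added or removed, so adding Laplace noise with scale 1/ε yields ε-differential privacy. The function names and the toy data are hypothetical. Real releases should use a vetted differential privacy library, since naive floating-point noise sampling like this has known weaknesses.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with rate
    1/scale follows a Laplace distribution centred on 0 with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Release a count of matching records under epsilon-differential privacy.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon:
    smaller epsilon means stronger privacy and noisier answers.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy example: how many (invented) users are 65 or older? The true count is 4.
ages = [23, 67, 45, 71, 38, 66, 52, 80]
rng = random.Random(7)
noisy = private_count(ages, lambda age: age >= 65, epsilon=0.5, rng=rng)
print(f"noisy count: {noisy:.2f}")  # varies with the seed; centred on 4
```

The single parameter ε makes the trade-off Mann describes explicit: it sits between “complete privacy” (ε near 0, answers dominated by noise) and “complete disclosure” (very large ε, answers close to exact).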

Pastor Escuredo, D., Morales-Guzmán, A., et al. “Flooding through the Lens of Mobile Phone Activity.” IEEE Global Humanitarian Technology Conference, GHTC 2014. Available from: http://bit.ly/1OzK2bK

  • This report describes how mobile data can be used to understand the impact of disasters and improve disaster management. The study focused on the Mexican state of Tabasco in 2009 and was conducted by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, Technical University of Madrid (UPM), Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
  • Telefonica Research, the research division of Telefonica, a major telecommunications company in Latin America, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The results demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).”
  • In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”
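The study's own analysis is considerably more involved than anything shown here. As a hedged illustration of the basic idea of reading anomalies in mobile activity, the sketch below compares each area's call volume on a given day against that area's own baseline and flags large deviations using a z-score. The area names, call counts, and the threshold of 3 standard deviations are assumptions for the example, not parameters from the paper.

```python
from statistics import mean, stdev


def flag_anomalous_areas(baseline, event_day, threshold=3.0):
    """Flag areas whose activity on a given day departs from their own baseline.

    baseline:  area -> list of daily activity counts in normal conditions (>= 2 days)
    event_day: area -> activity count on the day being examined
    Returns area -> z-score for every area with |z| >= threshold.
    """
    flagged = {}
    for area, history in baseline.items():
        if area not in event_day:
            continue
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # no variation in the baseline, so a z-score is undefined
        z = (event_day[area] - mu) / sigma
        if abs(z) >= threshold:
            flagged[area] = z
    return flagged


# Invented daily call counts for two hypothetical areas.
baseline = {
    "area_a": [100, 102, 98, 101, 99],
    "area_b": [50, 52, 48, 51, 49],
}
flood_day = {"area_a": 101, "area_b": 20}

flags = flag_anomalous_areas(baseline, flood_day)
print(sorted(flags))  # ['area_b']
```

Comparing each area against its own history, rather than against a global average, keeps naturally busy and naturally quiet areas on the same footing. A flagged deviation is only a prompt for further checks against other sources, such as the satellite and rainfall data the study combined with call records, not evidence of damage on its own.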

Perkmann, M. and Schildt, H. “Open data partnerships between firms and universities: The role of boundary organizations.” Research Policy, 44(5), 2015. http://bit.ly/25RRJ2c

  • This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
  • The authors identify two especially important challenges for private firms seeking to open their data or participate in data collaboratives with the academic research community, challenges that could be addressed through greater involvement of boundary organizations:
    • First is a challenge of maintaining competitive advantage. The authors note that, “the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D.”
    • The second involves the misalignment of incentives between the private sector and academia. Perkmann and Schildt argue that a firm seeking to build collaborations around its opened data “will have to provide suitable incentives that are aligned with academic scientists’ desire to be rewarded for their work within their respective communities.”

Robin, N., Klein, T., & Jütting, J. “Public-Private Partnerships for Statistics: Lessons Learned, Future Steps.” OECD. 2016. http://bit.ly/24FLYlD.

  • This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensors and geospatial data) can address data gaps relevant to National Statistical Offices (NSOs).
  • Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for an NSO; transfer of datasets from private entities to NSOs; transfer of data to a third-party provider that manages the NSO and private entity data; and the outsourcing of NSO functions.
  • The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation); suggests that deliberate and highly structured approaches to such partnerships require enforceable contracts; emphasizes the trade-off between data specificity and accessibility; and stresses the importance of pricing mechanisms that reflect the capacity and capability of NSOs.
  • Case studies referenced in the paper include:
    • A mobile network operator’s (Telefonica) in-house analysis of call detail records;
    • A third-party data provider and steward of travel statistics (Positium);
    • The Data for Development (D4D) challenge organized by MNO Orange; and
    • Statistics Netherlands’ use of social media to predict consumer confidence.

Stuart, Elizabeth, Samman, Emma, Avis, William, Berliner, Tom. “The data revolution: finding the missing millions.” Overseas Development Institute, 2015. Available from: http://bit.ly/1bPKOjw

  • The authors of this report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
  • The solutions proposed in this report from the Overseas Development Institute focus on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and building strong data management systems.

Taylor, L., & Schroeder, R. “Is bigger better? The emergence of big data as a tool for international development policy.” GeoJournal, 80(4). 2015. 503-518. http://bit.ly/1RZgSy4.

  • This journal article describes how privately held data – namely “digital traces” of consumer activity – “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
  • The authors focus especially on three categories of data collaborative use cases:
    • Mobile data as a predictive tool for issues such as human mobility and economic activity;
    • Use of mobile data to inform humanitarian response to crises; and
    • Use of born-digital web data as a tool for predicting economic trends, and the implications these have for low- and middle-income countries (LMICs).
  • They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
    • Access to private data sources often must be negotiated or bought, “which potentially means substituting negotiations with corporations for those with national statistical offices;”
    • The meaning of such data is not always simple or stable, and local knowledge is needed to understand how people are using the technologies in question;
    • Bias in proprietary data can be hard to understand and quantify;
    • Lack of privacy frameworks; and
    • Power asymmetries, wherein “LMIC citizens are unwittingly placed in a panopticon staffed by international researchers, with no way out and no legal recourse.”

van Panhuis, Willem G., Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke. “A systematic review of barriers to data sharing in public health.” BMC Public Health 14, no. 1 (2014): 1144. Available from: http://bit.ly/1JOBruO

  • The authors of this article present a systematic literature review of “potential barriers to public health data sharing.” They classify twenty potential barriers into six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
  • The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.

Verhulst, Stefaan and Sangokoya, David. “Mapping the Next Frontier of Open Data: Corporate Data Sharing.” In: Gasser, Urs and Zittrain, Jonathan and Faris, Robert and Heacock Jones, Rebekah, “Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse (December 15, 2014).” Berkman Center Research Publication No. 2014-17. http://bit.ly/1GC12a2

  • This essay describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
  • Examples of data collaboratives include: Yelp Dataset Challenge, the Digital Ecologies Research Partnership, BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership and the White House’s Climate Data Partnerships.
  • The authors highlight important questions to consider towards a more comprehensive mapping of these activities.

Verhulst, Stefaan and Sangokoya, David, 2015. “Data Collaboratives: Exchanging Data to Improve People’s Lives.” Medium. Available from: http://bit.ly/1JOBDdy

  • The essay refers to data collaboratives as a new form of collaboration in which participants from different sectors exchange data to help solve public problems. These collaborations can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
  • The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; and experimenting with and scaling existing initiatives.
  • Examples of data collaboratives include NIH’s Precision Medicine Initiative; the Mobile Data, Environmental Extremes and Population (MDEEP) Project; and Twitter-MIT’s Laboratory for Social Machines.

Verhulst, Stefaan, Susha, Iryna, Kostura, Alexander. “Data Collaboratives: matching Supply of (Corporate) Data to Solve Public Problems.” Medium. February 24, 2016. http://bit.ly/1ZEp2Sr.

  • This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
  • The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
    • Data for early warning systems to help mitigate the effects of natural disasters;
    • Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
    • Data to monitor compliance with weapons treaties;
    • Data to more accurately measure progress related to the UN Sustainable Development Goals.
  • To identify and expand on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
    • Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
    • Prizes and Challenges: Orange Telecom, which has been a leader in this type of data collaboration, provided several examples of the company’s initiatives, such as the use of call data records to track the spread of malaria, as well as its experience with Challenge 4 Development.
    • Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
    • Sharing intelligence products: JPMorgan Chase shares macroeconomic insights it has gained by leveraging its data through the newly established JPMorgan Chase Institute.
  • In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
    • A responsible data framework;
    • Increased insight into different business models that may facilitate the sharing of data;
    • Capacity to tap into the potential value of data;
    • Transparent stock of available data supply; and
    • Mapping emerging practices and models of sharing.

Vogel, N., Theisen, C., Leidig, J. P., Scripps, J., Graham, D. H., & Wolffe, G. “Mining mobile datasets to enable the fine-grained stochastic simulation of Ebola diffusion.” Procedia Computer Science. 2015. http://bit.ly/1TZDroF.

  • The paper presents a research study based on the mobile call records shared with researchers through the Data for Development (D4D) Challenge organized by the mobile operator Orange.
  • The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
  • The authors argue that their population, mobility, and simulation models provide more accurate simulation details than high-level analytical predictions, and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard-to-reach locations.

Welle Donker, F., van Loenen, B., & Bregt, A. K. “Open Data and Beyond.” ISPRS International Journal of Geo-Information, 5(4). 2016. http://bit.ly/22YtugY.

  • This research develops a monitoring framework to assess the effects of open (private) data, using a case study of the Dutch energy network administrator Liander.
  • Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
    • Continuously optimize performance on services, security of supply, and costs;
    • Improve management of energy flows and insight into energy consumption;
    • Help customers save energy and switch over to renewable energy sources.
  • The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
    • Develop a performance framework that describes what the program is about, including the organization’s mission and strategic goals;
    • Identify the most important elements, or key performance areas which are most critical to understanding and assessing your program’s success;
    • Select the most appropriate performance measures;
    • Determine the gaps between what information you need and what is available;
    • Develop and implement a measurement strategy to address the gaps;
    • Develop a performance report which highlights what you have accomplished and what you have learned;
    • Learn from your experiences and refine your approach as required.
  • While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that, “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”

World Economic Forum, 2015. “Data-driven development: pathways for progress.” Geneva: World Economic Forum. http://bit.ly/1JOBS8u

  • This report provides an overview of the existing data deficit and the value and impact of big data for sustainable development.
  • The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.