Selected Readings on Data Collaboratives


By Neil Britto, David Sangokoya, Iryna Susha, Stefaan Verhulst and Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data collaboratives was originally published in 2017.

The term data collaborative refers to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (including private companies, research institutions, and government agencies) can exchange data to help solve public problems. Several of society’s greatest challenges — from climate change to public health to job creation to improving the lives of children — require greater access to data, more collaboration between public- and private-sector entities, and an increased ability to analyze datasets. In the coming months and years, data collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.

Annotated Selected Readings List (in alphabetical order)

Agaba, G., Akindès, F., Bengtsson, L., Cowls, J., Ganesh, M., Hoffman, N., . . . Meissner, F. “Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers.” 2014. http://bit.ly/25RRC6N.

  • This white paper, produced by “a group of activists, researchers and data experts,” explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
    • Advocating and facilitating by “open[ing] up new public spaces for discussion and awareness building”;
    • Describing and predicting through the detection of “new correlations and the surfac[ing] of new questions”;
    • Facilitating information exchange through “multiple feedback loops which feed into both research and action”; and
    • Promoting accountability and transparency, especially as a byproduct of crowdsourcing efforts aimed at “aggregat[ing] and analyz[ing] information in real time.”
  • The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
  • They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private sector entities, such as:
    • Lack of general data literacy;
    • Lack of open learning environments and repositories;
    • Lack of resources, capacity and access;
    • Challenges of sensitivity and risk perception with regard to using data;
    • Storage and computing capacity; and
    • Externally validating data sources for comparison and verification.

Ansell, C. and Gash, A. “Collaborative Governance in Theory and Practice.” Journal of Public Administration Research and Theory 18 (4), 2008. http://bit.ly/1RZgsI5.

  • This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
  • The article suggests factors significant to successful partnering processes and outcomes include:
    • Shared understanding of challenges,
    • Trust building processes,
    • The importance of recognizing seemingly modest progress, and
    • Strong indicators of commitment to the partnership’s aspirations and process.
  • The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
    • Time (e.g., decision making time afforded to the collaboration);
    • Interdependence (e.g., a high degree of interdependence can mitigate negative effects of low trust); and
    • Trust (e.g., a higher level of trust indicates a higher probability of success).

Ballivian, A. and Hoffman, W. “Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation.” World Bank, 2015. Available from: http://bit.ly/1ENvmRJ

  • This World Bank report provides background on forming public-private partnerships for data in order to inform the UN’s Independent Expert Advisory Group (IEAG) on sustaining a “data revolution” in sustainable development.
  • The report highlights the critical position of private companies within the data value chain and reflects on key elements of a sustainable data PPP: “common objectives across all impacted stakeholders, alignment of incentives, and sharing of risks.” In addition, the report describes the risks and incentives of public and private actors, and the principles needed for “build[ing] the legal, cultural, technological and economic infrastructures to enable the balancing of competing interests.” These principles include understanding; experimentation; adaptability; balance; persuasion and compulsion; risk management; and governance.
  • Examples of data collaboratives cited in the report include HP Earth Insights, Orange Data for Development Challenges, Amazon Web Services, IBM Smart Cities Initiative, and the Governance Lab’s Open Data 500.

Brack, Matthew, and Tito Castillo. “Data Sharing for Public Health: Key Lessons from Other Sectors.” Chatham House, Centre on Global Health Security. April 2015. Available from: http://bit.ly/1DHFGVl

  • The Chatham House report provides an overview of public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity of adapting technical solutions from other sectors to public health.
  • The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
  • Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s care.data programme.

de Montjoye, Yves-Alexandre, Jake Kendall, and Cameron F. Kerry. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, Issues in Technology Innovation. November 2014. Available from: http://brook.gs/1JxVpxp

  • Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies into micro-target outreach and driving uptake of health-seeking behavior.”
  • The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”

Eckartz, Silja M., Hofman, Wout J., Van Veenstra, Anne Fleur. “A decision model for data sharing.” Vol. 8653 LNCS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. http://bit.ly/21cGWfw.

  • This paper proposes a decision model for the sharing of public and private data, based on a literature review and three case studies in the logistics sector.
  • The authors identify five categories of barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
    • Ownership. Possible interventions likely require improving trust among those who own the data through, for example, involvement and support from higher management.
    • Privacy. Interventions include “anonymization by filtering of sensitive information and aggregation of data,” and access control mechanisms built around identity management and regulated access.  
    • Economic. Interventions include a model where data is shared only with a few trusted organizations, and yield management mechanisms to ensure negative financial consequences are avoided.
    • Data quality. Interventions include identifying additional data sources that could improve the completeness of datasets, and efforts to improve metadata.
    • Technical. Interventions include making data available in structured formats and publishing data according to widely agreed upon data standards.

Hoffman, Sharona and Podgurski, Andy. “The Use and Misuse of Biomedical Data: Is Bigger Really Better?” American Journal of Law & Medicine 497, 2013. http://bit.ly/1syMS7J.

  • This journal article explores the benefits and, in particular, the risks related to large-scale biomedical databases that bring together health information from a diversity of sources across sectors. Some data collaboratives examined in the piece include:
    • MedMining – a company that extracts EHR data, de-identifies it, and offers it to researchers. The data sets that MedMining delivers to its customers include ‘lab results, vital signs, medications, procedures, diagnoses, lifestyle data, and detailed costs’ from inpatient and outpatient facilities.
    • Explorys has formed a large healthcare database derived from financial, administrative, and medical records. It has partnered with major healthcare organizations such as the Cleveland Clinic Foundation and Summa Health System to aggregate and standardize health information from ten million patients and over thirty billion clinical events.
  • Hoffman and Podgurski note that such biomedical databases, once populated, have many potential uses, with those likely to benefit including: “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
  • They argue, however, that risks arise because:
    • The data contained in biomedical databases is surprisingly likely to be incorrect or incomplete;
    • Systemic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions; and
    • Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

Krumholz, Harlan M., et al. “Sea Change in Open Science and Data Sharing Leadership by Industry.” Circulation: Cardiovascular Quality and Outcomes 7.4. 2014. 499-504. http://1.usa.gov/1J6q7KJ

  • This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
  • The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.

Mann, Gideon. “Private Data and the Public Good.” Medium. May 17, 2016. http://bit.ly/1OgOY68.

  • This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares his prepared remarks given at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and also to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.
  • Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit:
    • Public health – including suicide prevention, addiction counseling and mental health monitoring.
    • Legal and ethical questions – especially as they relate to “the role algorithms have in decisions about our lives,” such as credit checks and resume screening.
  • Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.
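
To make that last point concrete, here is a minimal sketch of the Laplace mechanism, the textbook building block of differential privacy. The count, sensitivity, and epsilon values are illustrative assumptions, not drawn from any Bloomberg system.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return a differentially private estimate of true_value.

    Noise is drawn from Laplace(0, sensitivity / epsilon): a smaller
    epsilon means stronger privacy and a noisier released answer.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Hypothetical example: release a private count of records matching a query.
true_count = 1284   # invented count from a private dataset
sensitivity = 1     # one person's data changes the count by at most 1
for epsilon in (0.1, 1.0, 10.0):
    private = laplace_mechanism(true_count, sensitivity, epsilon)
    print(f"epsilon={epsilon:<4}: released count ~ {private:.0f}")
```

The epsilon dial is what makes such frameworks flexible: it trades off smoothly between the two poles Mann rejects, complete privacy and complete disclosure.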

Pastor Escuredo, D., Morales-Guzmán, A., et al. “Flooding through the Lens of Mobile Phone Activity.” IEEE Global Humanitarian Technology Conference, GHTC 2014. Available from: http://bit.ly/1OzK2bK

  • This report describes the use of mobile data to understand the impact of disasters and improve disaster management. The study examined the 2009 floods in the Mexican state of Tabasco and was carried out by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, Technical University of Madrid (UPM), the Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
  • Telefonica Research, a division of the Spanish telecommunications company, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The analysis demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).” (A toy version of this kind of analysis is sketched after this list.)
  • In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”
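
As a rough illustration, the sketch below flags anomalous call volumes per cell tower against a pre-flood baseline. The tower IDs, dates, and call counts are invented; the actual study combined CDRs with satellite, rainfall, and census data.

```python
import pandas as pd

# Invented daily call counts per cell tower (a real analysis would
# aggregate these from individual call detail records).
daily = pd.DataFrame({
    "tower_id": ["A"] * 6 + ["B"] * 6,
    "date": pd.to_datetime(["2009-10-26", "2009-10-27", "2009-10-28",
                            "2009-10-29", "2009-10-30", "2009-10-31"] * 2),
    "calls": [980, 1010, 995, 1005, 2400, 310,   # tower A spikes, then drops
              500, 510, 495, 505, 498, 502],     # tower B stays near normal
})

# Build a per-tower baseline from the first four (pre-flood) days,
# then flag tower-days whose activity deviates strongly from it.
baseline = daily.groupby("tower_id").head(4)
stats = baseline.groupby("tower_id")["calls"].agg(mu="mean", sigma="std")
scored = daily.join(stats, on="tower_id")
scored["z"] = (scored["calls"] - scored["mu"]) / scored["sigma"]
print(scored.loc[scored["z"].abs() > 3, ["tower_id", "date", "calls", "z"]])
```

Sudden drops can indicate damaged infrastructure or evacuation, while spikes can reflect people calling for help; both are signals that help locate affected areas and target resources.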

Perkmann, M. and Schildt, H. “Open data partnerships between firms and universities: The role of boundary organizations.” Research Policy, 44(5), 2015. http://bit.ly/25RRJ2c

  • This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
  • The authors identify two especially important challenges for private firms to enter open data or participate in data collaboratives with the academic research community that could be addressed through more involvement from boundary organizations:
    • First is a challenge of maintaining competitive advantage. The authors note that, “the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D.”
    • Second is the misalignment of incentives between the private and academic fields. Perkmann and Schildt argue that a firm seeking to build collaborations around its opened data “will have to provide suitable incentives that are aligned with academic scientists’ desire to be rewarded for their work within their respective communities.”

Robin, N., Klein, T., & Jütting, J. “Public-Private Partnerships for Statistics: Lessons Learned, Future Steps.” OECD. 2016. http://bit.ly/24FLYlD.

  • This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensors and geospatial data) can address data gaps relevant to National Statistical Offices (NSOs).
  • Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for an NSO; transfer of data sets to NSOs from private entities; transfer of data to a third-party provider that manages both the NSO’s and the private entity’s data; and the outsourcing of NSO functions.
  • The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation), suggests that deliberate and highly structured approaches to such partnerships require enforceable contracts, and emphasizes both the trade-off between data specificity and accessibility and the importance of pricing mechanisms that reflect the capacity and capability of NSOs.
  • Case studies referenced in the paper include:
    • In-house analysis of call detail records by the mobile network operator (MNO) Telefonica;
    • A third-party data provider and steward of travel statistics (Positium);
    • The Data for Development (D4D) challenge organized by MNO Orange; and
    • Statistics Netherlands’ use of social media to predict consumer confidence.

Stuart, Elizabeth, Samman, Emma, Avis, William, Berliner, Tom. “The data revolution: finding the missing millions.” Overseas Development Institute, 2015. Available from: http://bit.ly/1bPKOjw

  • The authors of this report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
  • The solutions proposed in this report from the Overseas Development Institute focus on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and the building of strong data management systems.

Taylor, L., & Schroeder, R. “Is bigger better? The emergence of big data as a tool for international development policy.” GeoJournal, 80(4). 2015. 503-518. http://bit.ly/1RZgSy4.

  • This journal article describes how privately held data – namely “digital traces” of consumer activity – “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
  • They focus especially on three categories of data collaborative use cases:
    • Mobile data as a predictive tool for issues such as human mobility and economic activity;
    • Use of mobile data to inform humanitarian response to crises; and
    • Use of born-digital web data as a tool for predicting economic trends, and the implications these have for low- and middle-income countries (LMICs).
  • They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
    • Access to private data sources often must be negotiated or bought, “which potentially means substituting negotiations with corporations for those with national statistical offices;”
    • The meaning of such data is not always simple or stable, and local knowledge is needed to understand how people are using the technologies in question;
    • Bias in proprietary data can be hard to understand and quantify;
    • Lack of privacy frameworks; and
    • Power asymmetries, wherein “LMIC citizens are unwittingly placed in a panopticon staffed by international researchers, with no way out and no legal recourse.”

van Panhuis, Willem G., Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke. “A systematic review of barriers to data sharing in public health.” BMC public health 14, no. 1 (2014): 1144. Available from: http://bit.ly/1JOBruO

  • The authors of this report provide a “systematic literature review of potential barriers to public health data sharing.” These twenty potential barriers are classified in six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
  • The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.

Verhulst, Stefaan and Sangokoya, David. “Mapping the Next Frontier of Open Data: Corporate Data Sharing.” In: Gasser, Urs and Zittrain, Jonathan and Faris, Robert and Heacock Jones, Rebekah, “Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse (December 15, 2014).” Berkman Center Research Publication No. 2014-17. http://bit.ly/1GC12a2

  • This essay describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
  • Examples of data collaboratives include: Yelp Dataset Challenge, the Digital Ecologies Research Partnership, BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership and the White House’s Climate Data Partnerships.
  • The authors highlight important questions to consider towards a more comprehensive mapping of these activities.

Verhulst, Stefaan and Sangokoya, David, 2015. “Data Collaboratives: Exchanging Data to Improve People’s Lives.” Medium. Available from: http://bit.ly/1JOBDdy

  • The essay refers to data collaboratives as a new form of collaboration involving participants from different sectors exchanging data to help solve public problems. These forms of collaborations can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
  • The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; experimenting and scaling existing initiatives.
  • Examples of data collaboratives include NIH’s Precision Medicine Initiative; the Mobile Data, Environmental Extremes and Population (MDEEP) Project; and Twitter-MIT’s Laboratory for Social Machines.

Verhulst, Stefaan, Susha, Iryna, Kostura, Alexander. “Data Collaboratives: Matching Supply of (Corporate) Data to Solve Public Problems.” Medium. February 24, 2016. http://bit.ly/1ZEp2Sr.

  • This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
  • The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
    • Data for early warning systems to help mitigate the effects of natural disasters;
    • Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
    • Data to monitor compliance with weapons treaties; and
    • Data to more accurately measure progress related to the UN Sustainable Development Goals.
  • To the end of identifying and expanding on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
    • Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
    • Prizes and Challenges: Orange Telecom, which has been a leader in this type of Data Collaboration, provided several examples of the company’s initiatives, such as the use of call data records to track the spread of malaria as well as their experience with Challenge 4 Development.
    • Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
    • Sharing intelligence products: JPMorgan Chase shares macroeconomic insights gained from its data through the newly established JPMorgan Chase Institute.
  • In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
    • A responsible data framework;
    • Increased insight into different business models that may facilitate the sharing of data;
    • Capacity to tap into the potential value of data;
    • Transparent stock of available data supply; and
    • Mapping emerging practices and models of sharing.

Vogel, N., Theisen, C., Leidig, J. P., Scripps, J., Graham, D. H., & Wolffe, G. “Mining mobile datasets to enable the fine-grained stochastic simulation of Ebola diffusion.” Paper presented at the Procedia Computer Science. 2015. http://bit.ly/1TZDroF.

  • The paper presents a research study conducted on the basis of the mobile call records shared with researchers in the framework of the Data for Development (D4D) Challenge by the mobile operator Orange.
  • The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
  • The authors argue that their population, mobility, and simulation models provide more accurate simulation detail than high-level analytical predictions, and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard-to-reach locations. (A highly simplified illustration of this kind of model follows below.)
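
The following is a heavily simplified sketch of two ingredients such a simulation combines: stochastic within-region transmission and mobility between regions. All parameters, populations, and travel probabilities are invented; in the actual study, mobility would be calibrated from the D4D call records, and the models are far more detailed.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Two hypothetical regions, one seed case in region 0.
pop = np.array([50_000, 20_000])
S = pop - np.array([1, 0])            # susceptible
I = np.array([1, 0])                  # infectious
R = np.array([0, 0])                  # recovered/removed
leave_prob = np.array([0.01, 0.02])   # daily prob. of moving to the other region

beta, gamma = 0.3, 0.1                # transmission and recovery rates per day

for day in range(120):
    # Stochastic infections and recoveries within each region.
    p_infect = 1 - np.exp(-beta * I / pop)
    new_infections = rng.binomial(S, p_infect)
    new_recoveries = rng.binomial(I, 1 - np.exp(-gamma))
    S = S - new_infections
    I = I + new_infections - new_recoveries
    R = R + new_recoveries
    # Mobility moves some infectious individuals between the two regions.
    movers = rng.binomial(I, leave_prob)
    I = I - movers + movers[::-1]

print("cumulative infected share per region:", (R / pop).round(3))
```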

Welle Donker, F., van Loenen, B., & Bregt, A. K. “Open Data and Beyond.” ISPRS International Journal of Geo-Information, 5(4). 2016. http://bit.ly/22YtugY.

  • This research develops a monitoring framework to assess the effects of open (private) data, using a case study of the Dutch energy network administrator Liander.
  • Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
    • Continuously optimize performance on services, security of supply, and costs;
    • Improve management of energy flows and insight into energy consumption;
    • Help customers save energy and switch over to renewable energy sources.
  • The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
    • Develop a performance framework to describe what the program is about, including the organization’s mission and strategic goals;
    • Identify the most important elements, or key performance areas, that are most critical to understanding and assessing your program’s success;
    • Select the most appropriate performance measures;
    • Determine the gaps between what information you need and what is available;
    • Develop and implement a measurement strategy to address the gaps;
    • Develop a performance report which highlights what you have accomplished and what you have learned;
    • Learn from your experiences and refine your approach as required.
  • While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”

World Economic Forum, 2015. “Data-driven development: pathways for progress.” Geneva: World Economic Forum. http://bit.ly/1JOBS8u

  • This report captures an overview of the existing data deficit and the value and impact of big data for sustainable development.
  • The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.

Refugees and the Technology of Exile


David Lepeska in Wilson Quarterly: “While working for a Turkish tech firm, Akil learned how to program for mobile phones, and decided to make a smartphone app to help Syrians get all the information they need to build new lives in Turkey. In early 2014, he and a friend launched Gherbtna, named for an Arabic word referring to the loneliness of foreign exile….

About one-tenth of the 2.7 million Syrians in Turkey live in refugee camps. The rest fend for themselves, mostly in big cities. Now that they look set to stay in Turkey for some time, their need to settle and build stable, secure lives is much more acute. This may explain why downloads of Gherbtna more than doubled in the past six months. “We started this project to help people, and when we have reached all Syrian refugees, to help them find jobs, housing, whatever they need to build a new life in Turkey, then we have achieved our goal,” said Akil. “Our ultimate dream for Gherbtna is to reach all refugees around the world, and help them.”

Humanity is currently facing its greatest refugee crisis since World War II, with more than 60 million people forced from their homes. Much has been written about their use of technology — how Google Maps, WhatsApp, Facebook, and other tools have proven invaluable to the displaced and desperate. But helping refugees find their way, connect with family, or read the latest updates about route closings is one thing. Enabling them to grasp minute legal details, find worthwhile jobs and housing, enroll their children in school, and register for visas and benefits when they don’t understand the local tongue is another.

Due to its interpretation of the 1951 Geneva Convention on refugees, Ankara does not categorize Syrians in Turkey as refugees, nor does it accord them the pursuant rights and advantages. Instead, it has given them the unusual legal status of temporary guests, which means that they cannot apply for asylum and that Turkey can send them back to their countries of origin whenever it likes. What’s more, the laws and processes that apply to Syrians have been less than transparent and have changed several times. Despite all this — or perhaps because of it — government outreach has been minimal. Turkey has spent some $10 billion on refugees, and it distributes Arabic-language brochures at refugee camps and in areas with many Syrian residents. Yet it has created no Arabic-language website, app, or other online tool to communicate the relevant laws, permits, and legal changes to Syrians and other refugees.

Independent apps targeting these hurdles have begun to proliferate. Gherbtna’s main competitor in Turkey is the recently launched Alfanus (“Lantern” in Arabic), which its Syrian creators call an “Arab’s Guide to Turkey.” Last year, Souktel, a Palestinian mobile solutions firm, partnered with the international arm of the American Bar Association to launch a text-message service that provides legal information to Arabic speakers in Turkey. Norway is running a competition to develop a game-based learning app to educate Syrian refugee children. German programmers created Germany Says Welcome and the similar Welcome App Dresden. And Akil’s tech firm, Namaa Solutions, recently launched Tarjemly Live, a live translation app for English, Arabic, and Turkish.

But the extent to which these technologies have succeeded — have actually helped Syrians adjust and build new lives in Turkey, in particular — is in doubt. Take Gherbtna. The app has nine tools, including Video, Laws, Alerts, Find a Job, and “Ask me.” It offers restaurant and job listings; advice on getting a residence permit, opening a bank account, or launching a business; and much more. Like Souktel, Gherbtna has partnered with the American Bar Association to provide translations of Turkish laws. The app has been downloaded about 50,000 times, or by about 5 percent of Syrians in Turkey. (It is safe to assume, however, that a sizable percentage of refugees do not have smartphones.) Yet among two dozen Gherbtna users recently interviewed in Gaziantep and Istanbul — two Turkish cities with the most dense concentration of Syrians — most found it lacking. Many appreciate Gherbtna’s one-stop-shop appeal, but find little cause to keep using it. ”…(More)”

Virtual memory: the race to save the information age


Review by Richard Ovenden in the Financial Times of:
You Could Look It Up: The Reference Shelf from Ancient Babylon to Wikipedia, by Jack Lynch, Bloomsbury, RRP£25/$30, 464 pages

When We Are No More: How Digital Memory Is Shaping Our Future, by Abbey Smith Rumsey, Bloomsbury, RRP£18.99/$28, 240 pages

Ctrl + Z: The Right to Be Forgotten, by Meg Leta Jones, NYU Press, RRP£20.99/$29.95, 284 pages

“…For millions of people, technological devices have become essential tools in keeping memories alive — to the point where it can feel as though events without an impression in silicon have somehow not been fully experienced. In under three decades, the web has expanded to contain more than a billion sites. Every day about 300m digital photographs, more than 100 terabytes’ worth, are uploaded to Facebook. An estimated 204m emails are sent every minute and, with 5bn mobile devices in existence, the generation of new content looks set to continue its rapid growth.

We celebrate this growth, and rightly. Today knowledge is created and consumed at a rate that would have been inconceivable a generation ago; instant access to the fruits of millennia of civilisation now seems like a natural state of affairs. Yet we overlook — at our peril — just how unstable and transient much of this information is. Amid the proliferation there is also constant decay: phenomena such as “bit rot” (the degradation of software programs over time), “data rot” (the deterioration of digital storage media) and “link rot” (web links pointing to online resources that have become permanently unavailable) can render information inaccessible. This affects everything from holiday photos and email correspondence to official records: to give just one example, a Harvard study published in 2013 found that 50 per cent of links in the US Supreme Court opinions website were broken.
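
Link rot, at least, is easy to measure at small scale. A minimal sketch of the kind of check behind such studies might look like the following; the sample URLs are placeholders.

```python
import requests

def check_links(urls, timeout=10):
    """Classify each URL as live, broken (HTTP error), or unreachable."""
    results = {}
    for url in urls:
        try:
            resp = requests.head(url, timeout=timeout, allow_redirects=True)
            results[url] = ("live" if resp.status_code < 400
                            else f"broken ({resp.status_code})")
        except requests.RequestException:
            results[url] = "unreachable"   # DNS failure, timeout, etc.
    return results

if __name__ == "__main__":
    sample = ["https://example.com/", "https://example.com/no-such-page"]
    for url, status in check_links(sample).items():
        print(f"{status:<16} {url}")
```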

Are we creating a problem that future generations will not be able to solve? Could the early decades of the 21st century even come to seem, in the words of the internet pioneer Vint Cerf, like a “digital Dark Age”? Whether or not such fears are realised, it is becoming increasingly clear that the migration of knowledge to formats permitting rapid and low-cost copying and dissemination, but in which the base information cannot survive without complex and expensive intervention, requires that we choose, more actively than ever before, what to remember and what to forget….(More)”

A New Dark Age Looms


William B. Gail in the New York Times: “Imagine a future in which humanity’s accumulated wisdom about Earth — our vast experience with weather trends, fish spawning and migration patterns, plant pollination and much more — turns increasingly obsolete. As each decade passes, knowledge of Earth’s past becomes progressively less effective as a guide to the future. Civilization enters a dark age in its practical understanding of our planet.

To comprehend how this could occur, picture yourself in our grandchildren’s time, a century hence. Significant global warming has occurred, as scientists predicted. Nature’s longstanding, repeatable patterns — relied on for millenniums by humanity to plan everything from infrastructure to agriculture — are no longer so reliable. Cycles that have been largely unwavering during modern human history are disrupted by substantial changes in temperature and precipitation….

Our foundation of Earth knowledge, largely derived from historically observed patterns, has been central to society’s progress. Early cultures kept track of nature’s ebb and flow, passing improved knowledge about hunting and agriculture to each new generation. Science has accelerated this learning process through advanced observation methods and pattern discovery techniques. These allow us to anticipate the future with a consistency unimaginable to our ancestors.

But as Earth warms, our historical understanding will turn obsolete faster than we can replace it with new knowledge. Some patterns will change significantly; others will be largely unaffected, though it will be difficult to say what will change, by how much, and when.

The list of possible disruptions is long and alarming. We could see changes to the prevalence of crop and human pests, like locust plagues set off by drought conditions; forest fire frequency; the dynamics of the predator-prey food chain; the identification and productivity of reliably arable land, and the predictability of agriculture output.

Historians of the next century will grasp the importance of this decline in our ability to predict the future. They may mark the coming decades of this century as the period during which humanity, despite rapid technological and scientific advances, achieved “peak knowledge” about the planet it occupies. They will note that many decades may pass before society again attains the same level.

One exception to this pattern-based knowledge is the weather, whose underlying physics governs how the atmosphere moves and adjusts. Because we understand the physics, we can replicate the atmosphere with computer models. Monitoring by weather stations and satellites provides the starting point for the models, which compute a forecast for how the weather will evolve. Today, forecast accuracy based on such models is generally good out to a week, sometimes even two.

But farmers need to think a season or more ahead. So do infrastructure planners as they design new energy and water systems. It may be feasible to develop the science and make the observations necessary to forecast weather a month or even a season in advance. We are also coming to understand enough of the physics to make useful global and regional climate projections a decade or more ahead.

The intermediate time period is our big challenge. Without substantial scientific breakthroughs, we will remain reliant on pattern-based methods for time periods between a month and a decade. … Our best knowledge is built on what we have seen in the past, like how fish populations respond to El Niño’s cycle. Climate change will further undermine our already limited ability to make these predictions. Anticipating ocean resources from one year to the next will become harder.

Civilization’s understanding of Earth has expanded enormously in recent decades, making humanity safer and more prosperous. As the patterns that we have come to expect are disrupted by warming temperatures, we will face huge challenges feeding a growing population and prospering within our planet’s finite resources. New developments in science offer our best hope for keeping up, but this is by no means guaranteed….(More)”

Social app for refugees and locals translates in real time


Springwise: “Europe is in the middle of a major refugee crisis, with more than one million migrants arriving in 2015 alone. Now, developers in Stockholm are coming up with new ways for arrivals to integrate into their new homes.

Welcome! is an app based in Sweden, a country that has operated a broadly open policy to immigration in recent years. The developers say the app aims to break down social and language barriers between Swedes and refugees. Welcome! is translated into Arabic, Persian, Swedish and English, and it enables users to create, host and join activities, as well as ask questions of locals, chat with new contacts, and browse events that are nearby.

The idea is to solve one of the major difficulties for immigrants arriving in Europe by encouraging the new arrivals and locals to interact and connect, helping the refugees to settle in. The app offers real-time auto-translation through its four languages, and can be downloaded for iOS and Android….We have already seen an initiative in Finland helping to set up startups with refugees…(More)

HeroX enables breakthroughs


About HeroX, “The World’s Problem Solver Community”: “On October 21, 2004, Scaled Composites’ SpaceShipOne reached the edge of space, an altitude of 100km, becoming the first privately built spacecraft to perform this feat, twice within two weeks.

In so doing, they won the $10 million Ansari XPRIZE, ushering in a new era of commercial space exploration and applications.

It was the inaugural incentive prize competition of the XPRIZE Foundation, which has gone on to create an incredible array of incentive prizes to solve the world’s Grand Challenges — ocean health, literacy, space exploration, among many others.

In 2011, City Light Capital partnered with XPRIZE to envision a platform that would make the power of incentive challenges available to anyone. The result was the spin-off of HeroX in 2013.

HeroX was co-founded in 2013 by XPRIZE founder Peter Diamandis, challenge designer Emily Fowler and entrepreneur Christian Cotichini as a means to democratize the innovation model of XPRIZE.

HeroX exists to enable anyone, anywhere in the world, to create a challenge that addresses any problem or opportunity, build a community around that challenge and activate the circumstances that can lead to a breakthrough innovation.

This innovation model has existed for centuries.

The Ansari XPRIZE was inspired by the 1927 Orteig Prize, in which Charles Lindbergh crossed the Atlantic in the Spirit of St. Louis. The $25,000 prize had been offered by hotelier Raymond Orteig to spur tourism. Lindbergh’s flight led to a boom in air travel the world over.

A similar challenge had launched two centuries earlier with the 1714 Longitude Prize, which sought a technology to more accurately measure longitude at sea. Nearly 60 years later, a British clockmaker named John Harrison invented the chronometer, which spurred transatlantic migration on a massive scale.

In 1795, Napoleon offered a 12,000 franc prize for a better method of preserving food, which was often spoiled by the time it reached the front lines of his armies. The breakthrough that won Napoleon’s prize led to the creation of the canning industry.

HeroX incentive prize challenges are designed to do the same — to harness the collective mind power of a community to innovate upon any problem or opportunity. Anyone can change the world. HeroX can help.

The only question is, “What do you want to solve?”… (More)

Ebola: A Big Data Disaster


Study by Sean Martin McDonald: “…undertaken with support from the Open Society Foundation, Ford Foundation, and Media Democracy Fund, explores the use of Big Data in the form of Call Detail Record (CDR) data in humanitarian crisis.

It discusses the challenges of digital humanitarian coordination in health emergencies like the Ebola outbreak in West Africa, and the marked tension in the debate around experimentation with humanitarian technologies and the impact on privacy. McDonald’s research focuses on the two primary legal and human rights frameworks, privacy and property, to question the impact of unregulated use of CDRs on human rights. It also highlights how the diffusion of data science to the realm of international development constitutes a genuine opportunity to bring powerful new tools to fight crises and emergencies.

Analysing the risks of using CDRs to perform migration analysis and contact tracing without user consent, as well as the application of big data to disease surveillance, is an important entry point into the debate around the use of Big Data for development and humanitarian aid. The paper also raises crucial questions of legal significance about access to information, the limitation of data sharing, and the concept of proportionality when privacy is invaded in the name of the public good. These issues hold great relevance today, as the actual and potential uses, as well as the harms, of big data’s emerging role in development are under consideration across the world.

The paper highlights the absence of a dialogue around the significant legal risks posed by the collection, use, and international transfer of personally identifiable data and humanitarian information, and the grey areas around assumptions of public good. It calls for a critical discussion of the experimental nature of data modelling in emergency response, since mismanagement of information can undermine the very human rights such efforts are meant to protect….

See Sean Martin McDonald – “Ebola: A Big Data Disaster” (PDF).


From Freebase to Wikidata: The Great Migration


Paper by Thomas Pellissier Tanon et al: “Collaborative knowledge bases that make their data freely available in a machine-readable form are central for the data strategy of many projects and organizations. The two major collaborative knowledge bases are Wikimedia’s Wikidata and Google’s Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report on the ongoing transfer efforts and data mapping challenges, and provide an analysis of the effort so far. We describe the Primary Sources Tool, which aims to facilitate this and future data migrations. Throughout the migration, we have gained deep insights into both Wikidata and Freebase, and share and discuss detailed statistics on both knowledge bases….(More)”

Big data’s big role in humanitarian aid


Mary K. Pratt at Computerworld: “Hundreds of thousands of refugees streamed into Europe in 2015 from Syria and other Middle Eastern countries. Some estimates put the number at nearly a million.

The sheer volume of people overwhelmed European officials, who not only had to handle the volatile politics stemming from the crisis, but also had to find food, shelter and other necessities for the migrants.

Sweden, like many of its European Union counterparts, was taking in refugees. The Swedish Migration Board, which usually sees 2,500 asylum seekers in an average month, was accepting 10,000 per week.

“As you can imagine, with that number, it requires a lot of buses, food, registration capabilities to start processing all the cases and to accommodate all of those people,” says Andres Delgado, head of operational control, coordination and analysis at the Swedish Migration Board.

Despite the dramatic spike in refugees coming into the country, the migration agency managed the intake — hiring extra staff, starting the process of procuring housing early, getting supplies ready. Delgado credits a good part of that success to his agency’s use of big data and analytics that let him predict, with a high degree of accuracy, what was heading his way.

“Without having that capability, or looking at the tool every day, to assess every need, this would have crushed us. We wouldn’t have survived this,” Delgado says. “It would have been chaos, actually — nothing short of that.”

The Swedish Migration Board has been using big data and analytics for several years, as it seeks to gain visibility into immigration trends and what those trends will mean for the country…./…
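
The Computerworld piece does not detail the agency’s models, but the planning logic Delgado describes can be illustrated with a simple trend forecast. The sketch below applies Holt’s double exponential smoothing to an invented weekly arrivals series; all figures are hypothetical.

```python
# Invented weekly asylum-seeker arrivals, loosely echoing the 2015 surge
# described above (the Migration Board's actual data and models differ).
arrivals = [2600, 2700, 3100, 3900, 5200, 7400, 9800, 10200]

# Holt's linear (double exponential) smoothing, implemented directly.
alpha, beta = 0.5, 0.3                  # level and trend smoothing factors
level = arrivals[0]
trend = arrivals[1] - arrivals[0]
for y in arrivals[1:]:
    previous_level = level
    level = alpha * y + (1 - alpha) * (level + trend)
    trend = beta * (level - previous_level) + (1 - beta) * trend

# Project the next few weeks to size buses, food and registration capacity.
for horizon in (1, 2, 3):
    print(f"week +{horizon}: ~{level + horizon * trend:,.0f} expected arrivals")
```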

“Can big data give us peace? I think the short answer is we’re starting to explore that. We’re at the very early stages, where there are shining examples of little things here and there. But we’re on that road,” says Kalev H. Leetaru, creator of the GDELT Project, or the Global Database of Events, Language and Tone, which describes itself as a comprehensive “database of human society.”

The topic is gaining traction. A 2013 report, “New Technology and the Prevention of Violence and Conflict,” from the International Peace Institute, highlights uses of telecommunications technology, including data, in several crisis situations around the world. The report emphasizes the potential these technologies hold in helping to ease tensions and address problems.

The report’s conclusion offers this idea: “Big data can be used to identify patterns and signatures associated with conflict — and those associated with peace — presenting huge opportunities for better-informed efforts to prevent violence and conflict.”

That’s welcome news to Noel Dickover. He’s the director of PeaceTech Data Networks at the PeaceTech Lab, which was created by the U.S. Institute of Peace (USIP) to advance USIP’s work on how technology, media and data help reduce violent conflict around the world.

Such work is still in the nascent stages, Dickover says, but people are excited about its potential. “We have unprecedented amounts of data on human sentiment, and we know there’s value there,” he says. “The question is how to connect it.”

Dickover is working on ways to do just that. One example is the Open Situation Room Exchange (OSRx) project, which aims to “empower greater collective impact in preventing or mitigating serious violent conflicts in particular arenas through collaboration and data-sharing.”…(More)

How Big Data is Helping to Tackle Climate Change


Bernard Marr at DataInformed: “Climate scientists have been gathering a great deal of data for a long time, but analytics technology’s catching up is comparatively recent. Now that cloud, distributed storage, and massive amounts of processing power are affordable for almost everyone, those data sets are being put to use. On top of that, the growing number of Internet of Things devices we are carrying around is adding to the amount of data we are collecting. And the rise of social media means more and more people are reporting environmental data and uploading photos and videos of their environment, which also can be analyzed for clues.

Perhaps one of the most ambitious projects that employ big data to study the environment is Microsoft’s Madingley, which is being developed with the intention of creating a simulation of all life on Earth. The project already provides a working simulation of the global carbon cycle, and it is hoped that, eventually, everything from deforestation to animal migration, pollution, and overfishing will be modeled in a real-time “virtual biosphere.” Just a few years ago, the idea of a simulation of the entire planet’s ecosphere would have seemed like ridiculous, pie-in-the-sky thinking. But today it’s something into which one of the world’s biggest companies is pouring serious money. Microsoft is doing this because it believes that analytical technology has finally caught up with the ability to collect and store data.

Another data giant that is developing tools to facilitate analysis of climate and ecological data is EMC. Working with scientists at Acadia National Park in Maine, the company has developed platforms to pull in crowd-sourced data from citizen science portals such as eBird and iNaturalist. This allows park administrators to monitor the impact of climate change on wildlife populations as well as to plan and implement conservation strategies.

Last year, the United Nations, under its Global Pulse data analytics initiative, launched the Big Data Climate Challenge, a competition aimed at promoting innovative data-driven climate change projects. Among the first to receive recognition under the program is Global Forest Watch, which combines satellite imagery, crowd-sourced witness accounts, and public datasets to track deforestation around the world, which is believed to be a leading man-made cause of climate change. The project has been promoted as a way for ethical businesses to ensure that their supply chain is not complicit in deforestation.

Other initiatives are targeted at a more personal level, for example by analyzing transit routes that could be used for individual journeys, using Google Maps, and making recommendations based on carbon emissions for each route.
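
A back-of-the-envelope version of such a recommendation multiplies each route’s distance by a per-mode emission factor and ranks the options. The factors and distances below are illustrative assumptions, not Google Maps data.

```python
# Hypothetical emission factors in grams of CO2e per passenger-kilometre;
# real factors vary with vehicle type, occupancy and electricity mix.
EMISSION_FACTORS = {"car": 192, "bus": 105, "rail": 41, "bicycle": 0}

def route_emissions(distance_km: float, mode: str) -> float:
    """Estimated kilograms of CO2e for one trip on the given mode."""
    return distance_km * EMISSION_FACTORS[mode] / 1000

# Compare invented route options for the same journey, greenest first.
routes = [("car", 12.4), ("bus", 13.1), ("rail", 15.0)]
for mode, km in sorted(routes, key=lambda r: route_emissions(r[1], r[0])):
    print(f"{mode:<8} {km:4.1f} km -> {route_emissions(km, mode):.2f} kg CO2e")
```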

The idea of “smart cities” is central to the concept of the Internet of Things – the idea that everyday objects and tools are becoming increasingly connected, interactive, and intelligent, and capable of communicating with each other independently of humans. Many of the ideas put forward by smart-city pioneers are grounded in climate awareness, such as reducing carbon dioxide emissions and energy waste across urban areas. Smart metering allows utility companies to increase or restrict the flow of electricity, gas, or water to reduce waste and ensure adequate supply at peak periods. Public transport can be efficiently planned to avoid wasted journeys and provide a reliable service that will encourage citizens to leave their cars at home.

These examples raise an important point: It’s apparent that data – big or small – can tell us if, how, and why climate change is happening. But, of course, this is only really valuable to us if it also can tell us what we can do about it. Some projects, such as Weathersafe, which helps coffee growers adapt to changing weather patterns and soil conditions, are designed to help humans deal with climate change. Others are designed to tackle the problem at the root, by highlighting the factors that cause it in the first place and showing us how we can change our behavior to minimize damage….(More)”