The Potential of Social Media Intelligence to Improve People’s Lives: Social Media Data for Good


New report by Stefaan G. Verhulst and Andrew Young: “The twenty-first century will be challenging on many fronts. From historically catastrophic natural disasters resulting from climate change to inequality to refugee and terrorism crises, it is clear that we need not only new solutions, but new insights and methods of arriving at solutions. Data, and the intelligence gained from it through advances in data science, is increasingly being seen as part of the answer. This report explores the premise that data—and in particular the vast stores of data and the unique analytical expertise held by social media companies—may indeed provide for a new type of intelligence that could help develop solutions to today’s challenges.

Social Media Data Report

In this report, developed with support from Facebook, we focus on an approach to extract public value from social media data that we believe holds the greatest potential: data collaboratives. Data collaboratives are an emerging form of public-private partnership in which actors from different sectors exchange information to create new public value. Such collaborative arrangements, for example between social media companies and humanitarian organizations or civil society actors, can be seen as possible templates for leveraging privately held data towards the attainment of public goals….(More)”

Mastercard’s Big Data For Good Initiative: Data Philanthropy On The Front Lines


Interview by Randy Bean of Shamina Singh: Much has been written about big data initiatives and the efforts of market leaders to derive critical business insights faster. Less has been written about initiatives by some of these same firms to apply big data and analytics to a different set of issues, which are not solely focused on revenue growth or bottom line profitability. While the focus of most writing has been on the use of data for competitive advantage, a small set of companies has been undertaking, with much less fanfare, a range of initiatives designed to ensure that data can be applied not just for corporate good, but also for social good.

One such firm is Mastercard, which describes itself as a technology company in the payments industry that connects buyers and sellers in 210 countries and territories across the globe. In 2013 Mastercard launched the Mastercard Center for Inclusive Growth, which operates as an independent subsidiary of Mastercard and is focused on the application of data to a range of issues for social benefit….

In testimony before the Senate Committee on Foreign Relations on May 4, 2017, Mastercard Vice Chairman Walt Macnee, who serves as the Chairman of the Center for Inclusive Growth, addressed issues of private sector engagement. Macnee noted, “The private sector and public sector can each serve as a force for good independently; however, when the public and private sectors work together, they unlock the potential to achieve even more.” Macnee further commented, “We will continue to leverage our technology, data, and know-how in an effort to solve many of the world’s most pressing problems. It is the right thing to do, and it is also good for business.”…

Central to the mission of the Mastercard Center is the notion of “data philanthropy”. This term encompasses notions of data collaboration and data sharing and is at the heart of the initiatives that the Center is undertaking. The three cornerstones of the Center’s mandate are:

  • Sharing Data Insights – This is achieved through the concept of “data grants”, which entails granting access to proprietary insights in support of social initiatives in a way that fully protects consumer privacy.
  • Data Knowledge – The Mastercard Center undertakes collaborations with not-for-profit and governmental organizations on a range of initiatives. One such effort, in collaboration with the Obama White House’s Data-Driven Justice Initiative, used data to help advance criminal justice reform. Through insights provided by Mastercard, the initiative was able to demonstrate the impact crime has on merchant locations and local job opportunities in Baltimore.
  • Leveraging Expertise – Similarly, the Mastercard Center has collaborated with private organizations such as DataKind, which undertakes data science initiatives for social good.

Just this past month, the Mastercard Center released initial findings from its Data Exploration: Neighborhood Crime and Local Business initiative. This effort focused on ways in which Mastercard’s proprietary insights could be combined with public data on commercial robberies to help understand the potential relationships between criminal activity and business closings. A preliminary analysis showed a spike in commercial robberies followed by an increase in bar and nightclub closings. These analyses help community and business leaders understand factors that can impact business success.

Late last year, Ms. Singh issued A Call to Action on Data Philanthropy, in which she challenges her industry peers to look at ways in which they can make a difference — “I urge colleagues at other companies to review their data assets to see how they may be leveraged for the benefit of society.” She concludes, “the sheer abundance of data available today offers an unprecedented opportunity to transform the world for good.”….(More)
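
The robberies-then-closings pattern described above is, at heart, a lagged-correlation question. Below is a minimal, hypothetical sketch of that kind of analysis, with invented monthly counts; the Center's actual data and methodology are not public at this level of detail.

```python
# A hedged sketch: correlating monthly commercial-robbery counts with
# business closings k months later. All numbers are invented for
# illustration, not Mastercard data.
import pandas as pd

monthly = pd.DataFrame({
    "robberies": [12, 15, 30, 28, 14, 13, 11, 29, 31, 12],
    "closings":  [2, 2, 3, 6, 7, 3, 2, 3, 8, 9],
})

# Correlate robberies in month t with closings in month t + k.
for k in range(4):
    r = monthly["robberies"].corr(monthly["closings"].shift(-k))
    print(f"closings {k} month(s) later: correlation = {r:.2f}")
```

A peak at a positive lag rather than at lag zero is what would suggest robberies precede closings, as in the Center's preliminary finding.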

Children and the Data Cycle: Rights And Ethics in a Big Data World


Gabrielle Berman and Kerry Albright at UNICEF: “In an era of increasing dependence on data science and big data, the voices of one set of major stakeholders – the world’s children and those who advocate on their behalf – have been largely absent. A recent paper estimates one in three global internet users is a child, yet there has been little rigorous debate or understanding of how to adapt traditional, offline ethical standards for research involving data collection from children, to a big data, online environment (Livingstone et al., 2015). This paper argues that due to the potential for severe, long-lasting and differential impacts on children, child rights need to be firmly integrated into the agendas of global debates about ethics and data science. The authors outline their rationale for a greater focus on child rights and ethics in data science and suggest steps to move forward, focusing on the various actors within the data chain including data generators, collectors, analysts and end-users. The paper concludes by calling for a much stronger appreciation of the links between child rights, ethics and data science disciplines and for enhanced discourse between stakeholders in the data chain, and those responsible for upholding the rights of children, globally….(More)”.

Using Collaboration to Harness Big Data for Social Good


Jake Porway at SSIR: “These days, it’s hard to get away from the hype around “big data.” We read articles about how Silicon Valley is using data to drive everything from website traffic to autonomous cars. We hear speakers at social sector conferences talk about how nonprofits can maximize their impact by leveraging new sources of digital information like social media data, open data, and satellite imagery.

Braving this world can be challenging, we know. Creating a data-driven organization can require big changes in culture and process. Some nonprofits, like Crisis Text Line and Watsi, started off boldly by building their own data science teams. But for the many other organizations wondering how to best use data to advance their mission, we’ve found that one ingredient works better than all the software and tech that you can throw at a problem: collaboration.

As a nonprofit dedicated to applying data science for social good, DataKind has run more than 200 projects in collaboration with other nonprofits worldwide by connecting them to teams of volunteer data scientists. What do the most successful ones have in common? Strong collaborations on three levels: with data science experts, within the organization itself, and across the nonprofit sector as a whole.

1. Collaborate with data science experts to define your project. As we often say, finding problems can be harder than finding solutions. ….

2. Collaborate across your organization to “build with, not for.” Our projects follow the principles of human-centered design and the philosophy pioneered in the civic tech world of “design with, not for.” ….

3. Collaborate across your sector to move the needle. Many organizations think about building data science solutions for unique challenges they face, such as predicting the best location for their next field office. However, most of us are fighting common causes shared by many other groups….

By focusing on building strong collaborations on these three levels—with data experts, across your organization, and across your sector—you’ll go from merely talking about big data to making big impact….(More).

Big Data, Data Science, and Civil Rights


Paper by Solon Barocas, Elizabeth Bradley, Vasant Honavar, and Foster Provost:  “Advances in data analytics bring with them civil rights implications. Data-driven and algorithmic decision making increasingly determine how businesses target advertisements to consumers, how police departments monitor individuals or groups, how banks decide who gets a loan and who does not, how employers hire, how colleges and universities make admissions and financial aid decisions, and much more. As data-driven decisions increasingly affect every corner of our lives, there is an urgent need to ensure they do not become instruments of discrimination, barriers to equality, threats to social justice, and sources of unfairness. In this paper, we argue for a concrete research agenda aimed at addressing these concerns, comprising five areas of emphasis: (i) Determining if models and modeling procedures exhibit objectionable bias; (ii) Building awareness of fairness into machine learning methods; (iii) Improving the transparency and control of data- and model-driven decision making; (iv) Looking beyond the algorithm(s) for sources of bias and unfairness—in the myriad human decisions made during the problem formulation and modeling process; and (v) Supporting the cross-disciplinary scholarship necessary to do all of that well…(More)”.
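
To make the first item on that agenda concrete, here is a minimal, hypothetical sketch of one common bias check, a group-level disparate impact ratio, applied to made-up loan decisions; the paper itself does not prescribe any particular code or metric.

```python
# A hedged sketch of auditing model decisions for group-level disparity.
# The dataset, column names, and the 0.8 rule of thumb are illustrative
# assumptions, not the paper's method.
import pandas as pd

def disparate_impact(decisions: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Ratio of the lowest group's positive-outcome rate to the highest's."""
    rates = decisions.groupby(group_col)[outcome_col].mean()
    return rates.min() / rates.max()

# Hypothetical model outputs: 1 = loan approved, 0 = denied.
audit = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B"],
    "approved": [1, 1, 0, 1, 0, 0],
})

# Ratios well below 1.0 (a common rule of thumb flags values under 0.8)
# suggest a disparity worth investigating, though, as the paper stresses,
# the source of bias may lie outside the algorithm itself.
print(f"disparate impact ratio: {disparate_impact(audit, 'group', 'approved'):.2f}")
```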

Our path to better science in less time using open data science tools


Julia S. Stewart Lowndes et al in Nature: “Reproducibility has long been a tenet of science but has been challenging to achieve—we learned this the hard way when our old approaches proved inadequate to efficiently reproduce our own work. Here we describe how several free software tools have fundamentally upgraded our approach to collaborative research, making our entire workflow more transparent and streamlined. By describing specific tools and how we incrementally began using them for the Ocean Health Index project, we hope to encourage others in the scientific community to do the same—so we can all produce better science in less time.

Figure 1: Better science in less time, illustrated by the Ocean Health Index project.

Every year since 2012 we have repeated Ocean Health Index (OHI) methods to track change in global ocean health [36,37]. Increased reproducibility and collaboration have reduced the amount of time required to repeat methods (size of bubbles) with updated data annually, allowing us to focus on improving methods each year (text labels show the biggest innovations). The original assessment in 2012 focused solely on scientific methods (for example, obtaining and analysing data, developing models, calculating, and presenting results; dark shading). In 2013, by necessity we gave more focus to data science (for example, data organization and wrangling, coding, versioning, and documentation; light shading), using open data science tools. We established R as the main language for all data preparation and modelling (using RStudio), which drastically decreased the time involved to complete the assessment. In 2014, we adopted Git and GitHub for version control, project management, and collaboration. This further decreased the time required to repeat the assessment. We also created the OHI Toolbox, which includes our R package ohicore for core analytical operations used in all OHI assessments. In subsequent years we have continued (and plan to continue) this trajectory towards better science in less time by improving code with principles of tidy data [33]; standardizing file and data structure; and focusing more on communication, in part by creating websites with the same open data science tools and workflow. See text and Table 1 for more details….(More)”
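
The “tidy data” principle the authors mention is easy to show with a toy example. The OHI team works in R (with packages like tidyr); the pandas sketch below, using invented scores, is only an analogous illustration of reshaping a wide table into one observation per row, not OHI code.

```python
# A hedged sketch of tidying a wide table: one row per (region, year)
# observation, one column per variable. Scores are made up.
import pandas as pd

wide = pd.DataFrame({
    "region":     ["Coastal A", "Coastal B"],
    "score_2014": [71, 64],
    "score_2015": [73, 66],
})

tidy = wide.melt(id_vars="region", var_name="year", value_name="score")
tidy["year"] = tidy["year"].str.replace("score_", "", regex=False).astype(int)
print(tidy)
```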

Big Data Science: Opportunities and Challenges to Address Minority Health and Health Disparities in the 21st Century


Xinzhi Zhang et al in Ethnicity and Disease: “Addressing minority health and health disparities has been a missing piece of the puzzle in Big Data science. This article focuses on three priority opportunities that Big Data science may offer to the reduction of health and health care disparities. One opportunity is to incorporate standardized information on demographic and social determinants in electronic health records in order to target ways to improve quality of care for the most disadvantaged populations over time. A second opportunity is to enhance public health surveillance by linking geographical variables and social determinants of health for geographically defined populations to clinical data and health outcomes. Third and most importantly, Big Data science may lead to a better understanding of the etiology of health disparities and understanding of minority health in order to guide intervention development. However, the promise of Big Data needs to be considered in light of significant challenges that threaten to widen health disparities. Care must be taken to incorporate diverse populations to realize the potential benefits. Specific recommendations include investing in data collection on small sample populations, building a diverse workforce pipeline for data science, actively seeking to reduce digital divides, developing novel ways to assure digital data privacy for small populations, and promoting widespread data sharing to benefit under-resourced minority-serving institutions and minority researchers. With deliberate efforts, Big Data presents a dramatic opportunity for reducing health disparities but without active engagement, it risks further widening them….(More)”

Creating Safer Streets Through Data Science


DataKind: “Tens of thousands of people are killed or injured in traffic collisions each year. To improve road safety and combat life-threatening crashes, over 25 U.S. cities have adopted Vision Zero, an initiative born in Sweden in the 1990s that aims to reduce traffic-related deaths and serious injuries to zero. Vision Zero is built upon the belief that crashes are predictable and preventable, though determining what kind of engineering, enforcement and educational interventions are effective can be difficult and costly for cities with limited resources.

While many cities have access to data about where and why serious crashes occur to help pinpoint streets and intersections that are trouble spots, the use of predictive algorithms and advanced statistical methods to determine the effectiveness of different safety initiatives is less widespread. Seeing the potential for data and technology to advance the Vision Zero movement in the U.S., DataKind and Microsoft wondered: How might we support cities to apply data science to reduce traffic fatalities and injuries to zero?
What Happened?

Three U.S. cities – New York, Seattle and New Orleans – partnered with DataKind, in the first and largest multi-city, data-driven collaboration of its kind, to support Vision Zero efforts within the U.S. Each city had specific questions it wished to address related to better understanding the factors contributing to crashes, and which types of engineering treatments or enforcement interventions might be most effective in advancing its local efforts and increasing traffic safety for all.

To help the cities answer these questions, DataKind launched its first ever Labs project, led by DataKind data scientists Erin Akred, Michael Dowd, Jackie Weiser and Sina Kashuk. A DataDive was held in Seattle to help support the project. Dozens of volunteers participated in the event and helped fuel the work that was achieved, including volunteers from Microsoft and the University of Washington’s E-Science Institute, as well as many other Seattle data scientists.

The DataKind team also worked closely with local city officials and transportation experts to gain valuable insight and feedback on the project, and access a wide variety of datasets, such as information on past crashes, roadway attributes (e.g. lanes, traffic signals, and sidewalks), land use, demographic data, commuting patterns, parking violations, and existing safety intervention placements.

The cities provided information about their priority issues, expertise on their local environments, access to their data, and feedback on the models and analytic insights. Microsoft enabled the overall collaboration by providing resources, including expertise in support of the collaborative model, technical approaches, and project goals.

Below are detailed descriptions of the specific local traffic safety questions each city asked, the data science approach and outputs the DataKind team developed, and the outcomes and impacts these analyses are providing each city….(More)”
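
For readers wondering what “predictive algorithms” look like in this setting, the sketch below fits a simple crash-risk classifier on synthetic road-segment features (lane count, speed limit, signal presence). The feature names and data are invented for illustration; the actual DataKind models and city datasets were far richer and city-specific.

```python
# A hedged sketch of a crash-risk model on synthetic road-segment data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical per-segment features: lane count, speed limit (mph), signal.
X = np.column_stack([
    rng.integers(1, 5, n),
    rng.choice([25, 30, 40], n),
    rng.integers(0, 2, n),
])

# Synthetic labels: crash risk rises with lanes and speed in this toy data.
logit = 0.4 * X[:, 0] + 0.05 * X[:, 1] - 3.5
y = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression().fit(X, y)
print("coefficients (lanes, speed, signal):", model.coef_.round(2))
```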

Estimating suicide occurrence statistics using Google Trends


Ladislav Kristoufek, Helen Susannah Moat and Tobias Preis in EPJ Data Science: “Data on the number of people who have committed suicide tends to be reported with a substantial time lag of around two years. We examine whether online activity measured by Google searches can help us improve estimates of the number of suicide occurrences in England before official figures are released. Specifically, we analyse how data on the number of Google searches for the terms ‘depression’ and ‘suicide’ relate to the number of suicides between 2004 and 2013. We find that estimates drawing on Google data are significantly better than estimates using previous suicide data alone. We show that a greater number of searches for the term ‘depression’ is related to fewer suicides, whereas a greater number of searches for the term ‘suicide’ is related to more suicides. Data on suicide-related search behaviour can be used to improve current estimates of the number of suicide occurrences….(More)”
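
As a rough illustration of the comparison the authors describe, the sketch below fits a baseline model on past counts alone and an augmented model that adds the two search series, then compares in-sample error. All data are synthetic and the specification is assumed for illustration, not taken from the paper.

```python
# A hedged sketch: does adding search volumes to a baseline built on past
# counts improve estimates? All series are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
T = 120  # months of synthetic data

searches_suicide = rng.normal(50, 5, T)
searches_depression = rng.normal(60, 5, T)
# Synthetic counts: tied positively to 'suicide' searches and negatively to
# 'depression' searches, mirroring the direction of the paper's findings.
deaths = 400 + 2.0 * searches_suicide - 1.5 * searches_depression + rng.normal(0, 10, T)

y = deaths[1:]
baseline_X = deaths[:-1].reshape(-1, 1)  # past counts only
augmented_X = np.column_stack([deaths[:-1], searches_suicide[1:], searches_depression[1:]])

baseline = LinearRegression().fit(baseline_X, y)
augmented = LinearRegression().fit(augmented_X, y)
print(f"baseline MAE:  {mean_absolute_error(y, baseline.predict(baseline_X)):.1f}")
print(f"augmented MAE: {mean_absolute_error(y, augmented.predict(augmented_X)):.1f}")
```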

From big data to smart data: FDA’s INFORMED initiative


Sean Khozin, Geoffrey Kim & Richard Pazdur in Nature: “….Recent advances in our understanding of disease mechanisms have led to the development of new drugs that are enabling precision medicine. For example, the co-development of kinase inhibitors that target ‘driver mutations’ in metastatic non-small-cell lung cancer (NSCLC) with companion diagnostics has led to substantial improvements in the treatment of some patients. However, growing evidence suggests that most patients with metastatic NSCLC and other advanced cancers may not have tumours with single driver mutations. Furthermore, the generation of clinical evidence in genomically diverse and geographically dispersed groups of patients using traditional trial designs and multiple competing therapies is becoming more costly and challenging.

Strategies aimed at creating new efficiencies in clinical evidence generation and extending the benefits of precision medicine to larger groups of patients are driving a transformation from a reductionist approach to drug development (for example, a single drug targeting a driver mutation and traditional clinical trials) to a holistic approach (for example, combination therapies targeting complex multiomic signatures and real-world evidence). This transition is largely fuelled by the rapid expansion in the four dimensions of biomedical big data, which has created a need for greater organizational and technical capabilities (Fig. 1). Appropriate management and analysis of such data requires specialized tools and expertise in health information technology, data science and high-performance computing. For example, efforts to generate clinical evidence using real-world data are being limited by challenges such as capturing clinically relevant variables from vast volumes of unstructured content (such as physician notes) in electronic health records and organizing various structured data elements that are primarily designed to support billing rather than clinical research. So, new standards and quality-control mechanisms are needed to ensure the validity of the design and analysis of studies based on electronic health records.
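
As a toy illustration of the unstructured-data challenge mentioned above, the snippet below pulls a single clinically relevant variable out of invented physician notes with a regular expression. Real EHR pipelines rely on dedicated clinical NLP systems rather than regexes; this only sketches what “capturing clinically relevant variables from unstructured content” involves.

```python
# A hedged sketch of extracting one structured variable from free-text
# notes. The notes are invented; real clinical text is far messier.
import re

notes = [
    "Pt with metastatic NSCLC. EGFR mutation positive. Started erlotinib.",
    "Follow-up visit. NSCLC, EGFR mutation negative, on chemotherapy.",
    "New patient intake. Imaging ordered; biopsy pending.",
]

for note in notes:
    match = re.search(r"EGFR mutation (positive|negative)", note)
    print(match.group(1) if match else "EGFR status not documented")
```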

Figure 1: Conceptual map of technical and organizational capacity for biomedical big data.

Big data can be defined as having four dimensions: volume (data size), variety (data type), veracity (data noise and uncertainty) and velocity (data flow and processing). Currently, FDA approval decisions are generally based on data of limited variety, mainly from clinical trials and preclinical studies (1) that are mostly structured (2), in data sets usually no more than a few gigabytes in size (3), that are processed intermittently as part of regulatory submissions (4). The expansion of big data in the four dimensions (grey lines) calls for increasing organizational and technical capacity. This could transform big data into smart data by enabling a holistic approach to personalization of therapies that takes patient, disease and environmental characteristics into account….(More)”