The Ethics of Big Data Applications in the Consumer Sector


Paper by Markus Christen et al.: “Business applications relying on processing of large amounts of heterogeneous data (Big Data) are considered to be key drivers of innovation in the digital economy. However, these applications also pose ethical issues that may undermine the credibility of data-driven businesses. In our contribution, we discuss ethical problems that are associated with Big Data, such as: How are core values like autonomy, privacy, and solidarity affected in a Big Data world? Are some data a public good? Or: Are we obliged to divulge personal data to a certain degree in order to make society more secure or more efficient?

We answer those questions by first outlining the ethical topics that are discussed in the scientific literature and the lay media using a bibliometric approach. Second, referring to the results of expert interviews and workshops with practitioners, we identify core norms and values affected by Big Data applications—autonomy, equality, fairness, freedom, privacy, property rights, solidarity, and transparency—and outline how they are exemplified in examples of Big Data consumer applications, for example, in terms of informational self-determination, non-discrimination, or free opinion formation. Based on use cases such as personalized advertising, individual pricing, or credit risk management, we discuss the process of balancing such values in order to identify legitimate, questionable, and unacceptable Big Data applications from an ethics point of view. We close with recommendations on how practitioners working in applied data science can deal with ethical issues of Big Data….(More)”.

Data & Policy: A new venue to study and explore policy–data interaction


Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made in the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet, data about the use and evidence of the impact of data remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis to show that data can be helpful and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017), thus directly questioning the evidence base for the utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis; (iii) biases and uncertainties present in large historical datasets that cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iv) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe that a new theory of what we call Policy–Data Interactions needs a sound underpinning. To date, in reaction to the proliferation of data in the commercial world, theories of data management, privacy, and fairness have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction, which aims to reduce the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to the consideration of systems of policy and data, and to how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism—as C.P. Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship; and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we share insight and push at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.

Techno-optimism and policy-pessimism in the public sector big data debate


Paper by Simon Vydra and Bram Klievink: “Despite great potential, high hopes and big promises, the actual impact of big data on the public sector is not always as transformative as the literature would suggest. In this paper, we ascribe this predicament to an overly strong emphasis the current literature places on technical-rational factors at the expense of political decision-making factors. We express these two different emphases as two archetypical narratives and use those to illustrate that some political decision-making factors should be taken seriously by critiquing some of the core ‘techno-optimist’ tenets from a more ‘policy-pessimist’ angle.

In the conclusion we have these two narratives meet ‘eye-to-eye’, facilitating a more systematized interrogation of big data promises and shortcomings in further research, paying appropriate attention to both technical-rational and political decision-making factors. We finish by offering a realist rejoinder of these two narratives, allowing for more context-specific scrutiny and balancing both technical-rational and political decision-making concerns, resulting in more realistic expectations about using big data for policymaking in practice….(More)”.

How to use data for good — 5 priorities and a roadmap


Stefaan Verhulst at apolitical: “…While the overarching message emerging from these case studies was promising, several barriers were identified that, if not addressed systematically, could undermine the potential of data science to address critical public needs and limit the opportunity to scale the practice more broadly.

Below we summarise the five priorities that emerged through the workshop for the field moving forward.

1. Become People-Centric

Much of the data currently used for drawing insights involve or are generated by people.

These insights have the potential to impact people’s lives in many positive and negative ways. Yet, the people and the communities represented in this data are largely absent when practitioners design and develop data for social good initiatives.

To ensure data is a force for positive social transformation (i.e., that it addresses real people’s needs and impacts lives in a beneficial way), we need to experiment with new ways to engage people at the design, implementation, and review stages of data initiatives beyond simply asking for their consent.

(Photo credit: Image from the people-led innovation report)

As we explain in our People-Led Innovation methodology, different segments of people can play multiple roles ranging from co-creation to commenting, reviewing and providing additional datasets.

The key is to ensure their needs are front and center, and that data science for social good initiatives seek to address questions related to real problems that matter to society-at-large (a key concern that led The GovLab to instigate the 100 Questions Initiative).

2. Establish Data About the Use of Data (for Social Good)

Many data for social good initiatives remain fledgling.

As currently designed, the field often struggles with translating sound data projects into positive change. As a result, many potential stakeholders—private sector and government “owners” of data as well as public beneficiaries—remain unsure about the value of using data for social good, especially against the background of high risks and transaction costs.

The field needs to overcome such limitations if data insights and its benefits are to spread. For that, we need hard evidence about data’s positive impact. Ironically, the field is held back by an absence of good data on the use of data—a lack of reliable empirical evidence that could guide new initiatives.

The field needs to prioritise developing a far more solid evidence base and “business case” to move data for social good from a good idea to reality.

3. Develop End-to-End Data Initiatives

Too often, data for social good initiatives focus on the “data-to-knowledge” pipeline without addressing how to move “knowledge into action.”

As such, the impact remains limited and many efforts never reach an audience that can actually act upon the insights generated. Without becoming more sophisticated in our efforts to provide end-to-end projects and taking “data from knowledge to action,” the positive impact of data will be limited….

4. Invest in Common Trust and Data Steward Mechanisms 

For data for social good initiatives (including data collaboratives) to flourish and scale, there must be substantial trust among all parties involved, and amongst the public-at-large.

Establishing such a platform of trust requires each actor to invest in developing essential trust mechanisms such as data governance structures, contracts, and dispute resolution methods. Today, designing and establishing these mechanisms takes tremendous time, energy, and expertise. These high transaction costs result from the lack of common templates and the need to design governance structures from scratch each time…

5. Build Bridges Across Cultures

As C.P. Snow famously described in his lecture on “Two Cultures and the Scientific Revolution,” we must bridge the “two cultures” of science and humanism if we are to solve the world’s problems….

To implement these five priorities we will need experimentation at the operational but also the institutional level. This involves the establishment of “data stewards” within organisations that can accelerate data for social good initiatives in a responsible manner, integrating the five priorities above….(More)”

Humans and Big Data: New Hope? Harnessing the Power of Person-Centred Data Analytics


Paper by Carmel Martin, Keith Stockman and Joachim P. Sturmberg: “Big data provide the hope of major health innovation and improvement. However, there is a risk of precision medicine based on predictive biometrics and service metrics overwhelming anticipatory human centered sense-making, in the fuzzy emergence of personalized (big data) medicine. This is a pressing issue, given the paucity of individual sense-making data approaches. A human-centric model is described to address the gap in personal particulars and experiences in individual health journeys. The Patient Journey Record System (PaJR) was developed to improve human-centric healthcare by harnessing the power of person-centred data analytics, using complexity theory and iterative health services and information systems applications over a 10-year period. PaJR is a web-based service supporting usually bi-weekly telephone calls by care guides to individuals at risk of readmissions.

This chapter describes a case study of the timing and context of readmissions using human (biopsychosocial) particular data, which is based on individual experiences and perceptions with differing patterns of instability. This Australian study, called MonashWatch, is a service pilot using the PaJR system in the Dandenong Hospital urban catchment area of the Monash Health network. Drawing on state public hospital big data, the Victorian HealthLinks Chronic Care algorithm provides case finding for high risk of readmission based on disease and service metrics. MonashWatch was actively monitoring 272 of 376 intervention patients, with 195 controls, over 22 months (ongoing) at the time of the study.

Three randomly selected intervention cases describe a dynamic interplay of self-reported change in health and health care, medication, drug and alcohol use, and social support structure. While the three cases were at similar predicted risk initially, they represented statistically different time series configurations and admission patterns. Fluctuations in admission were associated with (mal)alignment of bodily health with psychosocial and environmental influences. However, human interpretation was required to make sense of the patterns as presented by the multiple levels of data.

A human-centric model and framework for health journey monitoring illustrates the potential for ‘small’ personal experience data to inform clinical care in the era of big data predominantly based on biometrics and medical industrial processes….(More)”.

Ethics of identity in the time of big data


Paper by James Brusseau in First Monday: “Compartmentalizing our distinct personal identities is increasingly difficult in big data reality. Pictures of the person we were on past vacations resurface in employers’ Google searches; LinkedIn, which exhibits our income level, is increasingly used as a dating website. Whether on vacation, at work, or seeking romance, our digital selves stream together.

One result is that a perennial ethical question about personal identity has spilled out of philosophy departments and into the real world. Ought we possess one, unified identity that coherently integrates the various aspects of our lives, or incarnate deeply distinct selves suited to different occasions and contexts? At bottom, are we one, or many?

The question is not only palpable today, but also urgent because if a decision is not made by us, the forces of big data and surveillance capitalism will make it for us by compelling unity. Speaking in favor of the big data tendency, Facebook’s Mark Zuckerberg promotes the ethics of an integrated identity, a single version of selfhood maintained across diverse contexts and human relationships.

This essay goes in the other direction by sketching two ethical frameworks arranged to defend our compartmentalized identities, which amounts to promoting the dis-integration of our selves. One framework connects with natural law, the other with language, and both aim to create a sense of selfhood that breaks away from its own past, and from the unifying powers of big data technology….(More)”.

GIS and the 2020 Census


ESRI: “GIS and the 2020 Census: Modernizing Official Statistics provides statistical organizations with the most recent GIS methodologies and technological tools to support census workers’ needs at all the stages of a census. Learn how to plan and carry out census work with GIS using new technologies for field data collection and operations management. International case studies illustrate concepts in practice….(More)”.

As Surveys Falter, Big Data Polling Narrows Our Societal Understanding


Kalev Leetaru at Forbes: “One of the most talked-about stories in the world of polling and survey research in recent years has been the gradual death of survey response rates and the reliability of those insights….

The online world’s perceived anonymity has offered some degree of reprieve, with online polls and surveys often besting traditional approaches in assessing views on society’s most controversial issues. Yet here as well, increasing public understanding of phishing and online safety is ever more problematic.

The answer has been the rise of “big data” analysis of society’s digital exhaust to fill in the gaps….

Is it truly the same answer though?

Constructing and conducting a well-designed survey means being able to ask the public exactly the questions of interest. Most importantly, it entails being able to ensure representative demographics of respondents.

An online-only poll is unlikely to accurately capture the perspectives of the three quarters of the earth’s population that the digital revolution has left behind. Even within the US, social media platforms are extraordinarily skewed.
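The standard survey-research remedy for an unrepresentative sample is post-stratification weighting: respondents are reweighted so that each demographic cell matches its known population share. A minimal sketch, with all figures invented for illustration:

```python
# Hypothetical illustration of post-stratification weighting.
# All respondent data and population shares below are invented.

from collections import Counter

# Each respondent: (age_group, answered_yes as 0/1)
sample = [("18-29", 1), ("18-29", 1), ("18-29", 0),
          ("30-64", 1), ("30-64", 0), ("65+", 0)]

# Known population shares for each cell (e.g., from a census)
population_shares = {"18-29": 0.20, "30-64": 0.55, "65+": 0.25}

counts = Counter(group for group, _ in sample)
n = len(sample)

def weight(group):
    """Weight = population share / sample share for the respondent's cell."""
    return population_shares[group] / (counts[group] / n)

raw_yes = sum(y for _, y in sample) / n
weighted_yes = sum(weight(g) * y for g, y in sample) / n

# The raw estimate over-represents the heavily sampled young cell;
# weighting pulls the estimate toward the true population mix.
print(f"raw: {raw_yes:.2f}  weighted: {weighted_yes:.2f}")
```

Crucially, this correction requires knowing who the respondents are. Data-exhaust sources rarely permit it, because the demographics of the people behind the signals are unknown.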

The far greater problem is that society’s data exhaust is rarely a perfect match for the questions of greatest interest to policymakers and public.

Cellphone mobility records can offer an exquisitely detailed look at how the people of a city go about their daily lives, but beneath all that blinding light are the invisible members of society not deemed valuable to advertisers and thus not counted. Even for the urban society members whose phones are their ever-present companions, mobility data only goes so far. It can tell us that occupants of a particular part of the city during the workday spend their evenings in a particular part of the city, allowing us to understand their work/life balance, but it offers few insights into their political leanings.

One of the greatest challenges of today’s “big data” surveying is that it requires us to narrow our gaze to only those questions which can be easily answered from the data at hand.

Much as AI’s crisis of bias comes from the field’s steadfast refusal to pay for quality data, settling for highly biased free data, so too has “big data” surveying limited itself largely to datasets it can freely and easily acquire.

The result is that with traditional survey research, we are free to ask the precise questions we are most interested in. With data exhaust research, we must imperfectly shoehorn our questions into the few available metrics. With sufficient creativity it is typically possible to find some way of proxying the given question, but the resulting proxies may be highly unstable, with little understanding of when and where they may fail.

Much like how the early rise of the cluster computing era caused “big data” researchers to limit the questions they asked of their data to just those they could fit into a set of tiny machines, so too has the era of data exhaust surveying forced us to greatly restrict our understanding of society.

Most dangerously, however, big data surveying implicitly means we are measuring only the portion of society our vast commercial surveillance state cares about.

In short, we are only able to measure those deemed of greatest interest to advertisers and thus the most monetizable.

Putting this all together, the decline of traditional survey research has led to the rise of “big data” analysis of society’s data exhaust. Instead of giving us an unprecedented new view into the heartbeat of daily life, this reliance on the unintended output of our digital lives has forced researchers to greatly narrow the questions they can explore and severely skews them to the most “monetizable” portions of society.

In the end, the shift of societal understanding from precision surveys to the big data revolution has led not to an incredible new understanding of what makes us tick, but rather to a far smaller, less precise, and less accurate view than ever before, just as our need to understand ourselves has never been greater….(More)”.

A weather tech startup wants to do forecasts based on cell phone signals


Douglas Heaven at MIT Technology Review: “On 14 April more snow fell on Chicago than it had in nearly 40 years. Weather services didn’t see it coming: they forecast one or two inches at worst. But when the late winter snowstorm came it caused widespread disruption, dumping enough snow that airlines had to cancel more than 700 flights across all of the city’s airports.

One airline did better than most, however. Instead of relying on the usual weather forecasts, it listened to ClimaCell – a Boston-based “weather tech” start-up that claims it can predict the weather more accurately than anyone else. According to the company, its correct forecast of the severity of the coming snowstorm allowed the airline to better manage its schedules and minimize losses due to delays and diversions. 

Founded in 2015, ClimaCell has spent the last few years developing the technology and business relationships that allow it to tap into millions of signals from cell phones and other wireless devices around the world. It uses the quality of these signals as a proxy for local weather conditions, such as precipitation and air quality. It also analyzes images from street cameras. It is offering a weather forecasting service to subscribers that it claims is 60 percent more accurate than that of existing providers, such as NOAA.

The internet of weather

The approach makes sense, in principle. Other forecasters use proxies, such as radar signals. But by using information from millions of everyday wireless devices, ClimaCell claims it has a far more fine-grained view of most of the globe than other forecasters get from the existing network of weather sensors, which range from ground-based devices to satellites. (ClimaCell taps into these, too.)…(More)”.
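The article does not disclose ClimaCell’s actual model, but the physics it leans on is well established: rain attenuates microwave signals following a power law, A = k·R^α (the form standardized in ITU-R P.838), so excess attenuation along a link can be inverted into a rain-rate estimate. A minimal sketch with invented placeholder coefficients, not ClimaCell’s method:

```python
# Illustrative only: inverting the power-law relation A = k * R**alpha
# between specific attenuation A (dB/km) and rain rate R (mm/h).
# The coefficients k and alpha are invented placeholders; real values
# depend on signal frequency and polarization.

def rain_rate_from_attenuation(attenuation_db_per_km, k=0.12, alpha=1.1):
    """Estimate rain rate R (mm/h) from excess link attenuation A (dB/km)."""
    if attenuation_db_per_km <= 0:
        return 0.0  # no excess attenuation -> assume no rain
    return (attenuation_db_per_km / k) ** (1.0 / alpha)

# A link showing 1.5 dB/km of excess attenuation maps to a moderate rain rate
estimate = rain_rate_from_attenuation(1.5)
print(f"estimated rain rate: {estimate:.1f} mm/h")
```

The inversion itself is trivial; the hard part in practice is separating weather-induced attenuation from the many other causes of signal fluctuation, which is presumably where a company’s proprietary value lies.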

Introducing the Contractual Wheel of Data Collaboration


Blog by Andrew Young and Stefaan Verhulst: “Earlier this year we launched the Contracts for Data Collaboration (C4DC) initiative — an open collaborative with charter members from The GovLab, UN SDSN Thematic Research Network on Data and Statistics (TReNDS), University of Washington and the World Economic Forum. C4DC seeks to address the inefficiencies of developing contractual agreements for public-private data collaboration by informing and guiding those seeking to establish a data collaborative by developing and making available a shared repository of relevant contractual clauses taken from existing legal agreements. Today TReNDS published “Partnerships Founded on Trust,” a brief capturing some initial findings from the C4DC initiative.

The Contractual Wheel of Data Collaboration [beta] — Stefaan G. Verhulst and Andrew Young, The GovLab

As part of the C4DC effort, and to support Data Stewards in the private sector and decision-makers in the public and civil sectors seeking to establish Data Collaboratives, The GovLab developed the Contractual Wheel of Data Collaboration [beta]. The Wheel seeks to capture key elements involved in data collaboration while demystifying contracts and moving beyond the type of legalese that can create confusion and barriers to experimentation.

The Wheel was developed based on an assessment of existing legal agreements, engagement with The GovLab-facilitated Data Stewards Network, and analysis of the key elements of our Data Collaboratives Methodology. It features 22 legal considerations organized across 6 operational categories that can act as a checklist for the development of a legal agreement between parties participating in a Data Collaborative:…(More)”.