Strategies and limitations in app usage and human mobility

Paper by Marco De Nadai, Angelo Cardoso, Antonio Lima, Bruno Lepri, and Nuria Oliver: “Cognition has been found to constrain several aspects of human behaviour, such as the number of friends and the number of favourite places a person keeps stable over time. This limitation has been empirically defined in the physical and social spaces. But do people exhibit similar constraints in the digital space? We address this question through the analysis of pseudonymised mobility and mobile application (app) usage data of 400,000 individuals in a European country for six months. Despite the enormous heterogeneity of app usage, we find that individuals exhibit a conserved capacity that limits the number of applications they regularly use. Moreover, we find that this capacity steadily decreases with age, as does the capacity in the physical space but with more complex dynamics. Even though people might have the same capacity, applications get added and removed over time.

In this respect, we identify two profiles of individuals: app keepers and explorers, which differ in their stable (keepers) vs exploratory (explorers) behaviour regarding their use of mobile applications. Finally, we show that the capacity of applications predicts mobility capacity and vice-versa. By contrast, the behaviour of keepers and explorers may considerably vary across the two domains. Our empirical findings provide an intriguing picture linking human behaviour in the physical and digital worlds which bridges research studies from Computer Science, Social Physics and Computational Social Sciences…(More)”.
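To make the two ideas in this abstract concrete, here is a minimal Python sketch, not the authors' method. It assumes a hypothetical definition in which an app counts as "regularly used" if it appears in at least `min_weeks` distinct weeks of a time window, takes capacity as the size of that set, and measures change between two windows as a Jaccard distance. The function names, the threshold, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict


def regular_apps(events, min_weeks=2):
    """Return the apps seen in at least `min_weeks` distinct weeks.

    `events` is an iterable of (week_index, app_name) pairs for one user
    in one time window. The threshold is an illustrative assumption.
    """
    weeks_seen = defaultdict(set)
    for week, app in events:
        weeks_seen[app].add(week)
    return {app for app, weeks in weeks_seen.items() if len(weeks) >= min_weeks}


def turnover(apps_before, apps_after):
    """Jaccard distance between two app sets (0 = identical, 1 = disjoint)."""
    union = apps_before | apps_after
    if not union:
        return 0.0
    return 1.0 - len(apps_before & apps_after) / len(union)


# Toy data (made up): the same number of regular apps in both windows,
# but one app has been swapped for another.
first = regular_apps([(1, "maps"), (2, "maps"), (1, "mail"), (3, "mail"), (2, "game")])
second = regular_apps([(5, "maps"), (6, "maps"), (5, "chat"), (7, "chat"), (6, "game")])

capacity_first = len(first)
capacity_second = len(second)
change = turnover(first, second)
```

Under these assumptions, a user whose capacity stays the same while turnover is high would resemble an "explorer", and one with low turnover a "keeper".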

The personification of big data

Paper by Stevenson, Phillip Douglas and Mattson, Christopher Andrew: “Organizations all over the world, both national and international, gather demographic data so that the progress of nations and peoples can be tracked. This data is often made available to the public in the form of aggregated national level data or individual responses (microdata). Product designers likewise conduct surveys to better understand their customer and create personas. Personas are archetypes of the individuals who will use, maintain, sell or otherwise be affected by the products created by designers. Personas help designers better understand the person the product is designed for. Unfortunately, the process of collecting customer information and creating personas is often a slow and expensive process.

In this paper, we introduce a new method of creating personas, leveraging publicly available databanks of both aggregated national level and information on individuals in the population. A computational persona generator is introduced that creates a population of personas that mirrors a real population in terms of size and statistics. Realistic individual personas are filtered from this population for use in product development…(More)”.
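The generate-then-filter idea described here can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the paper's generator: the attribute shares below are made up, each attribute is sampled independently (which ignores correlations that real microdata would capture), and the names `MARGINALS`, `generate_population` and `filter_personas` are hypothetical.

```python
import random

# Made-up aggregate shares standing in for national-level statistics.
MARGINALS = {
    "age_group": {"18-34": 0.3, "35-64": 0.5, "65+": 0.2},
    "setting": {"urban": 0.6, "rural": 0.4},
}


def generate_population(marginals, size, seed=0):
    """Draw `size` synthetic personas, sampling each attribute independently."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        persona = {}
        for attribute, shares in marginals.items():
            values = list(shares)
            weights = list(shares.values())
            persona[attribute] = rng.choices(values, weights=weights, k=1)[0]
        population.append(persona)
    return population


def filter_personas(population, **criteria):
    """Keep only the personas that match every attribute in `criteria`."""
    return [
        persona
        for persona in population
        if all(persona[key] == value for key, value in criteria.items())
    ]


population = generate_population(MARGINALS, size=1000)
rural_seniors = filter_personas(population, age_group="65+", setting="rural")
```

With a large enough population, the sampled shares should approach the input shares, which is the sense in which the synthetic population "mirrors" the statistics it was built from.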

The value of data in Canada: Experimental estimates

Statistics Canada: “As data and information take on a far more prominent role in Canada and, indeed, all over the world, data, databases and data science have become a staple of modern life. When the electricity goes out, Canadians are as much in search of their data feed as they are food and heat. Consumers are using more and more data that is embodied in the products they buy, whether those products are music, reading material, cars and other appliances, or a wide range of other goods and services. Manufacturers, merchants and other businesses depend increasingly on the collection, processing and analysis of data to make their production processes more efficient and to drive their marketing strategies.

The increasing use of and investment in all things data is driving economic growth, changing the employment landscape and reshaping how and from where we buy and sell goods. Yet the rapid rise in the use and importance of data is not well measured in the existing statistical system. Given the ‘lack of data on data’, Statistics Canada has initiated new research to produce a first set of estimates of the value of data, databases and data science. The development of these estimates benefited from collaboration with the Bureau of Economic Analysis in the United States and the Organisation for Economic Co-operation and Development.

In 2018, Canadian investment in data, databases and data science was estimated to be as high as $40 billion. This was greater than the annual investment in industrial machinery, transportation equipment, and research and development and represented approximately 12% of total non-residential investment in 2018….

Statistics Canada recently released a conceptual framework outlining how one might measure the economic value of data, databases and data science. Thanks to this new framework, the growing role of data in Canada can be measured through time. This framework is described in a paper that was released in The Daily on June 24, 2019 entitled “Measuring investments in data, databases and data science: Conceptual framework.” That paper describes the concept of an ‘information chain’ in which data are derived from everyday observations, databases are constructed from data, and data science creates new knowledge by analyzing the contents of databases….(More)”.

Stop Surveillance Humanitarianism

Mark Latonero at The New York Times: “A standoff between the United Nations World Food Program and Houthi rebels in control of the capital region is threatening the lives of hundreds of thousands of civilians in Yemen.

Alarmed by reports that food is being diverted to support the rebels, the aid program is demanding that Houthi officials allow them to deploy biometric technologies like iris scans and digital fingerprints to monitor suspected fraud during food distribution.

The Houthis have reportedly blocked food delivery, painting the biometric effort as an intelligence operation, and have demanded access to the personal data on beneficiaries of the aid. The impasse led the aid organization to the decision last month to suspend food aid to parts of the starving population — once thought of as a last resort — unless the Houthis allow biometrics.

With program officials saying their staff is prevented from doing its essential jobs, turning to a technological solution is tempting. But biometrics deployed in crises can lead to a form of surveillance humanitarianism that can exacerbate risks to privacy and security.

By surveillance humanitarianism, I mean the enormous data collection systems deployed by aid organizations that inadvertently increase the vulnerability of people in urgent need….(More)”.

The Lives and After Lives of Data

Paper by Christine L. Borgman: “The most elusive term in data science is ‘data.’ While often treated as objects to be computed upon, data is a theory-laden concept with a long history. Data exist within knowledge infrastructures that govern how they are created, managed, and interpreted. By comparing models of data life cycles, implicit assumptions about data become apparent. In linear models, data pass through stages from beginning to end of life, which suggest that data can be recreated as needed. Cyclical models, in which data flow in a virtuous circle of uses and reuses, are better suited for irreplaceable observational data that may retain value indefinitely. In astronomy, for example, observations from one generation of telescopes may become calibration and modeling data for the next generation, whether digital sky surveys or glass plates. The value and reusability of data can be enhanced through investments in knowledge infrastructures, especially digital curation and preservation. Determining what data to keep, why, how, and for how long, is the challenge of our day…(More)”.

Soon, satellites will be able to watch you everywhere all the time

Christopher Beam at MIT Technology Review: “In 2013, police in Grants Pass, Oregon, got a tip that a man named Curtis W. Croft had been illegally growing marijuana in his backyard. So they checked Google Earth. Indeed, the four-month-old satellite image showed neat rows of plants growing on Croft’s property. The cops raided his place and seized 94 plants.

In 2018, Brazilian police in the state of Amapá used real-time satellite imagery to detect a spot where trees had been ripped out of the ground. When they showed up, they discovered that the site was being used to illegally produce charcoal, and arrested eight people in connection with the scheme.

Chinese government officials have denied or downplayed the existence of Uighur reeducation camps in Xinjiang province, portraying them as “vocational schools.” But human rights activists have used satellite imagery to show that many of the “schools” are surrounded by watchtowers and razor wire.

Every year, commercially available satellite images are becoming sharper and taken more frequently. In 2008, there were 150 Earth observation satellites in orbit; by now there are 768. Satellite companies don’t offer 24-hour real-time surveillance, but if the hype is to be believed, they’re getting close. Privacy advocates warn that innovation in satellite imagery is outpacing the US government’s (to say nothing of the rest of the world’s) ability to regulate the technology. Unless we impose stricter limits now, they say, one day everyone from ad companies to suspicious spouses to terrorist organizations will have access to tools previously reserved for government spy agencies. Which would mean that at any given moment, anyone could be watching anyone else.

The images keep getting clearer

Commercial satellite imagery is currently in a sweet spot: powerful enough to see a car, but not enough to tell the make and model; collected frequently enough for a farmer to keep tabs on crops’ health, but not so often that people could track the comings and goings of a neighbor. This anonymity is deliberate. US federal regulations limit images taken by commercial satellites to a resolution of 25 centimeters, or about the length of a man’s shoe….(More)”.

The Ethics of Big Data Applications in the Consumer Sector

Paper by Markus Christen et al.: “Business applications relying on processing of large amounts of heterogeneous data (Big Data) are considered to be key drivers of innovation in the digital economy. However, these applications also pose ethical issues that may undermine the credibility of data-driven businesses. In our contribution, we discuss ethical problems that are associated with Big Data such as: How are core values like autonomy, privacy, and solidarity affected in a Big Data world? Are some data a public good? Or: Are we obliged to divulge personal data to a certain degree in order to make the society more secure or more efficient?

We answer those questions by first outlining the ethical topics that are discussed in the scientific literature and the lay media using a bibliometric approach. Second, referring to the results of expert interviews and workshops with practitioners, we identify core norms and values affected by Big Data applications—autonomy, equality, fairness, freedom, privacy, property-rights, solidarity, and transparency—and outline how they are exemplified in examples of Big Data consumer applications, for example, in terms of informational self-determination, non-discrimination, or free opinion formation. Based on use cases such as personalized advertising, individual pricing, or credit risk management we discuss the process of balancing such values in order to identify legitimate, questionable, and unacceptable Big Data applications from an ethics point of view. We close with recommendations on how practitioners working in applied data science can deal with ethical issues of Big Data….(More)”.

Data & Policy: A new venue to study and explore policy–data interaction

Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made in the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet, data about the use and evidence of the impact of data remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis to show that data can be helpful and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017) and thus directly questioning the evidence base to utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis, biases and uncertainties present in large historical datasets that cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe that there should be a sound under-pinning a new theory of what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management, privacy, and fairness have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction, which intends reducing the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to consideration of systems of policy and data, how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism, as CP Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship; and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and pushing at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.

Techno-optimism and policy-pessimism in the public sector big data debate

Paper by Simon Vydra and Bram Klievink: “Despite great potential, high hopes and big promises, the actual impact of big data on the public sector is not always as transformative as the literature would suggest. In this paper, we ascribe this predicament to an overly strong emphasis the current literature places on technical-rational factors at the expense of political decision-making factors. We express these two different emphases as two archetypical narratives and use those to illustrate that some political decision-making factors should be taken seriously by critiquing some of the core ‘techno-optimist’ tenets from a more ‘policy-pessimist’ angle.

In the conclusion we have these two narratives meet ‘eye-to-eye’, facilitating a more systematized interrogation of big data promises and shortcomings in further research, paying appropriate attention to both technical-rational and political decision-making factors. We finish by offering a realist rejoinder of these two narratives, allowing for more context-specific scrutiny and balancing both technical-rational and political decision-making concerns, resulting in more realistic expectations about using big data for policymaking in practice….(More)”.

How to use data for good — 5 priorities and a roadmap

Stefaan Verhulst at apolitical: “…While the overarching message emerging from these case studies was promising, several barriers were identified that if not addressed systematically could undermine the potential of data science to address critical public needs and limit the opportunity to scale the practice more broadly.

Below we summarise the five priorities that emerged through the workshop for the field moving forward.

1. Become People-Centric

Much of the data currently used for drawing insights involve or are generated by people.

These insights have the potential to impact people’s lives in many positive and negative ways. Yet, the people and the communities represented in this data are largely absent when practitioners design and develop data for social good initiatives.

To ensure data is a force for positive social transformation (i.e., they address real people’s needs and impact lives in a beneficial way), we need to experiment with new ways to engage people at the design, implementation, and review stage of data initiatives beyond simply asking for their consent.

As we explain in our People-Led Innovation methodology, different segments of people can play multiple roles ranging from co-creation to commenting, reviewing and providing additional datasets.

The key is to ensure their needs are front and center, and that data science for social good initiatives seek to address questions related to real problems that matter to society-at-large (a key concern that led The GovLab to instigate the 100 Questions Initiative).

2. Establish Data About the Use of Data (for Social Good)

Many data for social good initiatives remain fledgling.

As currently designed, the field often struggles with translating sound data projects into positive change. As a result, many potential stakeholders (private sector and government “owners” of data as well as public beneficiaries) remain unsure about the value of using data for social good, especially against the background of high risks and transaction costs.

The field needs to overcome such limitations if data insights and their benefits are to spread. For that, we need hard evidence about data’s positive impact. Ironically, the field is held back by an absence of good data on the use of data, a lack of reliable empirical evidence that could guide new initiatives.

The field needs to prioritise developing a far more solid evidence base and “business case” to move data for social good from a good idea to reality.

3. Develop End-to-End Data Initiatives

Too often, data for social good focus on the “data-to-knowledge” pipeline without focusing on how to move “knowledge into action.”

As such, the impact remains limited and many efforts never reach an audience that can actually act upon the insights generated. Without becoming more sophisticated in our efforts to provide end-to-end projects and taking “data from knowledge to action,” the positive impact of data will be limited….

4. Invest in Common Trust and Data Steward Mechanisms

For data for social good initiatives (including data collaboratives) to flourish and scale, there must be substantial trust between all parties involved; and amongst the public-at-large.

Establishing such a platform of trust requires each actor to invest in developing essential trust mechanisms such as data governance structures, contracts, and dispute resolution methods. Today, designing and establishing these mechanisms take tremendous time, energy, and expertise. These high transaction costs result from the lack of common templates and the need to design governance structures from scratch each time…

5. Build Bridges Across Cultures

As C.P. Snow famously described in his lecture on “Two Cultures and the Scientific Revolution,” we must bridge the “two cultures” of science and humanism if we are to solve the world’s problems….

To implement these five priorities we will need experimentation at the operational but also institutional level. This involves the establishment of “data stewards” within organisations that can accelerate data for social good initiatives in a responsible manner integrating the five priorities above….(More)”