Data for Development: What’s next? Concepts, trends and recommendations


Report by the Web Foundation: “The exponential growth of data provides powerful new ways for governments and companies to understand and respond to challenges and opportunities. This report, Data for Development: What’s next, investigates how organisations working in international development can leverage the growing quantity and variety of data to improve their investments and projects so that they better meet people’s needs.

Investigating the state of data for development and identifying emerging data trends, the study provides recommendations to support German development cooperation actors seeking to integrate data strategies and investments in their work. These insights can guide any organisation seeking to use data to enhance their development work.

The research considers four types of data: (1) big data, (2) open data, (3) citizen-generated data and (4) real-time data, and examines how they are currently being used in development-related policy-making and how they might lead to better development outcomes….(More)”.

How to Make A.I. That’s Good for People


Fei-Fei Li in the New York Times: “For a field that was not well known outside of academia a decade ago, artificial intelligence has grown dizzyingly fast. Tech companies from Silicon Valley to Beijing are betting everything on it, venture capitalists are pouring billions into research and development, and start-ups are being created on what seems like a daily basis. If our era is the next Industrial Revolution, as many claim, A.I. is surely one of its driving forces.

It is an especially exciting time for a researcher like me. When I was a graduate student in computer science in the early 2000s, computers were barely able to detect sharp edges in photographs, let alone recognize something as loosely defined as a human face. But thanks to the growth of big data, advances in algorithms like neural networks and an abundance of powerful computer hardware, something momentous has occurred: A.I. has gone from an academic niche to the leading differentiator in a wide range of industries, including manufacturing, health care, transportation and retail.

I worry, however, that enthusiasm for A.I. is preventing us from reckoning with its looming effects on society. Despite its name, there is nothing “artificial” about this technology — it is made by humans, intended to behave like humans and affects humans. So if we want it to play a positive role in tomorrow’s world, it must be guided by human concerns.

I call this approach “human-centered A.I.” It consists of three goals that can help responsibly guide the development of intelligent machines.

First, A.I. needs to reflect more of the depth that characterizes our own intelligence….

No technology is more reflective of its creators than A.I. It has been said that there are no “machine” values at all, in fact; machine values are human values. A human-centered approach to A.I. means these machines don’t have to be our competitors, but partners in securing our well-being. However autonomous our technology becomes, its impact on the world — for better or worse — will always be our responsibility….(More)”.

Infection forecasts powered by big data


Michael Eisenstein at Nature: “…The good news is that the present era of widespread access to the Internet and digital health has created a rich reservoir of valuable data for researchers to dive into….By harvesting and combining these streams of big data with conventional ways of monitoring infectious diseases, the public-health community could gain fresh powers to catch and curb emerging outbreaks before they rage out of control.

Going viral

Data scientists at Google were the first to make a major splash using data gathered online to track infectious diseases. The Google Flu Trends algorithm, launched in November 2008, combed through hundreds of billions of users’ queries on the popular search engine to look for small increases in flu-related terms such as symptoms or vaccine availability. Initial data suggested that Google Flu Trends could accurately map the incidence of flu with a lag of roughly one day. “It was a very exciting use of these data for the purpose of public health,” says Brownstein. “It really did start a whole revolution and new field of work in query data.”
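
To make the query-tracking idea concrete, here is a minimal sketch of the general approach, assuming a toy data set: regress officially reported influenza-like-illness (ILI) rates on the weekly share of flu-related search queries, then use the fitted line to nowcast the current week. The numbers and variable names below are invented for illustration, and the model is deliberately simplistic compared with Google's actual system.

```python
import numpy as np

# Toy weekly history (hypothetical values): the share of all search queries
# containing flu-related terms ("flu symptoms", "fever", ...) and the
# officially reported influenza-like-illness (ILI) rate for the same weeks.
query_share = np.array([0.0012, 0.0015, 0.0021, 0.0030, 0.0044, 0.0058])
ili_rate = np.array([1.1, 1.4, 1.9, 2.8, 4.0, 5.3])  # % of doctor visits

# Query-based surveillance boils down to regressing reported ILI on
# (log-transformed) query frequencies; fit a simple linear version.
slope, intercept = np.polyfit(np.log(query_share), ili_rate, deg=1)

# Nowcast: this week's query share is available immediately, whereas
# official surveillance reports typically lag by a week or more.
current_share = 0.0066
print(f"Estimated ILI rate this week: {slope * np.log(current_share) + intercept:.1f}%")
```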

Unfortunately, Google Flu Trends faltered when it mattered the most, completely missing the onset in April 2009 of the H1N1 pandemic. The algorithm also ran into trouble later on in the pandemic. It had been trained against seasonal fluctuations of flu, says Viboud, but people’s behaviour changed in the wake of panic fuelled by media reports — and that threw off Google’s data. …

Nevertheless, its work with Internet usage data was inspirational for infectious-disease researchers. A subsequent study from a team led by Cecilia Marques-Toledo at the Federal University of Minas Gerais in Belo Horizonte, Brazil, used Twitter to get high-resolution data on the spread of dengue fever in the country. The researchers could quickly map new cases to specific cities and even predict where the disease might spread to next (C. A. Marques-Toledo et al. PLoS Negl. Trop. Dis. 11, e0005729; 2017). Similarly, Brownstein and his colleagues were able to use search data from Google and Twitter to project the spread of Zika virus in Latin America several weeks before formal outbreak declarations were made by public-health officials. Both Internet services are used widely, which makes them data-rich resources. But they are also proprietary systems for which access to data is controlled by a third party; for that reason, Generous and his colleagues have opted instead to make use of search data from Wikipedia, which is open source. “You can get the access logs, and how many people are viewing articles, which serves as a pretty good proxy for search interest,” he says.
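
As a concrete illustration of the open alternative Generous describes, daily view counts for a disease-related Wikipedia article can be pulled from the public Wikimedia pageviews REST API and treated as a rough proxy for search interest. The article title, date range and User-Agent string below are arbitrary examples, and this is a sketch of the general technique rather than the team's own pipeline, which worked from raw access logs.

```python
import requests

# Wikimedia REST API: daily pageview counts per article, no login required.
# Path layout: .../per-article/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}
URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/user/Influenza/daily/20180101/20180131")

resp = requests.get(URL, headers={"User-Agent": "flu-proxy-demo/0.1"}, timeout=30)
resp.raise_for_status()

# Each item carries a timestamp (YYYYMMDDHH) and a view count; a rising count
# for disease-related articles is the signal of interest.
daily_views = {item["timestamp"][:8]: item["views"] for item in resp.json()["items"]}
for day, views in sorted(daily_views.items()):
    print(day, views)
```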

However, the problems that sank Google Flu Trends still exist….Additionally, online activity differs for infectious conditions with a social stigma such as syphilis or AIDS, because people who are or might be affected are more likely to be concerned about privacy. Appropriate search-term selection is essential: Generous notes that initial attempts to track flu on Twitter were confounded by irrelevant tweets about ‘Bieber fever’ — a decidedly non-fatal condition affecting fans of Canadian pop star Justin Bieber.

Alternatively, researchers can go straight to the source — by using smartphone apps to ask people directly about their health. Brownstein’s team has partnered with the Skoll Global Threats Fund to develop an app called Flu Near You, through which users can voluntarily report symptoms of infection and other information. “You get more detailed demographics about age and gender and vaccination status — things that you can’t get from other sources,” says Brownstein. Ten European Union member states are involved in a similar surveillance programme known as Influenzanet, which has generally maintained 30,000–40,000 active users for seven consecutive flu seasons. These voluntary reporting systems are particularly useful for diseases such as flu, for which many people do not bother going to the doctor — although it can be hard to persuade people to participate for no immediate benefit, says Brownstein. “But we still get a good signal from the people that are willing to be a part of this.”…(More)”.

Data-Driven Regulation and Governance in Smart Cities


Chapter by Sofia Ranchordas and Abram Klop in V. Mak, E. Tjong Tjin Tai and A. Berlee (Eds), Research Handbook on Data Science and Law (Edward Elgar, 2018): “This paper discusses the concept of data-driven regulation and governance in the context of smart cities by describing how these urban centres harness these technologies to collect and process information about citizens, traffic, urban planning or waste production. It describes how several smart cities throughout the world currently employ data science, big data, AI, Internet of Things (‘IoT’), and predictive analytics to improve the efficiency of their services and decision-making.

Furthermore, this paper analyses the legal challenges of employing these technologies to influence or determine the content of local regulation and governance. It explores three specific challenges in particular: the disconnect between traditional administrative law frameworks and data-driven regulation and governance; the effects of the privatization of public services and citizen needs due to the growing outsourcing of smart city technologies to private companies; and the limited transparency and accountability that characterize data-driven administrative processes. This paper draws on a review of interdisciplinary literature on smart cities and offers illustrations of data-driven regulation and governance practices from different jurisdictions….(More)”.

No One Owns Data


Paper by Lothar Determann: “Businesses, policy makers, and scholars are calling for property rights in data. They currently focus particularly on the vast amounts of data generated by connected cars, industrial machines, artificial intelligence, toys and other devices on the Internet of Things (IoT). This data is personal to numerous parties who are associated with a connected device, for example, the driver of a connected car, its owner and passengers, as well as other traffic participants. Manufacturers, dealers, independent providers of auto parts and services, insurance companies, law enforcement agencies and many others are also interested in this data. Various parties are actively staking their claims to data on the Internet of Things, as they are mining data, the fuel of the digital economy.

Stakeholders in digital markets often frame claims, negotiations and controversies regarding data access as one of ownership. Businesses regularly assert and demand that they own data. Individual data subjects also assume that they own data about themselves. Policy makers and scholars focus on how to redistribute ownership rights to data. Yet, upon closer review, it is very questionable whether data is—or should be—subject to any property rights. This article unambiguously answers the question in the negative, both with respect to existing law and future lawmaking, in the United States as in the European Union, jurisdictions with notably divergent attitudes to privacy, property and individual freedoms….

The article begins with a brief review of the current landscape of the Internet of Things, noting the explosive growth of data pools generated by connected devices, artificial intelligence, big data analytics tools and other information technologies. Part 1 lays the foundation for examining concrete current legal and policy challenges in the remainder of the article. Part 2 supplies conceptual differentiation and definitions with respect to “data” and “information” as the subject of rights and interests. Distinctions and definitional clarity serve as the basis for examining the purposes and reach of existing property laws in Part 3, including real property, personal property and intellectual property laws. Part 4 analyzes the effect of data-related laws that do not grant property rights. Part 5 examines how the interests of the various stakeholders are protected or impaired by the current framework of data-related laws to identify potential gaps that could warrant additional property rights. Part 6 examines policy considerations for and against property rights in data. Part 7 concludes that no one owns data and no one should own data….(More)”.

Quality of life, big data and the power of statistics


Paper by Shivam Gupta in Statistics & Probability Letters: “Quality of life (QoL) is tied to the perception of ‘meaning’. The quest for meaning is central to the human condition, and we are brought in touch with a sense of meaning when we reflect on what we have created, loved, believed in or left as a legacy (Barcaccia, 2013). QoL is associated with multi-dimensional issues and features such as environmental pressure, total water management, total waste management, noise and level of air pollution (Eusuf et al., 2014). A significant amount of data is needed to understand all these dimensions. Such knowledge is necessary to realize the vision of a smart city, which involves the use of data-driven approaches to improve the quality of life of the inhabitants and city infrastructures (Degbelo et al., 2016).

Technologies such as Radio-Frequency Identification (RFID) or the Internet of Things (IoT) are producing a large volume of data. Koh et al. (2015) pointed out that approximately 2.5 quintillion bytes of data are generated every day, and 90 percent of the data in the world has been created in the past two years alone. Managing this large amount of data and analyzing it efficiently can help make more informed decisions while solving many societal challenges (e.g., exposure analysis, disaster preparedness, climate change). As discussed in Goodchild (2016), the attractiveness of big data can be summarized in one word, namely spatial prediction – the prediction of both the where and when.

This article focuses on the 5Vs of big data (volume, velocity, variety, value, veracity). The challenges associated with big data in the context of environmental monitoring at a city level are briefly presented in Section 2. Section 3 discusses the use of statistical methods like Land Use Regression (LUR) and Spatial Simulated Annealing (SSA) as two promising ways of addressing the challenges of big data….(More)”.
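
To make the Land Use Regression idea concrete, the sketch below fits a linear model linking pollutant concentrations measured at a handful of monitoring sites to land-use predictors, then estimates concentrations at unmonitored grid cells. The predictors, units and values are made-up placeholders rather than the variables used in the paper, and Spatial Simulated Annealing, which is typically used to optimise the spatial sampling design, is not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monitoring sites: each row is [traffic intensity within 100 m,
# industrial land share within 500 m, green-space share within 500 m].
X_sites = np.array([
    [5200, 0.10, 0.05],
    [1800, 0.02, 0.30],
    [3400, 0.25, 0.10],
    [ 900, 0.01, 0.45],
    [4700, 0.15, 0.08],
])
no2 = np.array([42.0, 21.5, 38.0, 16.0, 40.5])  # measured NO2, ug/m3

# Land Use Regression: a regression model that explains the measurements
# through land-use characteristics around each site...
lur = LinearRegression().fit(X_sites, no2)

# ...and can then map estimated pollution across unmonitored grid cells,
# turning sparse sensor data into a city-wide surface.
X_grid = np.array([[2500, 0.05, 0.20],
                   [6000, 0.30, 0.02]])
print(lur.predict(X_grid))
```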

Big data and food retail: Nudging out citizens by creating dependent consumers


Michael Carolan at Geoforum: “The paper takes a critical look at how food retail firms use big data, looking specifically at how these techniques and technologies govern our ability to imagine food worlds. It does this by drawing on two sets of data: (1) interviews with twenty-one individuals who oversaw the use of big data applications in a retail setting and (2) five consumer focus groups composed of individuals who regularly shopped at major food chains along Colorado’s Front Range.

For reasons described below, the “nudge” provides the conceptual entry point for this analysis, as these techniques are typically expressed through big data-driven nudges. The argument begins by describing the nudge concept and how it is used in the context of retail big data. This is followed by a discussion of methods.

The remainder of the paper discusses how big data are used to nudge consumers and the effects of these practices. This analysis is organized around three themes that emerged out of the qualitative data: path dependency, products; path dependency, retail; and path dependency, habitus. The paper concludes by connecting these themes through the concept of governance, particularly by way of their ability to, in Foucault’s (2003: 241) words, have “the power to ‘make’ live and ‘let’ die” worlds….(More)”.

The future of statistics and data science


Paper by Sofia C. Olhede and Patrick J. Wolfe in Statistics & Probability Letters: “The Danish physicist Niels Bohr is said to have remarked: “Prediction is very difficult, especially about the future”. Predicting the future of statistics in the era of big data is not so very different from prediction about anything else. Ever since we started to collect data to predict cycles of the moon, seasons, and hence future agricultural yields, humankind has worked to infer information from indirect observations for the purpose of making predictions.

Even while acknowledging the momentous difficulty in making predictions about the future, a few topics stand out clearly as lying at the current and future intersection of statistics and data science. Not all of these topics are of a strictly technical nature, but all have technical repercussions for our field. How might these repercussions shape the still relatively young field of statistics? And what can sound statistical theory and methods bring to our understanding of the foundations of data science? In this article we discuss these issues and explore how new open questions motivated by data science may in turn necessitate new statistical theory and methods now and in the future.

Together, the ubiquity of sensing devices, the low cost of data storage, and the commoditization of computing have led to a volume and variety of modern data sets that would have been unthinkable even a decade ago. We see four important implications for statistics.

First, many modern data sets are related in some way to human behavior. Data might have been collected by interacting with human beings, or personal or private information traceable back to a given set of individuals might have been handled at some stage. Mathematical or theoretical statistics traditionally does not concern itself with the finer points of human behavior, and indeed many of us have only had limited training in the rules and regulations that pertain to data derived from human subjects. Yet inevitably in a data-rich world, our technical developments cannot be divorced from the types of data sets we can collect and analyze, and how we can handle and store them.

Second, the importance of data to our economies and civil societies means that the future of regulation will look not only to protect our privacy, and how we store information about ourselves, but also to include what we are allowed to do with that data. For example, as we collect high-dimensional vectors about many family units across time and space in a given region or country, privacy will be limited by that high-dimensional space, but our wish to control what we do with data will go beyond that….

Third, the growing complexity of algorithms is matched by an increasing variety and complexity of data. Data sets now come in a variety of forms that can be highly unstructured, including images, text, sound, and various other new forms. These different types of observations have to be understood together, resulting in multimodal data, in which a single phenomenon or event is observed through different types of measurement devices. Rather than having one phenomenon corresponding to single scalar values, a much more complex object is typically recorded. This could be a three-dimensional shape, for example in medical imaging, or multiple types of recordings such as functional magnetic resonance imaging and simultaneous electroencephalography in neuroscience. Data science therefore challenges us to describe these more complex structures, modeling them in terms of their intrinsic patterns.

Finally, the types of data sets we now face are far from satisfying the classical statistical assumptions of identically distributed and independent observations. Observations are often “found” or repurposed from other sampling mechanisms, rather than necessarily resulting from designed experiments….

 Our field will either meet these challenges and become increasingly ubiquitous, or risk rapidly becoming irrelevant to the future of data science and artificial intelligence….(More)”.

Small Data for Big Impact


Liz Luckett at the Stanford Social Innovation Review: “As an investor in data-driven companies, I’ve been thinking a lot about my grandfather—a baker, a small business owner, and, I now realize, a pioneering data scientist. Without much more than pencil, paper, and extraordinarily deep knowledge of his customers in Washington Heights, Manhattan, he bought, sold, and managed inventory while also managing risk. His community was poor, but his business prospered. This was not because of what we celebrate today as the power and predictive promise of big data, but rather because of what I call small data: nuanced market insights that come through regular and trusted interactions.

Big data takes into account volumes of information from largely electronic sources—such as credit cards, pay stubs, test scores—and segments people into groups. As a result, people participating in the formalized economy benefit from big data. But people who are paid in cash and have no recognized accolades, such as higher education, are left out. Small data captures those insights to address this market failure. My grandfather, for example, had critical customer information he carefully gathered over the years: who could pay now, who needed a few days more, and which tabs to close. If he had access to a big data algorithm, it likely would have told him all his clients were unlikely to repay him, based on the fact that they were low income (vs. high income) and low education level (vs. college degree). Today, I worry that in our enthusiasm for big data and aggregated predictions, we often lose the critical insights we can gain from small data, because we don’t collect it. In the process, we are missing vital opportunities to both make money and create economic empowerment.

We won’t solve this problem of big data by returning to my grandfather’s shop floor. What we need is more and better data—a small data movement to supply vital missing links in marketplaces and supply chains the world over. What are the proxies that allow large companies to discern whom among the low income are good customers in the absence of a shopkeeper? At The Social Entrepreneurs’ Fund (TSEF), we are profitably investing in a new breed of data company: enterprises that are intentionally and responsibly serving low-income communities, and generating new and unique insights about the behavior of individuals in the process. The value of the small data they collect is becoming increasingly useful to other partners, including corporations who are willing to pay for it. It is a kind of dual market opportunity that for the first time makes it economically advantageous for these companies to reach the poor. We are betting on small data to transform opportunities and quality of life for the underserved, tap into markets that were once seen as too risky or too costly to reach, and earn significant returns for investors….(More)”.

World’s biggest city database shines light on our increasingly urbanised planet


EU Joint Research Centre: “The JRC has launched a new tool with data on all 10,000 urban centres scattered across the globe. It is the largest and most comprehensive database on cities ever published.

With data derived from the JRC’s Global Human Settlement Layer (GHSL), researchers have discovered that the world has become even more urbanised than previously thought.

Populations in urban areas doubled in Africa and grew by 1.1 billion in Asia between 1990 and 2015.

Globally, more than 400 cities have a population between 1 and 5 million. More than 40 cities have 5 to 10 million people, and there are 32 ‘megacities’ with more than 10 million inhabitants.

There are some promising signs for the environment: Cities became 25% greener between 2000 and 2015. And although air pollution in urban centres had been increasing since 1990, the trend reversed between 2000 and 2015.

With every high-density area of at least 50,000 inhabitants covered, the city centres database shows growth in population and built-up areas over the past 40 years. Environmental factors tracked include:

  • ‘Greenness’: the estimated amount of healthy vegetation in the city centre (a sketch of one common greenness proxy follows this list)
  • Soil sealing: the covering of the soil surface with materials like concrete and stone, as a result of new buildings, roads and other public and private spaces
  • Air pollution: the level of polluting particles such as PM2.5 in the air
  • Vicinity to protected areas: the percentage of natural protected space within 30 km distance from the city centre’s border
  • Disaster risk-related exposure of population and buildings in low lying areas and on steep slopes.
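
The announcement does not spell out how ‘greenness’ is measured; a common remote-sensing proxy is the Normalised Difference Vegetation Index (NDVI), computed from the red and near-infrared bands of satellite imagery. The sketch below illustrates that general technique with toy arrays standing in for real rasters, and is not a description of the GHSL’s actual processing chain.

```python
import numpy as np

# Toy stand-ins for the red and near-infrared (NIR) bands of a satellite
# image covering a city centre (real inputs would be raster files).
red = np.array([[0.10, 0.08, 0.30],
                [0.12, 0.07, 0.28],
                [0.11, 0.09, 0.31]])
nir = np.array([[0.45, 0.50, 0.32],
                [0.48, 0.52, 0.30],
                [0.44, 0.49, 0.33]])

# NDVI = (NIR - Red) / (NIR + Red): healthy vegetation scores close to 1,
# built-up or bare surfaces close to 0 or below.
ndvi = (nir - red) / (nir + red)

# A simple city-level "greenness" indicator: the share of pixels whose NDVI
# exceeds a vegetation threshold (0.4 here, chosen arbitrarily).
greenness = float(np.mean(ndvi > 0.4))
print(f"Share of green pixels: {greenness:.0%}")
```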

The data is free to access and open to everyone. It applies big data analytics and a global, people-based definition of cities, providing support to monitor global urbanisation and the 2030 Sustainable Development Agenda.

The information gained from the GHSL is used to produce population density and settlement maps. Satellite, census and local geographic information are used to create the maps….(More)”.