China’s Hinterland Becomes A Critical Datascape


Article by Gary Zhexi Zhang: “In 2014, the southwestern province of Guizhou, a historically poor and mountainous area, beat out rival regions to become China’s first “Big Data Comprehensive Pilot Zone,” as part of a national directive to develop the region — which is otherwise best known as an exporter of tobacco, spirits and coal — into the infrastructural backbone of the country’s data industry. Since then, vast investment has poured into the province. Thousands of miles of highway and high-speed rail tunnel through the mountains. Driving through the province can feel vertiginous: Of the hundred highest bridges in the world, almost half are in Guizhou, and almost all were built in the last 15 years.

In 2015, Xi Jinping visited Gui’an New Area to inaugurate the province’s transformation into China’s “Big Data Valley,” exemplifying the central government’s goal to establish “high quality social and economic development,” ubiquitously advertised through socialist-style slogans plastered on highways and city streets…(More)”.

The Imperial Origins of Big Data


Blog and book by Asheesh Kapur Siddique: “We live in a moment of massive transformation in the nature of information. In 2020, according to one report, users of the Internet created 64.2 zettabytes of data, a quantity greater than the “number of detectable stars in the cosmos,” a colossal increase whose origins can be traced to the emergence of the World Wide Web in 1993. Facilitated by technologies like satellites, smartphones, and artificial intelligence, the scale and speed of data creation seem likely only to balloon over the rest of our lifetimes—and with them, the problem of how to govern ourselves in relation to the inequalities and opportunities that the explosion of data creates.

But while much about our era of big data is indeed revolutionary, the political questions that it raises—How should information be used? Who should control it? And how should it be preserved?—are ones with which societies have long grappled. These questions attained a particular importance in Europe from the eleventh century due to a technological change no less significant than the ones we are witnessing today: the introduction of paper into Europe. Initially invented in China, paper travelled to Europe via the conduit of Islam around the eleventh century after the Moors conquered Spain. Over the twelfth, thirteenth, and fourteenth centuries, paper emerged as the fundamental substrate which politicians, merchants, and scholars relied on to record and circulate information in governance, commerce, and learning. At the same time, governing institutions sought to preserve and control the spread of written information through the creation of archives: repositories where they collected, organized, and stored documents.

The expansion of European polities overseas from the late fifteenth century onward saw governments massively scale up their use of paper—and confront the challenge of controlling its dissemination across thousands of miles of ocean and land. These pressures were felt particularly acutely in what eventually became the largest empire in world history, the British empire. As people from the British Isles fought, traded, and settled their way to power in the Atlantic world and South Asia from the early seventeenth century, administrators faced the problem of how to govern both their emigrating subjects and the non-British peoples with whom they interacted. This meant collecting information about their behavior through the technology of paper. Just as we struggle to organize, search, and control our email boxes, text messages, and app notifications, so too did these early moderns confront the attendant challenges of developing practices of collection and storage to manage the resulting information overload. And despite the best efforts of states and companies to control information, it constantly escaped their grasp, falling into the hands of their opponents and rivals who deployed it to challenge and contest ruling powers.

The history of the early modern information state offers no simple or straightforward answers to the questions that data raises for us today. But it does remind us of a crucial truth, all too readily obscured by the deluge of popular narratives glorifying technological innovation: that questions of data are inherently questions about politics—about who gets to collect, control, and use information, and the ends to which information should be put. We should resist any effort to insulate data governance from democratic processes—and having an informed perspective on the politics of data requires that we attend not just to its present, but also to its past…(More)”.

Big data for everyone


Article by Henrietta Howells: “Raw neuroimaging data require further processing before they can be used for scientific or clinical research. Traditionally, this could be accomplished with a single powerful computer. However, much greater computing power is required to analyze the large open-access cohorts that are increasingly being released to the community. And processing pipelines are inconsistently scripted, which can hinder reproducibility efforts. This creates a barrier for labs lacking access to sufficient resources or technological support, potentially excluding them from neuroimaging research. A paper by Hayashi and colleagues in Nature Methods offers a solution. They present https://brainlife.io, a freely available, web-based platform for secure neuroimaging data access, processing, visualization and analysis. It leverages ‘opportunistic computing’, which pools processing power from commercial and academic clouds, making it accessible to scientists worldwide. This is a step towards lowering the barriers for entry into big data neuroimaging research…(More)”.
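
The ‘opportunistic computing’ idea is easier to picture with a toy scheduler. The sketch below is only a conceptual analogy, with assumed backend names, capacities, and a placeholder preprocess() step; it does not reflect brainlife.io’s actual architecture or API. Each processing job is greedily assigned to whichever compute pool currently has the most free slots.

```python
# Conceptual toy of "opportunistic computing": assign each neuroimaging job to
# whichever compute pool currently has the most free slots. Backend names,
# capacities, and the preprocess() stand-in are all hypothetical; this is not
# how brainlife.io is actually implemented.
from concurrent.futures import ThreadPoolExecutor
import random
import time

BACKENDS = {"academic-cluster": 4, "commercial-cloud": 8}  # free worker slots

def preprocess(subject_id: str, backend: str) -> str:
    """Stand-in for one pipeline step (e.g., registration or denoising)."""
    time.sleep(random.uniform(0.1, 0.3))  # simulate work
    return f"{subject_id} processed on {backend}"

def pick_backend(free_slots: dict) -> str:
    """Greedily reserve a slot on the backend with the most idle capacity."""
    backend = max(free_slots, key=free_slots.get)
    free_slots[backend] -= 1
    return backend

jobs = [f"sub-{i:03d}" for i in range(10)]
with ThreadPoolExecutor(max_workers=sum(BACKENDS.values())) as pool:
    futures = [pool.submit(preprocess, job, pick_backend(BACKENDS)) for job in jobs]
    for future in futures:
        print(future.result())
```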

Global Digital Data Governance: Polycentric Perspectives


(Open Access) Book edited by Carolina Aguerre, Malcolm Campbell-Verduyn, and Jan Aart Scholte: “This book provides a nuanced exploration of contemporary digital data governance, highlighting the importance of cooperation across sectors and disciplines in order to adapt to a rapidly evolving technological landscape. Most of the theory around global digital data governance remains scattered and focused on specific actors, norms, processes, or disciplinary approaches. This book argues for a polycentric approach, allowing readers to consider the issue across multiple disciplines and scales.

Polycentrism, this book argues, provides a set of lenses that tie together the variety of actors, issues, and processes intertwined in digital data governance at subnational, national, regional, and global levels. Firstly, this approach uncovers the complex array of power centers and connections in digital data governance. Secondly, polycentric perspectives bridge disciplinary divides, challenging assumptions and drawing together a growing range of insights about the complexities of digital data governance. Bringing together a wide range of case studies, this book draws out key insights and policy recommendations for how digital data governance occurs and how it might occur differently…(More)”.

Google’s Expanded ‘Flood Hub’ Uses AI to Help Us Adapt to Extreme Weather


Article by Jeff Young: “Google announced Tuesday that a tool using artificial intelligence to better predict river floods will be expanded to the U.S. and Canada, covering more than 800 North American riverside communities that are home to more than 12 million people. Google calls it Flood Hub, and it’s the latest example of how AI is being used to help adapt to extreme weather events associated with climate change.

“We see tremendous opportunity for AI to solve some of the world’s biggest challenges, and climate change is very much one of those,” Google’s Chief Sustainability Officer, Kate Brandt, told Newsweek in an interview.

At an event in Brussels on Tuesday, Google announced a suite of new and expanded sustainability initiatives and products. Many of them involve the use of AI, such as tools to help city planners find the best places to plant trees and modify rooftops to buffer against city heat, and a partnership with the U.S. Forest Service to use AI to improve maps related to wildfires.

A diagram showing the development of models used in Google’s Flood Hub, now available for 800 riverside locations in the U.S. and Canada. Courtesy of Google Research…

Brandt said Flood Hub’s engineers use advanced AI, publicly available data sources and satellite imagery, combined with hydrologic models of river flows. The results allow flooding predictions with a longer lead time than was previously available in many instances…(More)”.
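
As a rough illustration of the kind of pipeline described above, the sketch below trains a toy river-stage forecaster on lagged rainfall and upstream gauge readings with a two-day lead time. All data are synthetic and the model choice (gradient boosting on lagged observations) is an assumption of ours; Google’s operational system relies on far more sophisticated, globally calibrated hydrologic and machine-learning models.

```python
# Illustrative sketch only: a toy downstream river-stage forecaster built from
# lagged rainfall and upstream gauge observations. Synthetic data throughout.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic daily series: rainfall (mm) and upstream gauge height (m).
days = 1000
rain = rng.gamma(shape=2.0, scale=5.0, size=days)
upstream = 1.0 + 0.05 * np.convolve(rain, np.ones(3) / 3, mode="same")
# Downstream stage responds to rain and upstream flow with a ~2-day delay.
downstream = np.roll(0.8 * upstream + 0.01 * rain, 2) + rng.normal(0, 0.02, days)

LEAD = 2   # forecast lead time in days
LAGS = 5   # how many past days of observations the model sees

def make_features(rain, upstream, downstream, lead, lags):
    """Stack the last `lags` days of observations to predict stage `lead` days ahead."""
    X, y = [], []
    for t in range(lags, len(rain) - lead):
        X.append(np.concatenate([rain[t - lags:t], upstream[t - lags:t], downstream[t - lags:t]]))
        y.append(downstream[t + lead])
    return np.array(X), np.array(y)

X, y = make_features(rain, upstream, downstream, LEAD, LAGS)
split = int(0.8 * len(X))
model = GradientBoostingRegressor().fit(X[:split], y[:split])
preds = model.predict(X[split:])
print("mean absolute error (m):", round(float(np.abs(preds - y[split:]).mean()), 3))
```

The lead time here is simply how far ahead of the observation window the target is placed; pushing it further out is what earlier flood warnings require.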

The Age of Prediction: Algorithms, AI, and the Shifting Shadows of Risk


Book by Igor Tulchinsky and Christopher E. Mason: “… about two powerful, and symbiotic, trends: the rapid development and use of artificial intelligence and big data to enhance prediction, as well as the often paradoxical effects of these better predictions on our understanding of risk and the ways we live. Beginning with dramatic advances in quantitative investing and precision medicine, this book explores how predictive technology is quietly reshaping our world in fundamental ways, from crime fighting and warfare to monitoring individual health and elections.

As prediction grows more robust, it also alters the nature of the accompanying risk, setting up unintended and unexpected consequences. The Age of Prediction details how predictive certainties can bring about complacency or even an increase in risks—genomic analysis might lead to unhealthier lifestyles or a GPS might encourage less attentive driving. With greater predictability also comes a degree of mystery, and the authors ask how narrower risks might affect markets, insurance, or risk tolerance generally. Can we ever reduce risk to zero? Should we even try? This book lays an intriguing groundwork for answering these fundamental questions and maps out the latest tools and technologies that power these projections into the future, sometimes using novel, cross-disciplinary tools to map out cancer growth, people’s medical risks, and stock dynamics…(More)”.

Ethical Considerations Towards Protestware


Paper by Marc Cheong, Raula Gaikovina Kula, and Christoph Treude: “A key drawback to using an Open Source third-party library is the risk of introducing malicious attacks. In recent times, these threats have taken a new form, when maintainers turn their Open Source libraries into protestware. This is defined as software containing political messages delivered through these libraries, which can either be malicious or benign. Since developers are willing to freely open up their software to these libraries, much trust and responsibility are placed on the maintainers to ensure that the library does what it promises to do. This paper takes a look into the possible scenarios where developers might consider turning their Open Source Software into protestware, using an ethico-philosophical lens. Using different frameworks commonly used in AI ethics, we explore the different dilemmas that may result in protestware. Additionally, we illustrate how an open-source maintainer’s decision to protest is influenced by different stakeholders (viz., their membership in the OSS community, their personal views, financial motivations, social status, and moral viewpoints), making protestware a multifaceted and intricate matter…(More)”

From LogFrames to Logarithms – A Travel Log


Article by Karl Steinacker and Michael Kubach: “..Today, authorities all over the world are experimenting with predictive algorithms. That sounds technical and innocent, but as we dive deeper into the issue, we realise that the real meaning is rather specific: fraud detection systems in social welfare payment systems. In the meantime, the hitherto banned terminology has made its comeback: welfare or social safety nets have been en vogue again for a few years now. But in the centuries-old Western tradition, welfare recipients must be monitored and, if necessary, sanctioned, while those who work and contribute must be assured that there is no waste. So it comes as no surprise that even today’s algorithms focus on the prime suspect, the individual fraudster, the undeserving poor.

Fraud detection systems promise that the taxpayer will no longer fall victim to fraud and that efficiency gains can be redirected to serve more people. The true extent of welfare fraud is regularly exaggerated, while the costs of such systems are routinely underestimated. No comparison of the estimated losses and the investments takes place. It is the principle of detecting and punishing fraudsters that prevails. Other issues don’t rank high either, for example how to distinguish between honest mistakes and deliberate fraud. And the more time caseworkers spend entering and analysing data in front of a computer screen, the less time and inclination they have to talk to real people and to understand the context of their lives at the margins of society.

Thus, it can be said that hundreds of thousands of people are routinely being scored. Take Denmark: here, a system called Udbetaling Danmark was created in 2012 to streamline the payment of welfare benefits. Its fraud control algorithms can access the personal data of millions of citizens, not all of whom receive welfare payments. In contrast to the hundreds of thousands affected by this data mining, the number of cases referred to the police for further investigation is minute.

In the Dutch city of Rotterdam, data on 30,000 welfare recipients are investigated every year in order to flag suspected welfare cheats. However, an analysis of its scoring system, based on machine learning algorithms, showed systemic discrimination with regard to ethnicity, age, gender, and parenthood. It revealed evidence of other fundamental flaws making the system both inaccurate and unfair. What might appear to a caseworker as a vulnerability is treated by the machine as grounds for suspicion. Despite the scale of data used to calculate risk scores, the output of the system is no better than random guesses. However, the consequences of being flagged by the “suspicion machine” can be drastic, with fraud controllers empowered to turn the lives of suspects inside out.
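
A simple audit along the lines described above can be sketched as follows. The data, features, and model are entirely hypothetical, not Rotterdam’s actual system; the point is to show how a risk-scoring classifier is compared against chance (an AUC of 0.5) and checked for unequal flag rates across a protected group.

```python
# Hypothetical audit sketch (synthetic data, not Rotterdam's actual system):
# score cases with a classifier, compare ranking quality to chance (AUC = 0.5),
# and compare investigation-flag rates across a protected group.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5000

group = rng.integers(0, 2, n)  # protected attribute (e.g., parenthood)
features = np.column_stack([rng.normal(size=n), rng.normal(size=n), group])
# Worst case described in the article: true fraud is rare and unrelated to
# the features, so the scores carry essentially no signal.
fraud = rng.random(n) < 0.03

train, test = slice(0, 4000), slice(4000, n)
clf = LogisticRegression(max_iter=1000).fit(features[train], fraud[train])
scores = clf.predict_proba(features[test])[:, 1]

print("AUC on held-out cases (0.5 = random):", round(roc_auc_score(fraud[test], scores), 3))

# Flag the top 10% riskiest cases for investigation; compare rates by group.
threshold = np.quantile(scores, 0.9)
flagged = scores >= threshold
for g in (0, 1):
    print(f"group {g}: {flagged[group[test] == g].mean():.1%} flagged")
```

With no real signal in the features, the held-out AUC hovers around 0.5, which is exactly the “no better than random guesses” failure mode the Rotterdam analysis reports, while flag rates can still differ across groups.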

As reported by the World Bank, the recent Covid-19 pandemic provided a great push to implement digital social welfare systems in the global South. In fact, for the World Bank the so-called Digital Public Infrastructure (DPI), enabling “Digitizing Government to Person Payments (G2Px)”, is as fundamental for social and economic development today as physical infrastructure was for previous generations. Hence, the World Bank finances, around the globe, systems modelled after the Indian Aadhaar system, under which more than a billion people have been registered biometrically. Aadhaar has become, for all intents and purposes, a precondition for receiving subsidised food and other assistance for 800 million Indian citizens.

Important international aid organisations are not behaving differently from states. The World Food Programme alone holds data on more than 40 million people in its SCOPE database. Unfortunately, WFP, like other UN organisations, is subject neither to data protection laws nor to the jurisdiction of courts. This makes the communities they have worked with particularly vulnerable.

In most places, the social will become the metric, where logarithms determine the operational conduit for delivering, controlling and withholding assistance, especially welfare payments. In other places, the power of logarithms may go even further, as part of trust systems, creditworthiness scoring, and social credit. These social credit systems for individuals are highly controversial, as they require mass surveillance since they aim to track behaviour beyond financial solvency. The social credit score of a citizen might suffer not only from incomplete or inaccurate data, but also from the assessment of political loyalties and conformist social behaviour…(More)”.

Big data for whom? Data-driven estimates to prioritize the recovery needs of vulnerable populations after a disaster


Blog and paper by Sabine Loos and David Lallemant: “For years, international agencies have been effusing about the benefits of big data for sustainable development. Emerging technologies–such as crowdsourcing, satellite imagery, and machine learning–have the power to better inform decision-making, especially decisions that support the 17 Sustainable Development Goals. When a disaster occurs, overwhelming amounts of big data from these emerging technologies are produced with the intention of supporting disaster responders. We are seeing this now with the recent earthquakes in Turkey and Syria: space agencies are processing satellite imagery to map faults and building damage, and digital humanitarians are crowdsourcing baseline data like roads and buildings.

Eight years ago, the Nepal 2015 earthquake was no exception–emergency managers received maps of shaking or crowdsourced maps of affected people’s needs from diverse sources. A year later, I began research with a team of folks involved during the response to the earthquake, and I was determined to understand how big data produced after disasters were connected to the long-term effects of the earthquake. Our research team found that a lot of data that was used to guide the recovery focused on building damage, which was often viewed as a proxy for population needs. While building damage information is useful, it does not capture the full array of social, environmental, and physical factors that will lead to disparities in long-term recovery. I assumed information would have been available immediately after the earthquake that was aimed at supporting vulnerable populations. However, as I spent time in Nepal during the years after the 2015 earthquake, speaking with government officials and nongovernmental organizations involved in the response and recovery, I found they lacked key information about the needs of the most vulnerable households–those who would face the greatest obstacles during the recovery from the earthquake. While governmental and nongovernmental actors prioritized the needs of vulnerable households as best as possible with the information available, I was inspired to pursue research that could provide better information more quickly after an earthquake, to inform recovery efforts.

In our paper published in Communications Earth and Environment [link], we develop a data-driven approach to rapidly estimate which areas are likely to fall behind during recovery due to physical, environmental, and social obstacles. This approach leverages survey data on recovery progress combined with geospatial datasets that would be readily available after an event and that represent factors expected to impede recovery. To identify communities with disproportionate needs long after a disaster, we propose focusing on those who fall behind in recovery over time, or non-recovery. We focus on non-recovery since it places attention on those who do not recover rather than delineating the characteristics of successful recovery. In addition, several of the groups in Nepal involved in the recovery that we spoke to understood vulnerability–a concept that is place-based and can change over time–as referring to those who would not be able to recover from the earthquake…(More)”
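
The general idea can be sketched as follows, with entirely hypothetical data and a logistic-regression stand-in rather than the statistical model used in the paper: combine covariates that would be available soon after an event (shaking intensity, remoteness, a poverty index) with surveyed recovery outcomes, then rank areas by their estimated probability of non-recovery so support can be prioritized.

```python
# Minimal sketch of the general idea (not the paper's actual model): combine
# survey-reported recovery outcomes with geospatial covariates available soon
# after an event, then predict which households or areas risk non-recovery.
# All variable names and data here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n_households = 2000

# Hypothetical covariates available shortly after the earthquake.
shaking_intensity = rng.normal(6.0, 1.0, n_households)   # e.g., intensity measure
remoteness = rng.exponential(2.0, n_households)          # hours to nearest road
poverty_index = rng.random(n_households)                 # 0 = well-off, 1 = poorest
X = np.column_stack([shaking_intensity, remoteness, poverty_index])

# Survey label: household had not rebuilt after N years ("non-recovery").
logits = 0.8 * (shaking_intensity - 6) + 0.5 * remoteness + 2.0 * poverty_index - 2.0
not_recovered = rng.random(n_households) < 1 / (1 + np.exp(-logits))

model = LogisticRegression(max_iter=1000).fit(X, not_recovered)

# Rank households by estimated probability of falling behind, so that
# recovery resources can be prioritized toward the most vulnerable.
risk = model.predict_proba(X)[:, 1]
priority = np.argsort(risk)[::-1][:10]
print("highest-risk household indices:", priority)
```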

Big Data and Public Policy


Book by Rebecca Moody and Victor Bekkers: “This book provides a comprehensive overview of how the course, content and outcome of policy making is affected by big data. It scrutinises the notion that big and open data makes policymaking a more rational process, in which policy makers are able to predict, assess and evaluate societal problems. It also examines how policy makers deal with big data, the problems and limitations they face, and how big data shapes policymaking on the ground. The book considers big data from various perspectives, not just the political, but also the technological, legal, institutional and ethical dimensions. The potential of big data use in the public sector is also assessed, as well as the risks and dangers this might pose. Through several extended case studies, it demonstrates the dynamics of big data and public policy. Offering a holistic approach to the study of big data, this book will appeal to students and scholars of public policy, public administration and data science, as well as those interested in governance and politics…(More)”.