big data

Artificial Intelligence and Big Data

Curated on April 21, 2025 by Stefaan Verhulst

Book edited by Frans L. Leeuw and Michael Bamberger: “…explores how Artificial Intelligence (AI) and Big Data contribute to the evaluation of the rule of law (covering legal arrangements, empirical legal research, law and technology, and international law), and social and economic development programs in both industrialized and developing countries. Issues of ethics and bias in the use of AI are also addressed and indicators of the growth of knowledge in the field are discussed.

Interdisciplinary and international in scope, and bringing together leading academics and practitioners from across the globe, the book explores the applications of AI and big data in Rule of Law and development evaluation, identifies differences in the approaches used in the two fields, and how each could learn from the approaches used in the other, as well as differences in the AI-related issues addressed in industrialized nations compared to those addressed in Africa and Asia.

Artificial Intelligence and Big Data is an essential read for researchers, academics and students working in the fields of Rule of Law and Development, and researchers in institutions working on new applications in AI will all benefit from the book’s practical insights…(More)”.

Web 3.0 Requires Data Integrity

Curated on March 24, 2025 (updated March 27, 2025) by Stefaan Verhulst

Article by Bruce Schneier and Davi Ottenheimer: “If you’ve ever taken a computer security class, you’ve probably learned about the three legs of computer security—confidentiality, integrity, and availability—known as the CIA triad. When we talk about a system being secure, that’s what we’re referring to. All are important, but to different degrees in different contexts. In a world populated by artificial intelligence (AI) systems and artificial intelligent agents, integrity will be paramount.

What is data integrity? It’s ensuring that no one can modify data—that’s the security angle—but it’s much more than that. It encompasses accuracy, completeness, and quality of data—all over both time and space. It’s preventing accidental data loss; the “undo” button is a primitive integrity measure. It’s also making sure that data is accurate when it’s collected—that it comes from a trustworthy source, that nothing important is missing, and that it doesn’t change as it moves from format to format. The ability to restart your computer is another integrity measure.

The CIA triad has evolved with the Internet. The first iteration of the Web—Web 1.0 of the 1990s and early 2000s—prioritized availability. This era saw organizations and individuals rush to digitize their content, creating what has become an unprecedented repository of human knowledge. Organizations worldwide established their digital presence, leading to massive digitization projects where quantity took precedence over quality. The emphasis on making information available overshadowed other concerns.

As Web technologies matured, the focus shifted to protecting the vast amounts of data flowing through online systems. This is Web 2.0: the Internet of today. Interactive features and user-generated content transformed the Web from a read-only medium to a participatory platform. The increase in personal data, and the emergence of interactive platforms for e-commerce, social media, and online everything demanded both data protection and user privacy. Confidentiality became paramount.

We stand at the threshold of a new Web paradigm: Web 3.0. This is a distributed, decentralized, intelligent Web. Peer-to-peer social-networking systems promise to break the tech monopolies’ control on how we interact with each other. Tim Berners-Lee’s open W3C protocol, Solid, represents a fundamental shift in how we think about data ownership and control. A future filled with AI agents requires verifiable, trustworthy personal data and computation. In this world, data integrity takes center stage…(More)”.

Will big data lift the veil of ignorance?

Curated on February 4, 2025 by Stefaan Verhulst

Blog by Lisa Herzog: “Imagine that you have a toothache, and a visit at the dentist reveals that a major operation is needed. You phone your health insurance. You listen to the voice of the chatbot, press the buttons to go through the menu. And then you hear: “We have evaluated your profile based on the data you have agreed to share with us. Your dental health behavior scores 6 out of 10. The suggested treatment plan therefore requires a co-payment of [insert some large sum of money here].”

This may sound like science fiction. But many other insurances, e.g. car insurances, already build on automated data being shared with them. If they were allowed, health insurers would certainly like to access our data as well – not only those from smart toothbrushes, but also credit card data, behavioral data (e.g. from step counting apps), or genetic data. If they were allowed to use them, they could move towards segmented insurance plans for specific target groups. As two commentators, on whose research I come back below, recently wrote about health insurance: “Today, public plans and nondiscrimination clauses, not lack of information, are what stands between integration and segmentation.”

If, like me, you’re interested in the relation between knowledge and institutional design, insurance is a fascinating topic. The basic idea of insurance is centuries old – here is a brief summary (skip a few paragraphs if you know this stuff). Because we cannot know what might happen to us in the future, but we can know that on an aggregate level, things will happen to people, it can make sense to enter an insurance contract, creating a pool that a group jointly contributes to. Those for whom the risks in question materialize get support from the pool. Those for whom it does not materialize may go through life without receiving any money, but they still know that they could get support if something happened to them. As such, insurance combines solidarity within a group with individual precaution…(More)”.

China’s Hinterland Becomes A Critical Datascape

Curated on September 28, 2024 (updated October 3, 2024) by Stefaan Verhulst

Article by Gary Zhexi Zhang: “In 2014, the southwestern province of Guizhou, a historically poor and mountainous area, beat out rival regions to become China’s first “Big Data Comprehensive Pilot Zone,” as part of a national directive to develop the region — which is otherwise best known as an exporter of tobacco, spirits and coal — into the infrastructural backbone of the country’s data industry. Since then, vast investment has poured into the province. Thousands of miles of highway and high-speed rail tunnel through the mountains. Driving through the province can feel vertiginous: Of the hundred highest bridges in the world, almost half are in Guizhou, and almost all were built in the last 15 years.

In 2015, Xi Jinping visited Gui’an New Area to inaugurate the province’s transformation into China’s “Big Data Valley,” exemplifying the central government’s goal to establish “high quality social and economic development,” ubiquitously advertised through socialist-style slogans plastered on highways and city streets…(More)”.

The Imperial Origins of Big Data

Curated on August 29, 2024 by Stefaan Verhulst

Blog and book by Asheesh Kapur Siddique: “We live in a moment of massive transformation in the nature of information. In 2020, according to one report, users of the Internet created 64.2 zettabytes of data, a quantity greater than the “number of detectable stars in the cosmos,” a colossal increase whose origins can be traced to the emergence of the World Wide Web in 1993. Facilitated by technologies like satellites, smartphones, and artificial intelligence, the scale and speed of data creation seems like it may only balloon over the rest of our lifetimes—and with it, the problem of how to govern ourselves in relation to the inequalities and opportunities that the explosion of data creates.

But while much about our era of big data is indeed revolutionary, the political questions that it raises—How should information be used? Who should control it? And how should it be preserved?—are ones with which societies have long grappled. These questions attained a particular importance in Europe from the eleventh century due to a technological change no less significant than the ones we are witnessing today: the introduction of paper into Europe. Initially invented in China, paper travelled to Europe via the conduit of Islam around the eleventh century after the Moors conquered Spain. Over the twelfth, thirteenth, and fourteenth centuries, paper emerged as the fundamental substrate which politicians, merchants, and scholars relied on to record and circulate information in governance, commerce, and learning. At the same time, governing institutions sought to preserve and control the spread of written information through the creation of archives: repositories where they collected, organized, and stored documents.

The expansion of European polities overseas from the late fifteenth century onward saw governments massively scale up their use of paper—and confront the challenge of controlling its dissemination across thousands of miles of ocean and land. These pressures were felt particularly acutely in what eventually became the largest empire in world history, the British empire. As people from the British isles from the early seventeenth century fought, traded, and settled their way to power in the Atlantic world and South Asia, administrators faced the problem of how to govern both their emigrating subjects and the non-British peoples with whom they interacted. This meant collecting information about their behavior through the technology of paper. Just as we struggle to organize, search, and control our email boxes, text messages, and app notifications, so too did these early moderns confront the attendant challenges of developing practices of collection and storage to manage the resulting information overload. And despite the best efforts of states and companies to control information, it constantly escaped their grasp, falling into the hands of their opponents and rivals who deployed it to challenge and contest ruling powers.

The history of the early modern information state offers no simple or straightforward answers to the questions that data raises for us today. But it does remind us of a crucial truth, all too readily obscured by the deluge of popular narratives glorifying technological innovation: that questions of data are inherently questions about politics—about who gets to collect, control, and use information, and the ends to which information should be put. We should resist any effort to insulate data governance from democratic processes—and having an informed perspective on the politics of data requires that we attend not just to its present, but also to its past…(More)”.

Big data for everyone

Curated on May 11, 2024 (updated May 15, 2024) by Stefaan Verhulst

Article by Henrietta Howells: “Raw neuroimaging data require further processing before they can be used for scientific or clinical research. Traditionally, this could be accomplished with a single powerful computer. However, much greater computing power is required to analyze the large open-access cohorts that are increasingly being released to the community. And processing pipelines are inconsistently scripted, which can hinder reproducibility efforts. This creates a barrier for labs lacking access to sufficient resources or technological support, potentially excluding them from neuroimaging research. A paper by Hayashi and colleagues in Nature Methods offers a solution. They present https://brainlife.io, a freely available, web-based platform for secure neuroimaging data access, processing, visualization and analysis. It leverages ‘opportunistic computing’, which pools processing power from commercial and academic clouds, making it accessible to scientists worldwide. This is a step towards lowering the barriers for entry into big data neuroimaging research…(More)”.

Global Digital Data Governance: Polycentric Perspectives

Curated on January 4, 2024 by Stefaan Verhulst

(Open Access) Book edited by Carolina Aguerre, Malcolm Campbell-Verduyn, and Jan Aart Scholte: “This book provides a nuanced exploration of contemporary digital data governance, highlighting the importance of cooperation across sectors and disciplines in order to adapt to a rapidly evolving technological landscape. Most of the theory around global digital data governance remains scattered and focused on specific actors, norms, processes, or disciplinary approaches. This book argues for a polycentric approach, allowing readers to consider the issue across multiple disciplines and scales.

Polycentrism, this book argues, provides a set of lenses that tie together the variety of actors, issues, and processes intertwined in digital data governance at subnational, national, regional, and global levels. Firstly, this approach uncovers the complex array of power centers and connections in digital data governance. Secondly, polycentric perspectives bridge disciplinary divides, challenging assumptions and drawing together a growing range of insights about the complexities of digital data governance. Bringing together a wide range of case studies, this book draws out key insights and policy recommendations for how digital data governance occurs and how it might occur differently…(More)”.

Google’s Expanded ‘Flood Hub’ Uses AI to Help Us Adapt to Extreme Weather

Curated on October 10, 2023 (updated October 12, 2023) by Stefaan Verhulst

Article by Jeff Young: “Google announced Tuesday that a tool using artificial intelligence to better predict river floods will be expanded to the U.S. and Canada, covering more than 800 North American riverside communities that are home to more than 12 million people. Google calls it Flood Hub, and it’s the latest example of how AI is being used to help adapt to extreme weather events associated with climate change.

“We see tremendous opportunity for AI to solve some of the world’s biggest challenges, and climate change is very much one of those,” Google’s Chief Sustainability Officer, Kate Brandt, told Newsweek in an interview.

At an event in Brussels on Tuesday, Google announced a suite of new and expanded sustainability initiatives and products. Many of them involve the use of AI, such as tools to help city planners find the best places to plant trees and modify rooftops to buffer against city heat, and a partnership with the U.S. Forest Service to use AI to improve maps related to wildfires.

[Figure: A diagram showing the development of models used in Google’s Flood Hub, now available for 800 riverside locations in the U.S. and Canada. Courtesy of Google Research.]

Brandt said Flood Hub’s engineers use advanced AI, publicly available data sources and satellite imagery, combined with hydrologic models of river flows. The results allow flooding predictions with a longer lead time than was previously available in many instances…(More)”.

The Age of Prediction: Algorithms, AI, and the Shifting Shadows of Risk

Curated on August 23, 2023 by Stefaan Verhulst

Book by Igor Tulchinsky and Christopher E. Mason: “… about two powerful, and symbiotic, trends: the rapid development and use of artificial intelligence and big data to enhance prediction, as well as the often paradoxical effects of these better predictions on our understanding of risk and the ways we live. Beginning with dramatic advances in quantitative investing and precision medicine, this book explores how predictive technology is quietly reshaping our world in fundamental ways, from crime fighting and warfare to monitoring individual health and elections.

As prediction grows more robust, it also alters the nature of the accompanying risk, setting up unintended and unexpected consequences. The Age of Prediction details how predictive certainties can bring about complacency or even an increase in risks—genomic analysis might lead to unhealthier lifestyles or a GPS might encourage less attentive driving. With greater predictability also comes a degree of mystery, and the authors ask how narrower risks might affect markets, insurance, or risk tolerance generally. Can we ever reduce risk to zero? Should we even try? This book lays an intriguing groundwork for answering these fundamental questions and maps out the latest tools and technologies that power these projections into the future, sometimes using novel, cross-disciplinary tools to map out cancer growth, people’s medical risks, and stock dynamics…(More)”.

Ethical Considerations Towards Protestware

Curated on June 21, 2023 by Stefaan Verhulst

Paper by Marc Cheong, Raula Gaikovina Kula, and Christoph Treude: “A key drawback to using an Open Source third-party library is the risk of introducing malicious attacks. In recent times, these threats have taken a new form, when maintainers turn their Open Source libraries into protestware. This is defined as software containing political messages delivered through these libraries, which can either be malicious or benign. Since developers are willing to freely open up their software to these libraries, much trust and responsibility are placed on the maintainers to ensure that the library does what it promises to do. This paper takes a look into the possible scenarios where developers might consider turning their Open Source Software into protestware, using an ethico-philosophical lens. Using different frameworks commonly used in AI ethics, we explore the different dilemmas that may result in protestware. Additionally, we illustrate how an open-source maintainer’s decision to protest is influenced by different stakeholders (viz., their membership in the OSS community, their personal views, financial motivations, social status, and moral viewpoints), making protestware a multifaceted and intricate matter…(More)”.