Chasing Shadows: Cyber Espionage, Subversion, and the Global Fight for Democracy


Book by Ronald Deibert: “In this real-life spy thriller, cyber security expert Ronald Deibert details the unseemly marketplace for high-tech surveillance, professional disinformation, and computerized malfeasance. He reveals how his team of digital sleuths at the Citizen Lab have lifted the lid on dozens of covert operations targeting innocent citizens everywhere.

Chasing Shadows provides a front-row seat to a dark underworld of digital espionage, disinformation, and subversion. There, autocrats and dictators peer into their targets’ lives with the mere press of a button, spreading their tentacles of authoritarianism through a digital ecosystem that is insecure, poorly regulated, and prone to abuse. The activists, opposition figures, and journalists who dare to advocate for basic political rights and freedoms are hounded, arrested, tortured, and sometimes murdered.

From the gritty streets of Guatemala City to the corridors of power in the White House, this compelling narrative traces the journey of the Citizen Lab as it evolved into a globally renowned source of counterintelligence for civil society. As this small team of investigators disarmed cyber mercenaries and helped to improve the digital security of billions of people worldwide, their success brought them, too, into the same sinister crosshairs that plagued the victims they worked to protect.

Deibert recounts how the Lab exposed the world’s pre-eminent cyber-mercenary firm, Israel-based NSO Group—the creators of the phone-hacking marvel Pegasus—in a series of human rights abuses, from domestic spying scandals in Spain, Poland, Hungary, and Greece to its implication in the murder of Washington Post journalist Jamal Khashoggi in 2018…(More)”

Making the Global Digital Compact a reality: Four steps to establish a responsible, inclusive and equitable data future.


Article by Stefaan Verhulst: “In September of this year, as world leaders assemble in New York for the 78th annual meeting of the United Nations (UN) General Assembly, they will confront a weighty agenda. War and peace will be at the forefront of conversations, along with efforts to tackle climate change and the ongoing migration crisis. Alongside these usual topics, however, the gathered dignitaries will also turn their attention to digital governance.

In 2021, the UN Secretary General proposed that a Global Digital Compact (GDC) be agreed upon that would “outline shared principles for an open, free and secure digital future for all”. The development of this Compact, which builds on a range of adjacent work streams at the UN, including activities related to the Sustainable Development Goals (SDGs), has now reached a vital inflection point. After a wide-ranging process of consultation, the General Assembly is expected to ratify the latest draft of the Digital Compact, which contains five key objectives and a commitment to thirteen cross-cutting principles. We have reached a rare moment of near-consensus in the global digital ecosystem, one that offers undeniable potential for revamping (and improving) our frameworks for global governance.

The Global Digital Compact will be agreed upon by UN Member States at the Summit of the Future at the United Nations Headquarters in New York, establishing guidelines for the responsible use and governance of digital technologies. 

The growing prominence of these objectives and principles at the seat of global governance is a welcome development. Each is essential to developing a healthy, safe and responsible digital ecosystem. In particular, the emphasis on better data governance is a step forward, as is the related call for an enhanced approach for international AI governance. The two cannot be separated: data governance is the bedrock of AI governance.

Yet now that we are moving toward ratification of the Compact, we must focus on the next crucial, and in some ways most difficult, step: implementation. This is particularly important given that the digital realm faces in many ways a growing crisis of credibility, marked by concerns over exclusion, extraction, concentrations of power, mis- and disinformation, and what we have elsewhere referred to as an impending “data winter”.

Manifesting the goals of the Compact to create genuine and lasting impact is thus critical. In what follows, we explore four key ways in which the Compact’s key objectives can be operationalized to create a more vibrant, responsive and free global digital commons…(More)”.

We finally have a definition for open-source AI


Article by Rhiannon Williams and James O’Donnell: “Open-source AI is everywhere right now. The problem is, no one agrees on what it actually is. Now we may finally have an answer. The Open Source Initiative (OSI), the self-appointed arbiter of what it means to be open source, has released a new definition, which it hopes will help lawmakers develop regulations to protect consumers from AI risks. 

Though OSI has published much about what constitutes open-source technology in other fields, this marks its first attempt to define the term for AI models. It asked a 70-person group of researchers, lawyers, policymakers, and activists, as well as representatives from big tech companies like Meta, Google, and Amazon, to come up with the working definition. 

According to the group, an open-source AI system can be used for any purpose without the need to secure permission, and researchers should be able to inspect its components and study how the system works.

It should also be possible to modify the system for any purpose—including to change its output—and to share it with others to use, with or without modifications, for any purpose. In addition, the standard attempts to define a level of transparency for a given model’s training data, source code, and weights. 

The previous lack of an open-source standard presented a problem…(More)”.

The Complexities of Differential Privacy for Survey Data


Paper by Jörg Drechsler & James Bailie: “The concept of differential privacy (DP) has gained substantial attention in recent years, most notably since the U.S. Census Bureau announced the adoption of the concept for its 2020 Decennial Census. However, despite its attractive theoretical properties, implementing DP in practice remains challenging, especially when it comes to survey data. In this paper we present some results from an ongoing project funded by the U.S. Census Bureau that is exploring the possibilities and limitations of DP for survey data. Specifically, we identify five aspects that need to be considered when adopting DP in the survey context: the multi-staged nature of data production; the limited privacy amplification from complex sampling designs; the implications of survey-weighted estimates; the weighting adjustments for nonresponse and other data deficiencies; and the imputation of missing values. We summarize the project’s key findings with respect to each of these aspects and also discuss some of the challenges that still need to be addressed before DP could become the new data protection standard at statistical agencies…(More)”.
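The core DP primitive that such projects build on can be illustrated with a minimal Laplace-mechanism sketch. This is our illustration, not code from the paper; the survey-specific aspects the authors list (sampling designs, weighting, imputation) are precisely what make real deployments much harder than this toy release of a single statistic.

```python
import math
import random


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a statistic with epsilon-differential privacy by adding
    Laplace noise with scale = sensitivity / epsilon.

    Note: for survey-weighted estimates, the sensitivity must account for
    the largest survey weight, which is one reason DP is harder for surveys.
    """
    scale = sensitivity / epsilon
    # Draw Laplace(0, scale) noise via inverse-CDF sampling from a uniform.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise
```

The noise is unbiased, so repeated releases average back toward the true value; the privacy guarantee comes from the fact that any single release is plausibly deniable.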

Align or fail: How economics shape successful data sharing


Blog by Federico Bartolomucci: “…The conceptual distinctions between different data sharing models are mostly based on one fundamental element: the economic nature of data and its value. 

Open data projects operate under the assumption that data is a non-rival (i.e. can be used by multiple people at the same time) and a non-excludable asset (i.e. anyone can use it, similar to a public good like roads or the air we breathe). This means that data can be shared with everyone, for any use, without losing its market and competitive value. The Humanitarian Data Exchange platform is a great example that allows organizations to share over 19,000 open data sets on all aspects of humanitarian response with others.

Data collaboratives treat data as an excludable asset that some people may be excluded from accessing (i.e. a ‘club good’, like a movie theater) and therefore share it only among a restricted pool of actors. At the same time, they overcome the rival nature of the data by linking its use to a specific purpose. These work best by giving the actors a voice in choosing the purpose for which the data will be used, and through specific agreements and governance bodies that ensure that those contributing data will not have their competitive position harmed, therefore incentivizing them to engage. A good example of this is the California Data Collaborative, which uses data from different actors in the water sector to develop high-level analysis on water distribution to guide policy, planning, and operations for water districts in the state of California. 

Data ecosystems work by activating market mechanisms around data exchange to overcome reluctance to share data, rather than relying solely on its purpose of use. This means that actors can choose to share their data in exchange for compensation, be it monetary or in alternate forms such as other data. In this way, the compensation balances the potential loss of competitive advantage created by the sharing of a rival asset, as well as the costs and risks of sharing. The Enershare initiative aims to establish a marketplace utilizing blockchain and smart contracts to facilitate data exchange in the energy sector. The platform is based on a compensation system, which can be non-monetary, for exchanging assets and resources related to data (such as datasets, algorithms, and models) with energy assets and services (like heating system maintenance or the transfer of surplus locally self-produced energy).
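The three models can be summarized along the two economic dimensions the post discusses, rivalry and excludability. The mapping below is our simplified sketch of that taxonomy, not the author's code, and it deliberately flattens some nuance (e.g. collaboratives also manage rivalry via purpose limitation):

```python
def sharing_model(rival: bool, excludable: bool) -> str:
    """Map the economic treatment of a data asset to the sharing model
    the blog associates with it (simplified, illustrative mapping)."""
    if not rival and not excludable:
        return "open data"           # public-good-like: share with everyone
    if excludable:
        return "data collaborative"  # club good: restricted pool, agreed purpose
    return "data ecosystem"          # rival but open access: market mechanisms
```

For instance, treating data as rival and excludable points toward a collaborative with governance agreements, while treating it as rival but priced points toward a marketplace-style ecosystem.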

These different models of data sharing have different operational implications…(More)”.

On Fables and Nuanced Charts


Column by Spencer Greenberg and Amber Dawn Ace: “In 1994, the U.S. Congress passed the largest crime bill in U.S. history, called the Violent Crime Control and Law Enforcement Act. The bill allocated billions of dollars to build more prisons and hire 100,000 new police officers, among other things. In the years following the bill’s passage, violent crime rates in the U.S. dropped drastically, from around 750 offenses per 100,000 people in 1990 to under 400 in 2018.

A chart showing U.S. crime rates over time. The data and annotation are real, but the implied story is not. Credit: Authors.

But can we infer, as this chart seems to ask us to, that the bill caused the drop in crime?

As it turns out, this chart wasn’t put together by sociologists or political scientists who’ve studied violent crime. Rather, we—a mathematician and a writer—devised it to make a point: Although charts seem to reflect reality, they often convey narratives that are misleading or entirely false.

Upon seeing that violent crime dipped after 1990, we looked up major events that happened right around that time—selecting one, the 1994 Crime Bill, and slapping it on the graph. There are other events we could have stuck on the graph just as easily that would likely have invited you to construct a completely different causal story. In other words, the bill and the data in the graph are real, but the story is manufactured.

Perhaps the 1994 Crime Bill really did cause the drop in violent crime, or perhaps the causality goes the other way: the spike in violent crime motivated politicians to pass the act in the first place. (Note that the act was passed slightly after the violent crime rate peaked!) 

Charts are a concise way not only to show data but also to tell a story. Such stories, however, reflect the interpretations of a chart’s creators and are often accepted by the viewer without skepticism. As Noah Smith and many others have argued, charts contain hidden assumptions that can drastically change the story they tell…(More)”.
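One simple check implied by the authors' parenthetical note (did the decline begin before the annotated event?) can be sketched in a few lines. The rate values below are toy numbers of our own, chosen only to mimic the shape of the chart, not the authors' data:

```python
# Toy series of violent-crime rates per 100,000 (illustrative values only).
rates = {1988: 640, 1990: 730, 1991: 758, 1992: 757, 1993: 747,
         1994: 714, 1995: 685, 2000: 507}

EVENT_YEAR = 1994  # Violent Crime Control and Law Enforcement Act

# Find the year the series peaked, i.e. where the decline began.
peak_year = max(rates, key=rates.get)

# If the peak precedes the annotated event, the implied story
# "event caused the drop" is at least incomplete: the trend had
# already reversed before the event occurred.
decline_began_before_event = peak_year < EVENT_YEAR
```

Checks like this do not settle causality either, but they flag when a chart's annotation is doing narrative work the data alone cannot support.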

Revisiting the ‘Research Parasite’ Debate in the Age of AI


Article by C. Brandon Ogbunu: “A 2016 editorial published in the New England Journal of Medicine lamented the existence of “research parasites,” those who pick over the data of others rather than generating new data themselves. The article touched on the ethics and appropriateness of this practice. The most charitable interpretation of the argument centered around the hard work and effort that goes into the generation of new data, which costs millions of research dollars and takes countless person-hours. Whatever the merits of that argument, the editorial and its associated arguments were widely criticized.

Given recent advances in AI, revisiting the research parasite debate offers a new perspective on the ethics of sharing and data democracy. It is ironic that the critics of research parasites might have made a sound argument — but for the wrong setting, aimed at the wrong target, at the wrong time. Specifically, the large language models, or LLMs, that underlie generative AI tools such as OpenAI’s ChatGPT, have an ethical challenge in how they parasitize freely available data. These discussions bring up new conversations about data security that may undermine, or at least complicate, efforts at openness and data democratization.

The backlash to that 2016 editorial was swift and violent. Many arguments centered around the anti-science spirit of the message. For example, meta-analysis, which re-analyzes data from a selection of studies, is a critical practice that should be encouraged. Many groundbreaking discoveries about the natural world and human health have come from this practice, including new pictures of the molecular causes of depression and schizophrenia. Further, the central criticisms of research parasitism undermine the ethical goals of data sharing and ambitions for open science, where scientists and citizen-scientists can benefit from access to data. This differs from the status quo in 2016, when data published in many of the top journals of the world were locked behind a paywall, illegible, poorly labeled, or difficult to use. This remains largely true in 2024…(More)”.

Private sector trust in data sharing: enablers in the European Union


Paper by Jaime Bernal: “Enabling private sector trust stands as a critical policy challenge for the success of the EU Data Governance Act and Data Act in promoting data sharing to address societal challenges. This paper attributes the widespread trust deficit to the unmanageable uncertainty that arises from businesses’ limited usage control to protect their interests in the face of unacceptable perceived risks. For example, a firm may hesitate to share its data with others in case it is leaked and falls into the hands of business competitors. To illustrate this impasse, competition, privacy, and reputational risks are introduced, respectively, in the context of three suboptimal approaches to data sharing: data marketplaces, data collaboratives, and data philanthropy. The paper proceeds by analyzing seven trust-enabling mechanisms composed of technological, legal, and organizational elements to balance trust, risk, and control and assessing their capacity to operate in a fair, equitable, and transparent manner. Finally, the paper examines the regulatory context in the EU and the advantages and limitations of voluntary and mandatory data sharing, concluding that an approach that effectively balances the two should be pursued…(More)”.

The Art of Uncertainty


Book by David Spiegelhalter: “We live in a world where uncertainty is inevitable. How should we deal with what we don’t know? And what role do chance, luck and coincidence play in our lives?

David Spiegelhalter has spent his career dissecting data in order to understand risks and assess the chances of what might happen in the future. In The Art of Uncertainty, he gives readers a window onto how we can all do this better.

In engaging, crystal-clear prose, he takes us through the principles of probability, showing how it can help us think more analytically about everything from medical advice to pandemics and climate change forecasts, and explores how we can update our beliefs about the future in the face of constantly changing experience. Along the way, he explains why roughly 40% of football results come down to luck rather than talent, how the National Risk Register assesses near-term risks to the United Kingdom, and why we can be so confident that two properly shuffled packs of cards have never, ever been in the exact same order.

Drawing on a wide range of captivating real-world examples, this is an essential guide to navigating uncertainty while also having the humility to admit what we do not know…(More)”.
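The card-shuffling claim rests on the sheer size of 52!. A quick back-of-the-envelope computation (ours, not the book's) shows why the confidence is justified:

```python
import math

# Number of distinct orderings of a standard 52-card deck.
arrangements = math.factorial(52)  # roughly 8 x 10^67

# Deliberately generous upper bound on shuffles ever performed:
# 10 billion people x 10,000 shuffles each x 1,000 years of card history.
shuffles_ever = 10**10 * 10**4 * 10**3  # 10^17

# Birthday-problem-style bound: a repeated ordering among n shuffles is
# only plausible if n^2 is comparable to the number of arrangements.
repeat_is_plausible = shuffles_ever**2 > arrangements
```

Even with these inflated assumptions, the square of all shuffles ever performed (about 10^34) is dozens of orders of magnitude below 52!, so two properly shuffled decks matching is effectively impossible.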

Collaboration in Healthcare: Implications of Data Sharing for Secondary Use in the European Union


Paper by Fanni Kertesz: “The European healthcare sector is transforming toward patient-centred and value-based healthcare delivery. The European Health Data Space (EHDS) Regulation aims to unlock the potential of health data by establishing a single market for its primary and secondary use. This paper examines the legal challenges associated with the secondary use of health data within the EHDS and offers recommendations for improvement. Key issues include the compatibility between the EHDS and the General Data Protection Regulation (GDPR), barriers to cross-border data sharing, and intellectual property concerns. Resolving these challenges is essential for realising the full potential of health data and advancing healthcare research and innovation within the EU…(More)”.