COVID-19 digital contact tracing worked — heed the lessons for future pandemics


Article by Marcel Salathé: “During the first year of the COVID-19 pandemic, around 50 countries deployed digital contact tracing. When someone tested positive for SARS-CoV-2, anyone who had been in close proximity to that person (usually for 15 minutes or more) would be notified as long as both individuals had installed the contact-tracing app on their devices.
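The matching logic at the heart of such apps is simple to state. Below is a minimal, hypothetical Python sketch of that logic: both parties must have the app installed, and the contact must last at least 15 minutes. Real systems such as the Google/Apple Exposure Notification framework additionally use rotating Bluetooth identifiers and on-device matching, none of which is modelled here.

```python
# Hypothetical sketch of exposure-notification matching, assuming
# simplified encounter records rather than any real framework's API.
from dataclasses import dataclass

EXPOSURE_THRESHOLD_MINUTES = 15

@dataclass
class Encounter:
    user_a: str
    user_b: str
    duration_minutes: float

def users_to_notify(encounters, positive_user, app_installed):
    """Return the users who should receive an exposure notification."""
    notify = set()
    for e in encounters:
        # Both individuals must have the app installed.
        if not (app_installed.get(e.user_a) and app_installed.get(e.user_b)):
            continue
        # Only sufficiently long contacts with the positive case count.
        if positive_user in (e.user_a, e.user_b) and \
                e.duration_minutes >= EXPOSURE_THRESHOLD_MINUTES:
            contact = e.user_b if e.user_a == positive_user else e.user_a
            notify.add(contact)
    return notify

encounters = [
    Encounter("alice", "bob", 22.0),   # long enough to count
    Encounter("alice", "carol", 5.0),  # too short
]
installed = {"alice": True, "bob": True, "carol": True}
print(users_to_notify(encounters, "alice", installed))  # {'bob'}
```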

Digital contact tracing received much media attention, and much criticism, in that first year. Many worried that the technology provided a way for governments and technology companies to have even more control over people’s lives than they already do. Others dismissed the apps as a failure, after public-health authorities hit problems in deploying them.

Three years on, the data tell a different story.

The United Kingdom successfully integrated a digital contact-tracing app with other public-health programmes and interventions, and collected data to assess the app’s effectiveness. Several analyses now show that, even with the challenges of introducing a new technology during an emergency, and despite relatively low uptake, the app saved thousands of lives. It has also become clearer that many of the problems encountered elsewhere were not to do with the technology itself, but with integrating a twenty-first-century technology into what are largely twentieth-century public-health infrastructures…(More)”.

How Good Are Privacy Guarantees? Platform Architecture and Violation of User Privacy


Paper by Daron Acemoglu, Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian & Asuman Ozdaglar: “Many platforms deploy data collected from users for a multitude of purposes. While some are beneficial to users, others are costly to their privacy. The presence of these privacy costs means that platforms may need to provide guarantees about how and to what extent user data will be harvested for activities such as targeted ads, individualized pricing, and sales to third parties. In this paper, we build a multi-stage model in which users decide whether to share their data based on privacy guarantees. We first introduce a novel mask-shuffle mechanism and prove it is Pareto optimal—meaning that it leaks the least about the users’ data for any given leakage about the underlying common parameter. We then show that under any mask-shuffle mechanism, there exists a unique equilibrium in which privacy guarantees balance privacy costs and utility gains from the pooling of user data for purposes such as assessment of health risks or product development. Paradoxically, we show that as users’ value of pooled data increases, the equilibrium of the game leads to lower user welfare. This is because platforms take advantage of this change to reduce privacy guarantees so much that user utility declines (whereas it would have increased with a given mechanism). Even more strikingly, we show that platforms have incentives to choose data architectures that systematically differ from those that are optimal from the user’s point of view. In particular, we identify a class of pivot mechanisms, linking individual privacy to choices by others, which platforms prefer to implement and which make users significantly worse off…(More)”.
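To give a rough feel for how masking plus shuffling can protect individuals while preserving the pooled signal, here is a toy numerical sketch. It is one plausible reading of the idea, not the authors' formal mechanism: users add zero-sum random masks to their data and a shuffler permutes the reports, so the aggregate estimate of the common parameter survives while any single report is both heavily masked and unlinkable to its sender.

```python
# Toy reading of a mask-and-shuffle idea (illustration only, not the
# paper's formal construction). Each user observes a noisy copy of a
# common parameter theta; masks sum to zero, so they cancel on average.
import numpy as np

rng = np.random.default_rng(0)
theta = 2.5            # common parameter (e.g., a population health risk)
n = 10_000

x = theta + rng.normal(0.0, 1.0, n)     # users' private data
masks = rng.normal(0.0, 10.0, n)
masks -= masks.mean()                   # force the masks to sum to zero
reports = rng.permutation(x + masks)    # mask, then shuffle

print(f"pooled estimate of theta: {reports.mean():.3f}")  # ~2.5
print(f"per-report noise std:     {reports.std():.1f}")   # ~10, hides x_i
```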

Non-traditional data sources in obesity research: a systematic review of their use in the study of obesogenic environments


Paper by Julia Mariel Wirtz Baker, Sonia Alejandra Pou, Camila Niclis, Eugenia Haluszka & Laura Rosana Aballay: “The field of obesity epidemiology has made extensive use of traditional data sources, such as health surveys and reports from official national statistical systems, whose variety of data can at times be too limited to explore the wider range of determinants relevant to obesity. Over time, other data sources began to be incorporated into obesity research, such as geospatial data (web mapping platforms, satellite imagery, and other databases embedded in Geographic Information Systems), social network data (such as Twitter, Facebook, Instagram, or other social networks), digital device data and others. The data revolution, facilitated by the massive use of digital devices with hundreds of millions of users and the emergence of the “Internet of Things” (IoT), has generated huge volumes of data from everywhere: customers, social networks and sensors, in addition to all the traditional sources mentioned above. In research, it offers fruitful opportunities, contributing in ways that traditionally sourced data could not.

An international expert panel in obesity and big data pointed out some key factors in the definition of Big Data, stating that “it is always digital, has a large sample size, and a large volume or variety or velocity of variables that require additional computing power, as well as specialist skills in computer programming, database management and data science analytics”. Our interpretation of non-traditional data sources is an approximation to this definition, assuming that they are sources not traditionally used in obesity epidemiology and environmental studies, which can include digital devices, social media and geospatial data within a GIS, the latter mainly based on complex indexes that require advanced data analysis techniques and expertise.
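As a concrete, if deliberately simple, illustration of the kind of geospatial index these sources enable, the hypothetical sketch below counts mapped fast-food outlets within a fixed radius of each neighbourhood centroid. All coordinates are invented; a real study would use projected coordinates and validated outlet data rather than this toy planar distance.

```python
# Hypothetical GIS-style "food environment" index: fast-food outlets
# within 1 km of each neighbourhood centroid. All data is made up.
import math

def dist_km(p, q):
    # Equirectangular approximation; adequate for short distances.
    ky = 111.32                                   # km per degree latitude
    kx = ky * math.cos(math.radians((p[0] + q[0]) / 2))
    return math.hypot((p[0] - q[0]) * ky, (p[1] - q[1]) * kx)

outlets = [(45.052, -64.11), (45.055, -64.12), (45.205, -64.305)]
neighbourhoods = {"centre": (45.05, -64.11), "north": (45.21, -64.31)}

RADIUS_KM = 1.0
index = {
    name: sum(dist_km(c, o) <= RADIUS_KM for o in outlets)
    for name, c in neighbourhoods.items()
}
print(index)  # {'centre': 2, 'north': 1}
```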

Beyond its still-debated limitations, Big Data can be seen as a great opportunity to improve the study of obesogenic environments, since it has been heralded as a powerful resource that can provide new knowledge about human behaviour and social phenomena. Moreover, it can contribute to the formulation and evaluation of policies and the development of interventions for obesity prevention. However, in this field of research, the suitability of these novel data sources is still a subject of considerable discussion, and their use has not been investigated from the obesogenic environment approach…(More)”.

Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data


Proceedings from the National Academies of Sciences: “Artificial intelligence (AI), facial recognition, and other advanced computational and statistical techniques are accelerating advancements in the life sciences and many other fields. However, these technologies and the scientific developments they enable also hold the potential for unintended harm and malicious exploitation. To examine these issues and to discuss practices for anticipating and preventing the misuse of advanced data analytics and biological data in a global context, the National Academies of Sciences, Engineering, and Medicine convened two virtual workshops on November 15, 2022, and February 9, 2023. The workshops engaged scientists from the United States, South Asia, and Southeast Asia through a series of presentations and scenario-based exercises to explore emerging applications and areas of research, their potential benefits, and the ethical issues and security risks that arise when AI applications are used in conjunction with biological data. This publication highlights the presentations and discussions of the workshops…(More)”.

How should a robot explore the Moon? A simple question shows the limits of current AI systems


Article by Sally Cripps, Edward Santow, Nicholas Davis, Alex Fischer and Hadi Mohasel Afshar: “…Ultimately, AI systems should help humans make better, more accurate decisions. Yet even the most impressive and flexible of today’s AI tools – such as the large language models behind the likes of ChatGPT – can have the opposite effect.

Why? They have two crucial weaknesses. They do not help decision-makers understand causation or uncertainty. And they create incentives to collect huge amounts of data and may encourage a lax attitude to privacy, legal and ethical questions and risks…

ChatGPT and other “foundation models” use an approach called deep learning to trawl through enormous datasets and identify associations between factors contained in that data, such as the patterns of language or links between images and descriptions. Consequently, they are great at interpolating – that is, predicting or filling in the gaps between known values.

Interpolation is not the same as creation. It does not generate knowledge, nor the insights necessary for decision-makers operating in complex environments.
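The gap between interpolation and extrapolation is easy to demonstrate with a generic curve-fitting toy (not any specific foundation model): a flexible model fitted on data from one region predicts well inside that region and fails badly outside it.

```python
# A small illustration of the interpolation/extrapolation gap: fit a
# flexible polynomial to noisy sine data on [0, 2*pi], then query it
# inside and outside the training range.
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(0, 2 * np.pi, 200)
y_train = np.sin(x_train) + rng.normal(0, 0.05, x_train.size)

coeffs = np.polyfit(x_train, y_train, deg=7)   # flexible fitted model

def err(x):
    return abs(np.polyval(coeffs, x) - np.sin(x))

print(f"interpolation error at x=3.0: {err(3.0):.3f}")  # small
print(f"extrapolation error at x=9.0: {err(9.0):.3f}")  # large
```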

However, these approaches require huge amounts of data. As a result, they encourage organisations to assemble enormous repositories of data – or trawl through existing datasets collected for other purposes. Dealing with “big data” brings considerable risks around security, privacy, legality and ethics.

In low-stakes situations, predictions based on “what the data suggest will happen” can be incredibly useful. But when the stakes are higher, there are two more questions we need to answer.

The first is about how the world works: “what is driving this outcome?” The second is about our knowledge of the world: “how confident are we about this?”…(More)”.
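For the second question, one generic (and admittedly simple) way to quantify confidence is to refit the same model on bootstrap resamples of the data and report the spread of its predictions; the sketch below reuses the curve-fitting toy from above. The spread is small where the data lives and large where it does not.

```python
# Bootstrap ensemble as a simple confidence measure: refit on resampled
# data many times and report the standard deviation of the predictions.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(0, 0.1, x.size)

def bootstrap_spread(x_query, n_boot=500):
    preds = []
    for _ in range(n_boot):
        i = rng.integers(0, x.size, x.size)   # resample with replacement
        c = np.polyfit(x[i], y[i], deg=7)
        preds.append(np.polyval(c, x_query))
    return np.std(preds)

print(f"spread inside the data (x=3.0):  {bootstrap_spread(3.0):.3f}")  # small
print(f"spread outside the data (x=9.0): {bootstrap_spread(9.0):.3f}")  # large
```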

Data for the City of Tomorrow: Developing the Capabilities and Capacity to Guide Better Urban Futures


WEF Report: “This report is a comprehensive manual for municipal governments and their partners, city authorities, and advocates and agents of change. It invites them to address vexing and seemingly intractable problems of urban governance and to imagine future scenarios. There is little agreement on how different types of cities should aggregate, analyse and apply data to their immediate issues and strategic challenges. Yet the potential of data to help navigate cities through the unprecedented urban, climate and digital transitions ahead is very high and likely underestimated. This report offers a look at what data exists, and how cities can take the best steps to make the most of it. It provides a route into the urban data ecosystem and an overview of some of the ways to develop data policies and capabilities fit for the needs of the many different kinds of city contexts worldwide…(More)”.

How to Regulate AI? Start With the Data


Article by Susan Ariel Aaronson: “We live in an era of data dichotomy. On one hand, AI developers rely on large data sets to “train” their systems about the world and respond to user questions. These data troves have become increasingly valuable and visible. On the other hand, despite the import of data, U.S. policy makers don’t view data governance as a vehicle to regulate AI.  

U.S. policy makers should reconsider that perspective. As an example, the European Union, and more than 30 other countries, provide their citizens with a right not to be subject to automated decision making without explicit consent. Data governance is clearly an effective way to regulate AI.

Many AI developers treat data as an afterthought, but how AI firms collect and use data can tell you a lot about the quality of the AI services they produce. Firms and researchers struggle to collect, classify, and label data sets that are large enough to reflect the real world, but then don’t adequately clean (remove anomalies or problematic data) and check their data. Also, few AI developers and deployers divulge information about the data they use to train AI systems. As a result, we don’t know if the data that underlies many prominent AI systems is complete, consistent, or accurate. We also don’t know where that data comes from (its provenance). Without such information, users don’t know if they should trust the results they obtain from AI. 
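One concrete remedy is to attach a provenance record, along the lines of a "datasheet", to every training set. The sketch below is hypothetical, with invented field names, but it shows the kind of minimal documentation the author argues is missing.

```python
# Hypothetical minimal provenance record for a training data set.
# Field names are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class DatasetProvenance:
    name: str
    source: str                      # where the data was obtained
    collected_on: str                # collection date
    consent_obtained: bool           # personal data gathered with consent?
    license: str                     # usage rights for proprietary content
    cleaning_steps: list[str] = field(default_factory=list)

    def is_documented(self) -> bool:
        """Crude completeness check before the data is used for training."""
        return bool(self.source and self.license and self.cleaning_steps)

record = DatasetProvenance(
    name="example-corpus",
    source="licensed news archive",
    collected_on="2023-05-01",
    consent_obtained=True,
    license="CC BY 4.0",
    cleaning_steps=["deduplicate", "drop malformed records"],
)
print(record.is_documented())  # True
```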

The Washington Post set out to document this problem. It collaborated with the Allen Institute for AI to examine Google’s C4 data set, a large and widely used training corpus for language models, built from data scraped by bots from 15 million websites. Google then filters the data, but it understandably can’t filter the entire data set.  

Hence, this data set provides sufficient training data, but it also presents major risks for those firms or researchers who rely on it. Web scraping is generally legal in most countries as long as the scraped data isn’t used to cause harm to society, a firm, or an individual. But the Post found that the data set contained swaths of data from sites that sell pirated or counterfeit data, which the Federal Trade Commission views as harmful. Moreover, to be legal, the scraped data should not include personal data obtained without user consent or proprietary data obtained without firm permission. Yet the Post found large amounts of personal data in the data sets as well as some 200 million instances of copyrighted data denoted with the copyright symbol.
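Audits of this kind can be approximated, at small scale, with straightforward pattern matching. The sketch below is illustrative only; the Post's C4 analysis and real compliance audits are far more sophisticated than these two regexes.

```python
# Toy audit of scraped text: count copyright symbols and simple
# personal-data patterns (email addresses). Illustrative regexes only.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
COPYRIGHT = re.compile(r"©|\(c\)\s*\d{4}", re.IGNORECASE)

def audit(documents):
    stats = {"copyright_hits": 0, "email_hits": 0}
    for doc in documents:
        stats["copyright_hits"] += len(COPYRIGHT.findall(doc))
        stats["email_hits"] += len(EMAIL.findall(doc))
    return stats

docs = [
    "© 2021 Example Media. All rights reserved.",
    "Contact the author at jane.doe@example.org for reprints.",
]
print(audit(docs))  # {'copyright_hits': 1, 'email_hits': 1}
```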

Reliance on scraped data sets presents other risks. Without careful examination of the data sets, the firms relying on that data and their clients cannot know if it contains incomplete or inaccurate data, which in turn could lead to problems of bias, propaganda, and misinformation. But researchers cannot check data accuracy without information about data provenance. Consequently, the firms that rely on such unverified data are creating some of the AI risks regulators hope to avoid. 

It makes sense for Congress to start with data as it seeks to govern AI. There are several steps Congress could take…(More)”.

Data collaborations at a local scale: Lessons learnt in Rennes (2010–2021)


Paper by Simon Chignard and Marion Glatron: “Data sharing is a requisite for developing data-driven innovation and collaboration at the local scale. This paper aims to identify key lessons and recommendations for building trustworthy data governance at the local scale, involving both the public and private sectors. Our research is based on the experience gained in Rennes Metropole since 2010 and focuses on two thematic use cases: culture and energy. For each one, we analyzed how the power relations between actors and the local public authority shape the modalities of data sharing and exploitation. The paper elaborates on challenges and opportunities at the local level, set against national and European frameworks…(More)”.

How to Stay Smart in a Smart World


Book by Gerd Gigerenzer: “From dating apps and self-driving cars to facial recognition and the justice system, the increasing presence of AI has been widely championed – but there are limitations and risks too. In this book Gigerenzer shows how humans are often the greatest source of uncertainty and when people are involved, unwavering trust in complex algorithms can become a recipe for disaster. We need, now more than ever, to arm ourselves with knowledge that will help us make better decisions in a digital age.

Filled with practical examples and cutting-edge research, How to Stay Smart in a Smart World examines the growing role of AI at all levels of daily life with refreshing clarity. This book is a life raft in a sea of information and an urgent invitation to actively shape the world in which we want to live…(More)”.