Attacks on Tax Privacy: How the Tax Prep Industry Enabled Meta to Harvest Millions of Taxpayers’ Sensitive Data


Congressional Report: “The investigation revealed that:

  • Tax preparation companies shared millions of taxpayers’ data with Meta, Google, and other Big Tech firms: The tax prep companies used computer code – known as pixels – to send data to Meta and Google. While most websites use pixels, it is particularly reckless for online tax preparation websites to use them on webpages where tax return information is entered unless further steps are taken to ensure that the pixels do not access sensitive information. TaxAct, TaxSlayer, and H&R Block confirmed that they had used the Meta Pixel, and had been using it “for at least a couple of years” and all three companies had been using Google Analytics (GA) for even longer.
  • Tax prep companies shared extraordinarily sensitive personal and financial information with Meta, which used the data for diverse advertising purposes: TaxAct, H&R Block, and TaxSlayer each revealed, in response to this Congressional inquiry, that they shared taxpayer data via their use of the Meta Pixel and Google’s tools. Although the tax prep companies and Big Tech firms claimed that all shared data was anonymous, the FTC and experts have indicated that the data could easily be used to identify individuals, or to create a dossier on them that could be used for targeted advertising or other purposes. 
  • Tax prep companies and Big Tech firms were reckless about their data sharing practices and their treatment of sensitive taxpayer data: The tax prep companies indicated that they installed the Meta and Google tools on their websites without fully understanding the extent to which they would send taxpayer data to these tech firms, without consulting with independent compliance or privacy experts, and without full knowledge of Meta’s use of and disposition of the data. 
  • Tax prep companies may have violated taxpayer privacy laws by sharing taxpayer data with Big Tech firms: Under the law, “a tax return preparer may not disclose or use a taxpayer’s tax return information prior to obtaining a written consent from the taxpayer” – and they failed to do so when it came to the information that was turned over to Meta and Google. Tax prep companies can also turn over data to “auxiliary service providers in connection with the preparation of a tax return.” But Meta and Google likely do not meet the definition of “auxiliary service providers” and the data sharing with Meta was for advertising purposes – not “in connection with the preparation of a tax return.”…(More)”.
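
The data-sharing mechanism described in the report’s first bullet is simple enough to sketch. Below is a minimal Python illustration of how a tracking pixel can forward form data: the page’s scripts request an invisible image from the tracker’s server, with whatever fields they choose to share encoded as query parameters. The endpoint, pixel ID, and field names here are hypothetical, not Meta’s actual schema.

```python
# Minimal sketch of how a tracking pixel can leak form data.
# Endpoint, pixel ID, and field names are illustrative, not Meta's schema.
from urllib.parse import urlencode

def pixel_url(endpoint: str, pixel_id: str, event: str, fields: dict) -> str:
    """Build the URL an invisible 1x1 "pixel" image would request.

    Everything in `fields` -- whatever the page's scripts choose to
    forward, such as filing status or refund amounts -- travels to the
    tracker as ordinary query parameters.
    """
    params = {"id": pixel_id, "ev": event,
              **{f"cd[{k}]": v for k, v in fields.items()}}
    return f"{endpoint}?{urlencode(params)}"

# On a tax prep page, form values exposed to the pixel end up in the request:
print(pixel_url("https://tracker.example.com/tr", "123456", "PageView",
                {"filing_status": "married_joint", "refund_amount": "2400"}))
```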

Combining Human Expertise with Artificial Intelligence: Experimental Evidence from Radiology


Paper by Nikhil Agarwal, Alex Moehring, Pranav Rajpurkar & Tobias Salz: “While Artificial Intelligence (AI) algorithms have achieved performance levels comparable to human experts on various predictive tasks, human experts can still access valuable contextual information not yet incorporated into AI predictions. Humans assisted by AI predictions could therefore outperform both humans alone and AI alone. We conduct an experiment with professional radiologists that varies the availability of AI assistance and contextual information to study the effectiveness of human-AI collaboration and to investigate how to optimize it. Our findings reveal that (i) providing AI predictions does not uniformly increase diagnostic quality, and (ii) providing contextual information does increase quality. Radiologists do not fully capitalize on the potential gains from AI assistance because of large deviations from the benchmark Bayesian model with correct belief updating. The observed errors in belief updating can be explained by radiologists partially underweighting the AI’s information relative to their own and not accounting for the correlation between their own information and AI predictions. In light of these biases, we design a collaborative system between radiologists and AI. Our results demonstrate that, unless the documented mistakes can be corrected, the optimal solution involves assigning cases either to humans or to AI, but rarely to a human assisted by AI…(More)”.
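
A toy numerical example makes the belief-updating result concrete. The Python sketch below, with made-up likelihood ratios rather than the paper’s estimates, contrasts the benchmark Bayesian posterior with one that underweights the AI’s signal; the independence assumption baked into the formula is exactly what correlation between the radiologist’s and the AI’s information violates.

```python
# Toy illustration (not the paper's estimates) of the two belief-updating
# errors described above, for a binary disease state.

def posterior_prob(prior: float, lr_human: float, lr_ai: float,
                   w_ai: float = 1.0) -> float:
    """Posterior probability of disease: prior odds times likelihood ratios.

    w_ai < 1 underweights the AI's signal (the first documented bias).
    Multiplying full likelihood ratios assumes the two signals are
    conditionally independent; if the radiologist's information overlaps
    with the AI's, this double-counts evidence (the second bias).
    """
    odds = prior / (1 - prior) * lr_human * lr_ai ** w_ai
    return odds / (1 + odds)

prior = 0.10                 # hypothetical base rate of the pathology
lr_human, lr_ai = 4.0, 6.0   # hypothetical positive-reading likelihood ratios

print(posterior_prob(prior, lr_human, lr_ai))       # benchmark Bayesian: ~0.73
print(posterior_prob(prior, lr_human, lr_ai, 0.4))  # underweighted AI:   ~0.48
```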

Weather Warning Inequity: Lack of Data Collection Stations Imperils Vulnerable People


Article by Chelsea Harvey: “Devastating floods and landslides triggered by extreme downpours killed hundreds of people in Rwanda and the Democratic Republic of Congo in May, when some areas saw more than 7 inches of rain in a day.

Climate change is intensifying rainstorms throughout much of the world, yet scientists haven’t been able to show that the event was influenced by warming.

That’s because they don’t have enough data to investigate it.

Weather stations are sparse across Africa, making it hard for researchers to collect daily information on rainfall and other weather variables. The data that does exist often isn’t publicly available.

“The main issue in some countries in Africa is funding,” said Izidine Pinto, a senior researcher on weather and climate at the Royal Netherlands Meteorological Institute. “The meteorological offices don’t have enough funding.”

There’s often too little money to build or maintain weather stations, and cash-strapped governments frequently choose to sell the data they do collect rather than make it free to researchers.

That’s a growing problem as the planet warms and extreme weather worsens. Reliable forecasts are needed for early warning systems that direct people to take shelter or evacuate before disasters strike. And long-term climate data is necessary for scientists to build computer models that help make predictions about the future.

The science consortium World Weather Attribution is the latest research group to run into problems. It investigates the links between climate change and individual extreme weather events all over the globe. In the last few months alone, the organization has demonstrated the influence of global warming on extreme heat in South Asia and the Mediterranean, floods in Italy, and drought in eastern Africa.

Most of its research finds that climate change is making weather events more likely to occur or more intense.

The group recently attempted to investigate the influence of climate change on the floods in Rwanda and Congo. But the study was quickly mired in challenges.

The team was able to acquire some weather station data, mainly in Rwanda, Joyce Kimutai, a research associate at Imperial College London and a co-author of the study, said at a press briefing announcing the findings Thursday. But only a few stations provided sufficient data, making it impossible to define the event or to be certain that climate model simulations were accurate…(More)”.

AI and the automation of work


Essay by Benedict Evans: “…We should start by remembering that we’ve been automating work for 200 years. Every time we go through a wave of automation, whole classes of jobs go away, but new classes of jobs get created. There is frictional pain and dislocation in that process, and sometimes the new jobs go to different people in different places, but over time the total number of jobs doesn’t go down, and we have all become more prosperous.

When this is happening to your own generation, it seems natural and intuitive to worry that this time, there aren’t going to be those new jobs. We can see the jobs that are going away, but we can’t predict what the new jobs will be, and often they don’t exist yet. We know (or should know), empirically, that there always have been those new jobs in the past, and that they weren’t predictable either: no-one in 1800 would have predicted that in 1900 a million Americans would work on ‘railways’ and no-one in 1900 would have predicted ‘video post-production’ or ‘software engineer’ as employment categories. But it seems insufficient to take it on faith that this will happen now just because it always has in the past. How do you know it will happen this time? Is this different?

At this point, any first-year economics student will tell us that this is answered by, amongst other things, the ‘Lump of Labour’ fallacy.

The Lump of Labour fallacy is the misconception that there is a fixed amount of work to be done, and that if some work is taken by a machine then there will be less work for people. But if it becomes cheaper to use a machine to make, say, a pair of shoes, then the shoes are cheaper, more people can buy shoes and they have more money to spend on other things besides, and we discover new things we need or want, and new jobs. The efficiency gain isn’t confined to the shoe: generally, it ripples outward through the economy and creates new prosperity and new jobs. So, we don’t know what the new jobs will be, but we have a model that says, not just that there always have been new jobs, but why that is inherent in the process. Don’t worry about AI!

The most fundamental challenge to this model today, I think, is to say that no, what’s really been happening for the last 200 years of automation is that we’ve been moving up the scale of human capability…(More)”.
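
The shoe example can be made concrete with toy numbers. The Python sketch below, under deliberately simple assumptions (a fixed wage, a fixed consumer budget, prices equal to labour cost, and made-up quantities), shows shoemaking hours falling while total hours employed stay constant, because the spending freed up by cheaper shoes funds work elsewhere.

```python
# Toy numbers for the mechanism described above (illustrative only):
# automation halves the labour needed per pair of shoes, the price falls,
# more pairs are bought, and the freed-up spending funds demand elsewhere.
hours_per_pair = 10
pairs_sold = 100
wage = 20
budget = pairs_sold * hours_per_pair * wage   # consumers' shoe spending: 20,000

hours_per_pair_new = 5                        # a machine does half the work
price_new = hours_per_pair_new * wage         # cheaper shoes: 100 per pair
pairs_sold_new = 150                          # cheaper shoes, more buyers

shoe_hours = pairs_sold_new * hours_per_pair_new      # 750 shoemaking hours
freed_spending = budget - pairs_sold_new * price_new  # 5,000 spent elsewhere
other_hours = freed_spending / wage                   # 250 hours of new work
print(shoe_hours + other_hours)  # 1000.0 -- total hours employed is unchanged
```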

Open data for AI: what now?


UNESCO Report: “…A vast amount of data about the world – on the environment, industry, agriculture and health – is now being collected through automatic processes, including sensors. Such data may be readily available, but they are potentially too big for humans to handle or analyse effectively; nonetheless, they could serve as input to AI systems. AI and data science techniques have demonstrated great capacity to analyse large amounts of data, as currently illustrated by generative AI systems, and to help uncover formerly unknown hidden patterns that deliver actionable information in real time. Many contemporary AI systems, however, run on proprietary datasets; data that fulfil the criteria of open data would benefit AI systems further and mitigate potential hazards of those systems, such as a lack of fairness, accountability, and transparency.

The aim of these guidelines is to apprise Member States of the value of open data, and to outline how data are curated and opened. Member States are encouraged not only to support openness of high-quality data, but also to embrace the use of AI technologies and facilitate capacity building, training and education in this regard, including inclusive open data as well as AI literacy…(More)”.

COVID-19 digital contact tracing worked — heed the lessons for future pandemics


Article by Marcel Salathé: “During the first year of the COVID-19 pandemic, around 50 countries deployed digital contact tracing. When someone tested positive for SARS-CoV-2, anyone who had been in close proximity to that person (usually for 15 minutes or more) would be notified as long as both individuals had installed the contact-tracing app on their devices.
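
The matching rule is straightforward to sketch. Below is a simplified Python illustration of the duration-threshold logic just described: phones log nearby devices’ rotating identifiers, and when a user tests positive, other devices check locally whether cumulative contact reached the threshold. Real deployments (such as the Google/Apple exposure notification framework) add cryptographic key rotation and risk scoring; the identifiers and scan interval here are placeholders.

```python
# Simplified sketch of the notification rule described above. Details
# vary by app; identifiers and scan interval are assumptions.
from collections import defaultdict

EXPOSURE_THRESHOLD_MIN = 15   # "usually for 15 minutes or more"
SCAN_INTERVAL_MIN = 5         # assume one sighting per 5-minute scan window

def should_notify(my_sightings: list[tuple[str, int]],
                  infected_ids: set[str]) -> bool:
    """my_sightings: (rotating_id, day) pairs this phone observed nearby.
    infected_ids: identifiers published after their owner tested positive."""
    minutes_near = defaultdict(int)
    for rid, day in my_sightings:
        if rid in infected_ids:
            minutes_near[(rid, day)] += SCAN_INTERVAL_MIN
    return any(m >= EXPOSURE_THRESHOLD_MIN for m in minutes_near.values())

sightings = [("a1", 1)] * 3 + [("b2", 1)]   # 15 min near a1, 5 min near b2
print(should_notify(sightings, {"a1"}))     # True: threshold reached
print(should_notify(sightings, {"b2"}))     # False: brief encounter only
```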

Digital contact tracing received much media attention, and much criticism, in that first year. Many worried that the technology provided a way for governments and technology companies to have even more control over people’s lives than they already do. Others dismissed the apps as a failure, after public-health authorities hit problems in deploying them.

Three years on, the data tell a different story.

The United Kingdom successfully integrated a digital contact-tracing app with other public-health programmes and interventions, and collected data to assess the app’s effectiveness. Several analyses now show that, even with the challenges of introducing a new technology during an emergency, and despite relatively low uptake, the app saved thousands of lives. It has also become clearer that many of the problems encountered elsewhere were not to do with the technology itself, but with integrating a twenty-first-century technology into what are largely twentieth-century public-health infrastructures…(More)”.

How Good Are Privacy Guarantees? Platform Architecture and Violation of User Privacy


Paper by Daron Acemoglu, Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian & Asuman Ozdaglar: “Many platforms deploy data collected from users for a multitude of purposes. While some are beneficial to users, others are costly to their privacy. The presence of these privacy costs means that platforms may need to provide guarantees about how and to what extent user data will be harvested for activities such as targeted ads, individualized pricing, and sales to third parties. In this paper, we build a multi-stage model in which users decide whether to share their data based on privacy guarantees. We first introduce a novel mask-shuffle mechanism and prove it is Pareto optimal—meaning that it leaks the least about the users’ data for any given leakage about the underlying common parameter. We then show that under any mask-shuffle mechanism, there exists a unique equilibrium in which privacy guarantees balance privacy costs and utility gains from the pooling of user data for purposes such as assessment of health risks or product development. Paradoxically, we show that as users’ value of pooled data increases, the equilibrium of the game leads to lower user welfare. This is because platforms take advantage of this change to reduce privacy guarantees so much that user utility declines (whereas it would have increased with a given mechanism). Even more strikingly, we show that platforms have incentives to choose data architectures that systematically differ from those that are optimal from the user’s point of view. In particular, we identify a class of pivot mechanisms, linking individual privacy to choices by others, which platforms prefer to implement and which make users significantly worse off…(More)”.
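
The abstract does not spell out the mechanism’s details, but the general masking-and-shuffling intuition can be illustrated with a toy Python sketch (an illustration of the idea, not the paper’s formal construction): zero-sum random masks hide each individual report, and shuffling unlinks reports from users, yet the pooled average, which carries the information about the common parameter, survives exactly.

```python
# Toy sketch of the masking-and-shuffling intuition (not the paper's
# formal mechanism): masks that sum to zero obscure individual values,
# shuffling breaks the user-report link, and the pooled mean is preserved.
import random

theta = 2.0                                   # common parameter of interest
users = [theta + random.gauss(0, 1) for _ in range(1000)]  # private data

masks = [random.gauss(0, 10) for _ in users]
masks = [m - sum(masks) / len(masks) for m in masks]  # force masks to sum to 0

reports = [x + m for x, m in zip(users, masks)]
random.shuffle(reports)                       # unlink reports from identities

# Each report is dominated by mask noise, but the mean is intact:
print(sum(reports) / len(reports))   # ~theta
print(sum(users) / len(users))       # same value, up to floating-point error
```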

Non-traditional data sources in obesity research: a systematic review of their use in the study of obesogenic environments


Paper by Julia Mariel Wirtz Baker, Sonia Alejandra Pou, Camila Niclis, Eugenia Haluszka & Laura Rosana Aballay: “The field of obesity epidemiology has made extensive use of traditional data sources, such as health surveys and reports from official national statistical systems, whose variety of data can at times be too limited to explore the wider range of determinants relevant to obesity. Over time, other data sources began to be incorporated into obesity research, such as geospatial data (web mapping platforms, satellite imagery, and other databases embedded in Geographic Information Systems), social network data (such as Twitter, Facebook, Instagram, or other social networks), digital device data and others. The data revolution, facilitated by the massive use of digital devices with hundreds of millions of users and the emergence of the “Internet of Things” (IoT), has generated huge volumes of data from everywhere: customers, social networks and sensors, in addition to all the traditional sources mentioned above. For research, this offers fruitful opportunities, contributing in ways that traditionally sourced research data could not.

An international expert panel in obesity and big data pointed out some key factors in the definition of Big Data, stating that “it is always digital, has a large sample size, and a large volume or variety or velocity of variables that require additional computing power, as well as specialist skills in computer programming, database management and data science analytics”. Our interpretation of non-traditional data sources is an approximation to this definition, assuming that they are sources not traditionally used in obesity epidemiology and environmental studies, which can include digital devices, social media and geospatial data within a GIS, the latter mainly based on complex indexes that require advanced data analysis techniques and expertise.

Beyond its still-debated limitations, Big Data can be regarded as a great opportunity to improve the study of obesogenic environments, since it has been heralded as a powerful resource that can provide new knowledge about human behaviour and social phenomena. It can also contribute to the formulation and evaluation of policies and to the development of interventions for obesity prevention. However, in this field of research, the suitability of these novel data sources is still a subject of considerable discussion, and their use has not been investigated from the obesogenic-environment approach…(More)”.

Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data


Proceedings from the National Academies of Sciences: “Artificial intelligence (AI), facial recognition, and other advanced computational and statistical techniques are accelerating advancements in the life sciences and many other fields. However, these technologies and the scientific developments they enable also hold the potential for unintended harm and malicious exploitation. To examine these issues and to discuss practices for anticipating and preventing the misuse of advanced data analytics and biological data in a global context, the National Academies of Sciences, Engineering, and Medicine convened two virtual workshops on November 15, 2022, and February 9, 2023. The workshops engaged scientists from the United States, South Asia, and Southeast Asia through a series of presentations and scenario-based exercises to explore emerging applications and areas of research, their potential benefits, and the ethical issues and security risks that arise when AI applications are used in conjunction with biological data. This publication highlights the presentations and discussions of the workshops…(More)”.

How should a robot explore the Moon? A simple question shows the limits of current AI systems


Article by Sally Cripps, Edward Santow, Nicholas Davis, Alex Fischer and Hadi Mohasel Afshar: “…Ultimately, AI systems should help humans make better, more accurate decisions. Yet even the most impressive and flexible of today’s AI tools – such as the large language models behind the likes of ChatGPT – can have the opposite effect.

Why? They have two crucial weaknesses. They do not help decision-makers understand causation or uncertainty. And they create incentives to collect huge amounts of data and may encourage a lax attitude to privacy, legal and ethical questions and risks…

ChatGPT and other “foundation models” use an approach called deep learning to trawl through enormous datasets and identify associations between factors contained in that data, such as the patterns of language or links between images and descriptions. Consequently, they are great at interpolating – that is, predicting or filling in the gaps between known values.

Interpolation is not the same as creation. It does not generate knowledge, nor the insights necessary for decision-makers operating in complex environments.
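
The point is easy to demonstrate. In the Python sketch below (synthetic data, an arbitrary sine-wave “truth”), a flexible curve fit predicts well inside the range of its training data and fails badly outside it.

```python
# A minimal illustration of the distinction: a model fit to data from one
# region interpolates well inside it and can fail badly outside it.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)                 # training inputs, all in [0, 5]
y = np.sin(x) + rng.normal(0, 0.1, 200)    # true process: y = sin(x) + noise

coeffs = np.polyfit(x, y, deg=5)           # flexible curve fit to the data

for x_new in (2.5, 10.0):                  # inside vs. far outside the data
    pred, truth = np.polyval(coeffs, x_new), np.sin(x_new)
    print(f"x={x_new}: predicted {pred:+.2f}, actual {truth:+.2f}")
# Interpolation at x=2.5 lands close to sin(2.5); extrapolation at x=10
# diverges wildly, because nothing anchors the curve beyond the data.
```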

However, these approaches require huge amounts of data. As a result, they encourage organisations to assemble enormous repositories of data – or trawl through existing datasets collected for other purposes. Dealing with “big data” brings considerable risks around security, privacy, legality and ethics.

In low-stakes situations, predictions based on “what the data suggest will happen” can be incredibly useful. But when the stakes are higher, there are two more questions we need to answer.

The first is about how the world works: “what is driving this outcome?” The second is about our knowledge of the world: “how confident are we about this?”…(More)”.