It’s Official: Cars Are the Worst Product Category We Have Ever Reviewed for Privacy


Article by the Mozilla Foundation: “Car makers have been bragging about their cars being “computers on wheels” for years to promote their advanced features. However, the conversation about what driving a computer means for its occupants’ privacy hasn’t really caught up. While we worried that our doorbells and watches that connect to the internet might be spying on us, car brands quietly entered the data business by turning their vehicles into powerful data-gobbling machines. Machines that, because of their all those brag-worthy bells and whistles, have an unmatched power to watch, listen, and collect information about what you do and where you go in your car.

All 25 car brands we researched earned our *Privacy Not Included warning label — making cars the official worst category of products for privacy that we have ever reviewed…(More)”.

A new way to look at data privacy


Article by Adam Zewe: “Imagine that a team of scientists has developed a machine-learning model that can predict whether a patient has cancer from lung scan images. They want to share this model with hospitals around the world so clinicians can start using it in diagnosis.

But there’s a problem. To teach their model how to predict cancer, they showed it millions of real lung scan images, a process called training. Those sensitive data, which are now encoded into the inner workings of the model, could potentially be extracted by a malicious agent. The scientists can prevent this by adding noise, or more generic randomness, to the model that makes it harder for an adversary to guess the original data. However, perturbation reduces a model’s accuracy, so the less noise one can add, the better.

MIT researchers have developed a technique that enables the user to potentially add the smallest amount of noise possible, while still ensuring the sensitive data are protected.

The researchers created a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework based on this metric that can automatically determine the minimal amount of noise that needs to be added. Moreover, this framework does not need knowledge of the inner workings of a model or its training process, which makes it easier to use for different types of models and applications.

In several cases, the researchers show that the amount of noise required to protect sensitive data from adversaries is far less with PAC Privacy than with other approaches. This could help engineers create machine-learning models that provably hide training data, while maintaining accuracy in real-world settings…

A fundamental question in data privacy is: How much sensitive data could an adversary recover from a machine-learning model with noise added to it?

Differential Privacy, one popular privacy definition, says privacy is achieved if an adversary who observes the released model cannot infer whether an arbitrary individual’s data is used for the training processing. But provably preventing an adversary from distinguishing data usage often requires large amounts of noise to obscure it. This noise reduces the model’s accuracy.

PAC Privacy looks at the problem a bit differently. It characterizes how hard it would be for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added, rather than only focusing on the distinguishability problem…(More)”

How Statisticians Should Grapple with Privacy in a Changing Data Landscape


Article by Joshua Snoke, and Claire McKay Bowen: “Suppose you had a data set that contained records of individuals, including demographics such as their age, sex, and race. Suppose also that these data contained additional in-depth personal information, such as financial records, health status, or political opinions. Finally, suppose that you wanted to glean relevant insights from these data using machine learning, causal inference, or survey sampling adjustments. What methods would you use? What best practices would you ensure you followed? Where would you seek information to help guide you in this process?…(More)”

Attacks on Tax Privacy: How the Tax Prep Industry Enabled Meta to Harvest Millions of Taxpayers’ Sensitive Data


Congressional Report: “The investigation revealed that:

  • Tax preparation companies shared millions of taxpayers’ data with Meta, Google, and other Big Tech firms: The tax prep companies used computer code – known as pixels – to send data to Meta and Google. While most websites use pixels, it is particularly reckless for online tax preparation websites to use them on webpages where tax return information is entered unless further steps are taken to ensure that the pixels do not access sensitive information. TaxAct, TaxSlayer, and H&R Block confirmed that they had used the Meta Pixel, and had been using it “for at least a couple of years” and all three companies had been using Google Analytics (GA) for even longer.
  • Tax prep companies shared extraordinarily sensitive personal and financial information with Meta, which used the data for diverse advertising purposes: TaxAct, H&R Block, and TaxSlayer each revealed, in response to this Congressional inquiry, that they shared taxpayer data via their use of the Meta Pixel and Google’s tools. Although the tax prep companies and Big Tech firms claimed that all shared data was anonymous, the FTC and experts have indicated that the data could easily be used to identify individuals, or to create a dossier on them that could be used for targeted advertising or other purposes. 
  • Tax prep companies and Big Tech firms were reckless about their data sharing practices and their treatment of sensitive taxpayer data: The tax prep companies indicated that they installed the Meta and Google tools on their websites without fully understanding the extent to which they would send taxpayer data to these tech firms, without consulting with independent compliance or privacy experts, and without full knowledge of Meta’s use of and disposition of the data. 
  • Tax prep companies may have violated taxpayer privacy laws by sharing taxpayer data with Big Tech firms: Under the law, “a tax return preparer may not disclose or use a taxpayer’s tax return information prior to obtaining a written consent from the taxpayer,” – and they failed to do so when it came to the information that was turned over to Meta and Google. Tax prep companies can also turn over data to “auxiliary service providers in connection with the preparation of a tax return.” But Meta and Google likely do not meet the definition of “auxiliary service providers” and the data sharing with Meta was for advertising purposes – not “in connection with the preparation of a tax return.”…(More)”.

How Good Are Privacy Guarantees? Platform Architecture and Violation of User Privacy


Paper by Daron Acemoglu, Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian & Asuman Ozdaglar: “Many platforms deploy data collected from users for a multitude of purposes. While some are beneficial to users, others are costly to their privacy. The presence of these privacy costs means that platforms may need to provide guarantees about how and to what extent user data will be harvested for activities such as targeted ads, individualized pricing, and sales to third parties. In this paper, we build a multi-stage model in which users decide whether to share their data based on privacy guarantees. We first introduce a novel mask-shuffle mechanism and prove it is Pareto optimal—meaning that it leaks the least about the users’ data for any given leakage about the underlying common parameter. We then show that under any mask-shuffle mechanism, there exists a unique equilibrium in which privacy guarantees balance privacy costs and utility gains from the pooling of user data for purposes such as assessment of health risks or product development. Paradoxically, we show that as users’ value of pooled data increases, the equilibrium of the game leads to lower user welfare. This is because platforms take advantage of this change to reduce privacy guarantees so much that user utility declines (whereas it would have increased with a given mechanism). Even more strikingly, we show that platforms have incentives to choose data architectures that systematically differ from those that are optimal from the user’s point of view. In particular, we identify a class of pivot mechanisms, linking individual privacy to choices by others, which platforms prefer to implement and which make users significantly worse off…(More)”.

Vulnerability and Data Protection Law


Book by Gianclaudio Malgieri: “Vulnerability has traditionally been viewed through the lens of specific groups of people, such as ethnic minorities, children, the elderly, or people with disabilities. With the rise of digital media, our perceptions of vulnerable groups and individuals have been reshaped as new vulnerabilities and different vulnerable sub-groups of users, consumers, citizens, and data subjects emerge.

Vulnerability and Data Protection Law not only depicts these problems but offers the reader a detailed investigation of the concept of data subjects and a reconceptualization of the notion of vulnerability within the General Data Protection Regulation. The regulation offers a forward-facing set of tools that-though largely underexplored-are essential in rebalancing power asymmetries and mitigating induced vulnerabilities in the age of artificial intelligence.

Considering the new risks and potentialities of the digital market, the new awareness about cognitive weaknesses, and the new philosophical sensitivity about the condition of human vulnerability, the author looks for a more general and layered definition of the data subject’s vulnerability that goes beyond traditional labels. In doing so, he seeks to promote a ‘vulnerability-aware’ interpretation of the GDPR.

A heuristic analysis that re-interprets the whole GDPR, this work is essential for both scholars of data protection law and for policymakers looking to strengthen regulations and protect the data of vulnerable individuals…(More)”.

Privacy-enhancing technologies (PETs)


Report by the Information Commissioner’s Office (UK): “This guidance discusses privacy-enhancing technologies (PETs) in detail. Read it if you have questions not answered in the Guide, or if you need a deeper understanding to help you apply PETs in practice.

The first part of the guidance is aimed at DPOs (data protection officers) and those with specific data protection responsibilities in larger organisations. It focuses on how PETs can help you achieve compliance with data protection law.

The second part is intended for a more technical audience, and for DPOs who want to understand more detail about the types of PETs that are currently available. It gives a brief introduction to eight types of PETs and explains their risks and benefits…(More)”.

“My sex-related data is more sensitive than my financial data and I want the same level of security and privacy”: User Risk Perceptions and Protective Actions in Female-oriented Technologies


Paper by Maryam Mehrnezhad, and Teresa Almeida: “The digitalization of the reproductive body has engaged myriads of cutting-edge technologies in supporting people to know and tackle their intimate health. Generally understood as female technologies (aka female-oriented technologies or ‘FemTech’), these products and systems collect a wide range of intimate data which are processed, transferred, saved and shared with other parties. In this paper, we explore how the “data-hungry” nature of this industry and the lack of proper safeguarding mechanisms, standards, and regulations for vulnerable data can lead to complex harms or faint agentic potential. We adopted mixed methods in exploring users’ understanding of the security and privacy (SP) of these technologies. Our findings show that while users can speculate the range of harms and risks associated with these technologies, they are not equipped and provided with the technological skills to protect themselves against such risks. We discuss a number of approaches, including participatory threat modelling and SP by design, in the context of this work and conclude that such approaches are critical to protect users in these sensitive systems…(More)”.

The Prediction Society: Algorithms and the Problems of Forecasting the Future


Paper by Hideyuki Matsumi and Daniel J. Solove: “Predictions about the future have been made since the earliest days of humankind, but today, we are living in a brave new world of prediction. Today’s predictions are produced by machine learning algorithms that analyze massive quantities of personal data. Increasingly, important decisions about people are being made based on these predictions.

Algorithmic predictions are a type of inference. Many laws struggle to account for inferences, and even when they do, the laws lump all inferences together. But as we argue in this Article, predictions are different from other inferences. Predictions raise several unique problems that current law is ill-suited to address. First, algorithmic predictions create a fossilization problem because they reinforce patterns in past data and can further solidify bias and inequality from the past. Second, algorithmic predictions often raise an unfalsiability problem. Predictions involve an assertion about future events. Until these events happen, predictions remain unverifiable, resulting in an inability for individuals to challenge them as false. Third, algorithmic predictions can involve a preemptive intervention problem, where decisions or interventions render it impossible to determine whether the predictions would have come true. Fourth, algorithmic predictions can lead to a self-fulfilling prophecy problem where they actively shape the future they aim to forecast.

More broadly, the rise of algorithmic predictions raises an overarching concern: Algorithmic predictions not only forecast the future but also have the power to create and control it. The increasing pervasiveness of decisions based on algorithmic predictions is leading to a prediction society where individuals’ ability to author their own future is diminished while the organizations developing and using predictive systems are gaining greater power to shape the future…(More)”

From LogFrames to Logarithms – A Travel Log


Article by Karl Steinacker and Michael Kubach: “..Today, authorities all over the world are experimenting with predictive algorithms. That sounds technical and innocent but as we dive deeper into the issue, we realise that the real meaning is rather specific: fraud detection systems in social welfare payment systems. In the meantime, the hitherto banned terminology had it’s come back: welfare or social safety nets are, since a couple of years, en vogue again. But in the centuries-old Western tradition, welfare recipients must be monitored and, if necessary, sanctioned, while those who work and contribute must be assured that there is no waste. So it comes at no surprise that even today’s algorithms focus on the prime suspect, the individual fraudster, the undeserving poor.

Fraud detection systems promise that the taxpayer will no longer fall victim to fraud and efficiency gains can be re-directed to serve more people. The true extent of welfare fraud is regularly exaggerated  while the costs of such systems is routinely underestimated. A comparison of the estimated losses and investments doesn’t take place. It is the principle to detect and punish the fraudsters that prevail. Other issues don’t rank high either, for example on how to distinguish between honest mistakes and deliberate fraud. And as case workers spent more time entering and analysing data and in front of a computer screen, the less they have time and inclination to talk to real people and to understand the context of their life at the margins of society.

Thus, it can be said that routinely hundreds of thousands of people are being scored. Example Denmark: Here, a system called Udbetaling Danmark was created in 2012 to streamline the payment of welfare benefits. Its fraud control algorithms can access the personal data of millions of citizens, not all of whom receive welfare payments. In contrast to the hundreds of thousands affected by this data mining, the number of cases referred to the Police for further investigation are minute. 

In the city of Rotterdam in the Netherlands every year, data of 30,000 welfare recipients is investigated in order to flag suspected welfare cheats. However, an analysis of its scoring system based on machine learning and algorithms showed systemic discrimination with regard to ethnicity, age, gender, and parenthood. It revealed evidence of other fundamental flaws making the system both inaccurate and unfair. What might appear to a caseworker as a vulnerability is treated by the machine as grounds for suspicion. Despite the scale of data used to calculate risk scores, the output of the system is not better than random guesses. However, the consequences of being flagged by the “suspicion machine” can be drastic, with fraud controllers empowered to turn the lives of suspects inside out.

As reported by the World Bank, the recent Covid-19 pandemic provided a great push to implement digital social welfare systems in the global South. In fact, for the World Bank the so-called Digital Public Infrastructure (DPI), enabling “Digitizing Government to Person Payments (G2Px)”, are as fundamental for social and economic development today as physical infrastructure was for previous generations. Hence, the World Bank is finances globally systems modelled after the Indian Aadhaar system, where more than a billion persons have been registered biometrically. Aadhaar has become, for all intents and purposes, a pre-condition to receive subsidised food and other assistance for 800 million Indian citizens.

Important international aid organisations are not behaving differently from states. The World Food Programme alone holds data of more than 40 million people on its Scope data base. Unfortunately, WFP like other UN organisations, is not subject to data protection laws and the jurisdiction of courts. This makes the communities they have worked with particularly vulnerable.

In most places, the social will become the metric, where logarithms determine the operational conduit for delivering, controlling and withholding assistance, especially welfare payments. In other places, the power of logarithms may go even further, as part of trust systems, creditworthiness, and social credit. These social credit systems for individuals are highly controversial as they require mass surveillance since they aim to track behaviour beyond financial solvency. The social credit score of a citizen might not only suffer from incomplete, or inaccurate data, but also from assessing political loyalties and conformist social behaviour…(More)”.