Interventions to mitigate the racially discriminatory impacts of emerging tech including AI


Joint Civil Society Statement: “As widespread recent protests have highlighted, racial inequality remains an urgent and devastating issue around the world, and this is as true in the context of technology as it is everywhere else. In fact, it may be more so, as algorithmic technologies based on big data are deployed at previously unimaginable scale, reproducing the discriminatory systems that build and govern them.

The undersigned organizations welcome the publication of the report “Racial discrimination and emerging digital technologies: a human rights analysis,” by Special Rapporteur on contemporary forms of racism, racial discrimination, xenophobia and related intolerance, E. Tendayi Achiume, and wish to underscore the importance and timeliness of a number of the recommendations made therein:

  1. Technologies that have had or will have significant racially discriminatory impacts should be banned outright.
    While incremental regulatory approaches may be appropriate in some contexts, where a technology is demonstrably likely to cause racially discriminatory harm, it should not be deployed until that harm can be prevented. Moreover, certain technologies may always have disparate racial impacts, no matter how much their accuracy can be improved. In the present moment, racially discriminatory technologies include facial and affect recognition technology and so-called predictive analytics. We support Special Rapporteur Achiume’s call for mandatory human rights impact assessments as a prerequisite for the adoption of new technologies. We also believe that where such assessments reveal that a technology has a high likelihood of deleterious racially disparate impacts, states should prevent its use through a ban or moratorium. We join the Special Rapporteur in welcoming recent municipal bans, for example, on the use of facial recognition technology, and encourage national governments to adopt similar policies.  Correspondingly, we reiterate our support for states’ imposition of an immediate moratorium on the trade and use of privately developed surveillance tools until such time as states enact appropriate safeguards, and congratulate Special Rapporteur Achiume on joining that call.
  2. Gender mainstreaming and representation along racial, national and other intersecting identities requires radical improvement at all levels of the tech sector.
  3. Technologists cannot solve political, social, and economic problems without the input of domain experts and those personally impacted.
  4. Access to technology is as urgent an issue of racial discrimination as inequity in the design of technologies themselves.
  5. Representative and disaggregated data is a necessary, if not sufficient, condition for racial equity in emerging digital technologies, but it must be collected and managed equitably as well.
  6. States as well as corporations must provide remedies for racial discrimination, including reparations….(More)”.

The Atlas of Surveillance


Electronic Frontier Foundation: “Law enforcement surveillance isn’t always secret. These technologies can be discovered in news articles and government meeting agendas, in company press releases and social media posts. It just hasn’t been aggregated before.

That’s the starting point for the Atlas of Surveillance, a collaborative effort between the Electronic Frontier Foundation and the University of Nevada, Reno Reynolds School of Journalism. Through a combination of crowdsourcing and data journalism, we are creating the largest-ever repository of information on which law enforcement agencies are using what surveillance technologies. The aim is to generate a resource for journalists, academics, and, most importantly, members of the public to check what’s been purchased locally and how technologies are spreading across the country.

We specifically focused on the most pervasive technologies, including drones, body-worn cameras, face recognition, cell-site simulators, automated license plate readers, predictive policing, camera registries, and gunshot detection. Although we have amassed more than 5,000 datapoints in 3,000 jurisdictions, our research only reveals the tip of the iceberg and underlines the need for journalists and members of the public to continue demanding transparency from criminal justice agencies….(More)”.

Differential Privacy for Privacy-Preserving Data Analysis


Introduction to a Special Blog Series by NIST: “…How can we use data to learn about a population, without learning about specific individuals within the population? Consider these two questions:

  1. “How many people live in Vermont?”
  2. “How many people named Joe Near live in Vermont?”

The first reveals a property of the whole population, while the second reveals information about one person. We need to be able to learn about trends in the population while preventing anyone from learning anything new about a particular individual. This is the goal of many statistical analyses of data, such as the statistics published by the U.S. Census Bureau, and of machine learning more broadly. In each of these settings, models are intended to reveal trends in populations, not reflect information about any single individual.
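
To make the contrast concrete, here is a minimal illustrative sketch of the two queries over a hypothetical pandas DataFrame of residents (the data is invented):

```python
import pandas as pd

# Hypothetical dataset: one row per resident.
residents = pd.DataFrame({
    "name":  ["Joe Near", "Jane Doe", "Sam Smith"],
    "state": ["Vermont", "Vermont", "New York"],
})

# Query 1: a property of the whole population.
vermont_count = (residents["state"] == "Vermont").sum()

# Query 2: a fact about one specific individual.
joe_count = ((residents["state"] == "Vermont")
             & (residents["name"] == "Joe Near")).sum()

print(vermont_count, joe_count)
```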

But how can we answer the first question, “How many people live in Vermont?” — which we’ll refer to as a query — while preventing the second question, “How many people named Joe Near live in Vermont?”, from being answered? The most widely used solution is called de-identification (or anonymization), which removes identifying information from the dataset. (We’ll generally assume a dataset contains information collected from many individuals.) Another option is to allow only aggregate queries, such as an average over the data. Unfortunately, we now understand that neither approach actually provides strong privacy protection. De-identified datasets are subject to database-linkage attacks. Aggregation only protects privacy if the groups being aggregated are sufficiently large, and even then, privacy attacks are still possible [1, 2, 3, 4].
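
The database-linkage risk mentioned above can be illustrated with a short, hypothetical sketch (all names, columns, and records are invented): a “de-identified” health dataset is joined to a public record on shared quasi-identifiers, re-attaching names to sensitive attributes.

```python
import pandas as pd

# Hypothetical "de-identified" health data: names removed, quasi-identifiers kept.
health = pd.DataFrame({
    "zip":        ["05401", "05446"],
    "birth_date": ["1984-02-03", "1990-07-21"],
    "sex":        ["M", "F"],
    "diagnosis":  ["flu", "diabetes"],
})

# Hypothetical public record (e.g., a voter roll) that includes names.
voters = pd.DataFrame({
    "name":       ["Joe Near", "Jane Doe"],
    "zip":        ["05401", "05446"],
    "birth_date": ["1984-02-03", "1990-07-21"],
    "sex":        ["M", "F"],
})

# Linking on the shared quasi-identifiers re-attaches names to diagnoses.
reidentified = voters.merge(health, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```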

Differential Privacy

Differential privacy [5, 6] is a mathematical definition of what it means to have privacy. It is not a specific process like de-identification, but a property that a process can have. For example, it is possible to prove that a specific algorithm “satisfies” differential privacy.

Informally, differential privacy guarantees the following for each individual who contributes data for analysis: the output of a differentially private analysis will be roughly the same, whether or not you contribute your data. A differentially private analysis is often called a mechanism, and we denote it ℳ.

Figure 1: Informal Definition of Differential Privacy

Figure 1 illustrates this principle. Answer “A” is computed without Joe’s data, while answer “B” is computed with Joe’s data. Differential privacy says that the two answers should be indistinguishable. This implies that whoever sees the output won’t be able to tell whether or not Joe’s data was used, or what Joe’s data contained.

We control the strength of the privacy guarantee by tuning the privacy parameter ε, also called a privacy loss or privacy budget. The lower the value of the ε parameter, the more indistinguishable the results, and therefore the more each individual’s data is protected.

Figure 2: Formal Definition of Differential Privacy
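
The figure itself is not reproduced here. For reference, the standard formal statement of ε-differential privacy in the literature [5, 6] is: a randomized mechanism ℳ satisfies ε-differential privacy if, for every set of outputs S and every pair of datasets D and D′ that differ in the data of a single individual,

```latex
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
```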

We can often answer a query with differential privacy by adding some random noise to the query’s answer. The challenge lies in determining where to add the noise and how much to add. One of the most commonly used mechanisms for adding noise is the Laplace mechanism [5, 7]. 

Queries with higher sensitivity require adding more noise in order to satisfy a particular value of ε, and this extra noise has the potential to make results less useful. We will describe sensitivity and this tradeoff between privacy and usefulness in more detail in future blog posts….(More)”.
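
As an illustrative sketch (not code from the NIST series), the Laplace mechanism described above can be applied to a counting query such as “How many people live in Vermont?”; a counting query has sensitivity 1, since adding or removing one person’s record changes the count by at most 1, and the noise scale grows with sensitivity/ε:

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private answer by adding Laplace noise.

    The noise scale is sensitivity / epsilon: higher sensitivity or a
    smaller privacy budget (epsilon) means more noise and less accuracy.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_answer + noise

# Hypothetical true count for the query "How many people live in Vermont?".
true_count = 623_989
# A counting query has sensitivity 1.
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.1)
print(round(private_count))
```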

Addressing trust in public sector data use


Centre for Data Ethics and Innovation: “Data sharing is fundamental to effective government and the running of public services. But it is not an end in itself. Data needs to be shared to drive improvements in service delivery and benefit citizens. For this to happen sustainably and effectively, public trust in the way data is shared and used is vital. Without such trust, the government and wider public sector risk losing society’s consent, setting back innovation as well as the smooth running of public services. Maximising the benefits of data driven technology therefore requires a solid foundation of societal approval.

AI and data driven technology offer extraordinary potential to improve decision making and service delivery in the public sector – from improved diagnostics to more efficient infrastructure and personalised public services. This makes effective use of data more important than it has ever been, and requires a step-change in the way data is shared and used. Yet sharing more data also poses risks and challenges to current governance arrangements.

The only way to build trust sustainably is to operate in a trustworthy way. Without adequate safeguards the collection and use of personal data risks changing power relationships between the citizen and the state. Insights derived by big data and the matching of different data sets can also undermine individual privacy or personal autonomy. Trade-offs are required which reflect democratic values, wider public acceptability and a shared vision of a data driven society. CDEI has a key role to play in exploring this challenge and setting out how it can be addressed. This report identifies barriers to data sharing, but focuses on building and sustaining the public trust which is vital if society is to maximise the benefits of data driven technology.

There are many areas where the sharing of anonymised and identifiable personal data by the public sector already improves services, prevents harm, and benefits the public. Over the last 20 years, different governments have adopted various measures to increase data sharing, including creating new legal sharing gateways. However, despite efforts to increase the amount of data sharing across the government, and significant successes in areas like open data, data sharing continues to be challenging and resource-intensive. This report identifies a range of technical, legal and cultural barriers that can inhibit data sharing.

Barriers to data sharing in the public sector

Technical barriers include limited adoption of common data standards and inconsistent security requirements across the public sector. Such inconsistency can prevent data sharing, or increase the cost and time for organisations to finalise data sharing agreements.

While there are often pre-existing legal gateways for data sharing, underpinned by data protection legislation, there is still a large amount of legal confusion on the part of public sector bodies wishing to share data which can cause them to start from scratch when determining legality and commit significant resources to legal advice. It is not unusual for the development of data sharing agreements to delay the projects for which the data is intended. While the legal scrutiny of data sharing arrangements is an important part of governance, improving the efficiency of these processes – without sacrificing their rigour – would allow data to be shared more quickly and at less expense.

Even when legal, the permissive nature of many legal gateways means significant cultural and organisational barriers to data sharing remain. Individual departments and agencies decide whether or not to share the data they hold and may be overly risk averse. Data sharing may not be prioritised by a department if it would require them to bear costs to deliver benefits that accrue elsewhere (i.e. to those gaining access to the data). Departments sharing data may need to invest significant resources to do so, as well as considering potential reputational or legal risks. This may hold up progress towards finding common agreement on data sharing. When there is an absence of incentives, even relatively small obstacles may mean data sharing is not deemed worthwhile by those who hold the data – despite the fact that other parts of the public sector might benefit significantly….(More)”.

Privacy‐Preserving Data Visualization: Reflections on the State of the Art and Research Opportunities


Paper by Kaustav Bhattacharjee, Min Chen, and Aritra Dasgupta: “Preservation of data privacy and protection of sensitive information from potential adversaries constitute a key socio‐technical challenge in the modern era of ubiquitous digital transformation. Addressing this challenge needs analysis of multiple factors: algorithmic choices for balancing privacy and loss of utility, potential attack scenarios that can be undertaken by adversaries, implications for data owners, data subjects, and data sharing policies, and access control mechanisms that need to be built into interactive data interfaces.

Visualization has a key role to play as part of the solution space, both as a medium of privacy‐aware information communication and also as a tool for understanding the link between privacy parameters and data sharing policies. The field of privacy‐preserving data visualization has witnessed progress along many of these dimensions. In this state‐of‐the‐art report, our goal is to provide a systematic analysis of the approaches, methods, and techniques used for handling data privacy in visualization. We also reflect on the road‐map ahead by analyzing the gaps and research opportunities for solving some of the pressing socio‐technical challenges involving data privacy with the help of visualization….(More)”.

Ethical and Legal Aspects of Open Data Affecting Farmers


Report by Foteini Zampati et al: “Open Data offers a great potential for innovations from which the agricultural sector can benefit decisively due to a wide range of possibilities for further use. However, there are many inter-linked issues in the whole data value chain that affect the ability of farmers, especially the poorest and most vulnerable, to access, use and harness the benefits of data and data-driven technologies.

There are technical challenges and ethical and legal challenges as well. Of all these challenges, the ethical and legal aspects related to accessing and using data by the farmers and sharing farmers’ data have been less explored.

We aimed to identify gaps and highlight the often-complex legal issues related to open data in the areas of law (e.g. data ownership, data rights), policies, codes of conduct, data protection, intellectual property rights, licensing contracts and personal privacy.

This report is an output of the Kampala INSPIRE Hackathon 2020. The Hackathon addressed key topics identified by the IST-Africa 2020 conference, such as: Agriculture, environmental sustainability, collaborative open innovation, and ICT-enabled entrepreneurship.

The goal of the event was to continue to build on the efforts of the 2019 Nairobi INSPIRE Hackathon, further strengthening relationships between various EU projects and African communities. It was a successful event, with more than 200 participants representing 26 African countries. The INSPIRE Hackathons are not a competition, rather the main focus is building relationships, making rapid developments, and collecting ideas for future research and innovation….(More)”.

Preservation of Individuals’ Privacy in Shared COVID-19 Related Data


Paper by Stefan Sauermann et al: “This paper provides insight into how restricted data can be incorporated in an open-by-default-by-design digital infrastructure for scientific data. We focus, in particular, on the ethical component of FAIRER (Findable, Accessible, Interoperable, Reusable, Ethical, and Reproducible) data, and the pseudo-anonymization and anonymization of COVID-19 datasets to protect personally identifiable information (PII). First, we consider the need for the customisation of the existing privacy preservation techniques in the context of rapid production, integration, sharing and analysis of COVID-19 data. Second, the methods for the pseudo-anonymization of direct identification variables are discussed. We also discuss the use of different pseudo-IDs for the same person in multi-domain and multi-organization settings. Essentially, pseudo-anonymization and its encrypted domain-specific IDs are used to successfully match data later, if required and permitted, as well as to restore the true ID (and authenticity) in individual cases of a patient’s clarification. Third, we discuss the application of statistical disclosure control (SDC) techniques to COVID-19 disease data. To assess and limit the risk of re-identification of individual persons in COVID-19 datasets (that are often enriched with other covariates like age, gender, nationality, etc.) to acceptable levels, the risk of successful re-identification by a combination of attribute values must be assessed and controlled. This is done using statistical disclosure control for anonymization of data. Lastly, we discuss the limitations of the proposed techniques and provide general guidelines on using disclosure risks to decide on appropriate modes for data sharing to preserve the privacy of the individuals in the datasets….(More)”.
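
The paper’s exact scheme is not reproduced here, but a minimal sketch of the general idea behind domain-specific pseudo-IDs, assuming a keyed hash (HMAC-SHA-256) with a separate secret key per domain or organization, could look like this (keys and identifiers are invented for illustration):

```python
import hmac
import hashlib

def pseudo_id(direct_identifier: str, domain_key: bytes) -> str:
    """Derive a domain-specific pseudonym from a direct identifier.

    A different secret key per domain/organization yields different
    pseudo-IDs for the same person, so records cannot be linked across
    domains without the keys; a party holding the relevant key can still
    re-derive the pseudonym to match records later, if permitted.
    """
    return hmac.new(domain_key, direct_identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Hypothetical keys for two data-sharing domains (in practice: securely generated and stored).
hospital_key = b"secret-key-hospital"
lab_key = b"secret-key-laboratory"

patient = "Joe Near|1984-02-03"            # hypothetical direct identifiers
print(pseudo_id(patient, hospital_key))    # pseudo-ID in the hospital domain
print(pseudo_id(patient, lab_key))         # different pseudo-ID in the lab domain
```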

Blind-sided by privacy? Digital contact tracing, the Apple/Google API and big tech’s newfound role as global health policy makers


Paper by Tamar Sharon: “Since the outbreak of COVID-19, governments have turned their attention to digital contact tracing. In many countries, public debate has focused on the risks this technology poses to privacy, with advocates and experts sounding alarm bells about surveillance and mission creep reminiscent of the post 9/11 era. Yet, when Apple and Google launched their contact tracing API in April 2020, some of the world’s leading privacy experts applauded this initiative for its privacy-preserving technical specifications. In an interesting twist, the tech giants came to be portrayed as greater champions of privacy than some democratic governments.

This article proposes to view the Apple/Google API in terms of a broader phenomenon whereby tech corporations are encroaching into ever new spheres of social life. From this perspective, the (legitimate) advantage these actors have accrued in the sphere of the production of digital goods provides them with (illegitimate) access to the spheres of health and medicine, and more worrisome, to the sphere of politics. These sphere transgressions raise numerous risks that are not captured by the focus on privacy harms. Namely, a crowding out of essential spherical expertise, new dependencies on corporate actors for the delivery of essential, public goods, the shaping of (global) public policy by non-representative, private actors and ultimately, the accumulation of decision-making power across multiple spheres. While privacy is certainly an important value, its centrality in the debate on digital contact tracing may blind us to these broader societal harms and unwittingly pave the way for ever more sphere transgressions….(More)”.

The Data Delusion: Protecting Individual Data is Not Enough When the Harm is Collective


Essay by Martin Tisné: “On March 17, 2018, questions about data privacy exploded with the scandal of the previously unknown consulting company Cambridge Analytica. Lawmakers are still grappling with updating laws to counter the harms of big data and AI. In the Spring of 2020, the Covid-19 pandemic brought questions about sufficient legal protections back to the public debate, with urgent warnings about the privacy implications of contact tracing apps. But the surveillance consequences of the pandemic’s aftermath are much bigger than any app: transport, education, health systems and offices are being turned into vast surveillance networks. If we only consider individual trade-offs between privacy sacrifices and alleged health benefits, we will miss the point. The collective nature of big data means people are more impacted by other people’s data than by data about them. Like climate change, the threat is societal and personal.

In the era of big data and AI, people can suffer because of how the sum of individual data is analysed and sorted into groups by algorithms. Novel forms of collective data-driven harms are appearing as a result: online housing, job and credit ads discriminating on the basis of race and gender, women disqualified from jobs on the basis of gender, and foreign actors targeting right-leaning groups, pulling them to the far-right. Our public debate, governments, and laws are ill-equipped to deal with these collective, as opposed to individual, harms….(More)”.

Ethical and societal implications of algorithms, data, and artificial intelligence: a roadmap for research


Report by the Nuffield Foundation and the Leverhulme Centre for the Future of Intelligence: “The aim of this report is to offer a broad roadmap for work on the ethical and societal implications of algorithms, data, and AI (ADA) in the coming years. It is aimed at those involved in planning, funding, and pursuing research and policy work related to these technologies. We use the term ‘ADA-based technologies’ to capture a broad range of ethically and societally relevant technologies based on algorithms, data, and AI, recognising that these three concepts are not totally separable from one another and will often overlap.

A shared set of key concepts and concerns is emerging, with widespread agreement on some of the core issues (such as bias) and values (such as fairness) that an ethics of algorithms, data, and AI should focus on. Over the last two years, these have begun to be codified in various codes and sets of ‘principles’. Agreeing on these issues, values and high-level principles is an important step for ensuring that ADA-based technologies are developed and used for the benefit of society.

However, we see three main gaps in this existing work: (i) a lack of clarity or consensus around the meaning of central ethical concepts and how they apply in specific situations; (ii) insufficient attention given to tensions between ideals and values; (iii) insufficient evidence on both (a) key technological capabilities and impacts, and (b) the perspectives of different publics….(More)”.