The Atlas of Surveillance

Electronic Frontier Foundation: “Law enforcement surveillance isn’t always secret. These technologies can be discovered in news articles and government meeting agendas, in company press releases and social media posts. It just hasn’t been aggregated before.

That’s the starting point for the Atlas of Surveillance, a collaborative effort between the Electronic Frontier Foundation and the University of Nevada, Reno Reynolds School of Journalism. Through a combination of crowdsourcing and data journalism, we are creating the largest-ever repository of information on which law enforcement agencies are using what surveillance technologies. The aim is to generate a resource for journalists, academics, and, most importantly, members of the public to check what’s been purchased locally and how technologies are spreading across the country.

We specifically focused on the most pervasive technologies, including drones, body-worn cameras, face recognition, cell-site simulators, automated license plate readers, predictive policing, camera registries, and gunshot detection. Although we have amassed more than 5,000 datapoints in 3,000 jurisdictions, our research only reveals the tip of the iceberg and underlines the need for journalists and members of the public to continue demanding transparency from criminal justice agencies….(More)”.

Four Principles for Integrating AI & Good Governance

Oxford Commission on AI and Good Governance: “Many governments, public agencies and institutions already employ AI in providing public services, the distribution of resources and the delivery of governance goods. In the public sector, AI-enabled governance may afford new efficiencies that have the potential to transform a wide array of public service tasks.
But short-sighted design and use of AI can create new problems, entrench existing inequalities, and calcify and ultimately undermine government organizations.

Frameworks for the procurement and implementation of AI in public service have widely remained undeveloped. Frequently, existing regulations and national laws are no longer fit for purpose to ensure good behaviour (of either AI or private suppliers) and are ill-equipped to provide guidance on the democratic use of AI.
As technology evolves rapidly, we need rules to guide the use of AI in ways that safeguard democratic values. Under what conditions can AI be put into service for good governance?

We offer a framework for integrating AI with good governance. We believe that with dedicated attention and evidence-based policy research, it should be possible to overcome the combined technical and organizational challenges of successfully integrating AI with good governance. Doing so requires working towards:

Inclusive Design: issues around discrimination and bias of AI in relation to inadequate data sets, exclusion of minorities and under-represented groups, and the lack of diversity in design.
Informed Procurement: issues around the acquisition and development in relation to due diligence, design and usability specifications and the assessment of risks and benefits.
Purposeful Implementation: issues around the use of AI in relation to interoperability, training needs for public servants, and integration with decision-making processes.
Persistent Accountability: issues around the accountability and transparency of AI in relation to ‘black box’ algorithms, the interpretability and explainability of systems, monitoring and auditing…(More)”

Race and America: why data matters

Federica Cocco and Alan Smith at the Financial Times: “… To understand the historical roots of black data activism, we have to return to October 1899. Back then, Thomas Calloway, a clerk in the War Department, wrote to the educator Booker T Washington about his pitch for an “American Negro Exhibit” at the 1900 Exposition Universelle in Paris. It was right in the middle of the scramble for Africa and Europeans had developed a morbid fascination with the people they were trying to subjugate.

To Calloway, the Paris exhibition offered a unique venue to sway the global elite to acknowledge “the possibilities of the Negro” and to influence cultural change in the US from an international platform.

It is hard to overstate the importance of international fairs at the time. They were a platform to bolster the prestige of nations. In Delivering Views: Distant Cultures in Early Postcards, Robert Rydell writes that fairs had become “a vehicle that, perhaps next to the church, had the greatest capacity to influence a mass audience”….

For the Paris World Fair, Du Bois and a team of Atlanta University students and alumni designed and drew by hand more than 60 bold data portraits. A first set used Georgia as a case study to illustrate the progress made by African Americans since the Civil War.

A second set showed how “the descendants of former African slaves now in residence in the United States of America” had become lawyers, doctors, inventors and musicians. For the first time, the growth of literacy and employment rates, the value of assets and land owned by African Americans and their growing consumer power were there for everyone to see. At the 1900 World Fair, the “Exhibit of American Negroes” took up a prominent spot in the Palace of Social Economy. “As soon as they entered the building, visitors were inundated by examples of black excellence,” says Whitney Battle-Baptiste, director of the WEB Du Bois Center at the University of Massachusetts Amherst and co-author of WEB Du Bois’s Data Portraits: Visualizing Black America….(More)”

Working with students and alumni from Atlanta University, Du Bois created 60 bold data portraits for the ‘Exhibit of American Negroes’ © Library of Congress, Prints & Photographs Division

Why Hundreds of Mathematicians Are Boycotting Predictive Policing

Courtney Linder at Popular Mechanics: “Several prominent academic mathematicians want to sever ties with police departments across the U.S., according to a letter submitted to Notices of the American Mathematical Society on June 15. The letter arrived weeks after widespread protests against police brutality, and has inspired over 1,500 other researchers to join the boycott.

These mathematicians are urging fellow researchers to stop all work related to predictive policing software, which broadly includes any data analytics tools that use historical data to help forecast future crime, potential offenders, and victims. The technology is supposed to use probability to help police departments tailor their neighborhood coverage so it puts officers in the right place at the right time….

[Figure: a flow chart showing how predictive policing works]

According to a 2013 research briefing from the RAND Corporation, a nonprofit think tank in Santa Monica, California, predictive policing is made up of a four-part cycle (shown above). In the first two steps, researchers collect and analyze data on crimes, incidents, and offenders to come up with predictions. From there, police intervene based on the predictions, usually taking the form of an increase in resources at certain sites at certain times. The fourth step is, ideally, reducing crime.

“Law enforcement agencies should assess the immediate effects of the intervention to ensure that there are no immediately visible problems,” the authors note. “Agencies should also track longer-term changes by examining collected data, performing additional analysis, and modifying operations as needed.”

In many cases, predictive policing software was meant to be a tool to augment police departments that are facing budget crises with fewer officers to cover a region. If cops can target certain geographical areas at certain times, then they can get ahead of the 911 calls and maybe even reduce the rate of crime.

But in practice, the accuracy of the technology has been contested—and it’s even been called racist….(More)”.

Differential Privacy for Privacy-Preserving Data Analysis

Introduction to a Special Blog Series by NIST: “…How can we use data to learn about a population, without learning about specific individuals within the population? Consider these two questions:

  1. “How many people live in Vermont?”
  2. “How many people named Joe Near live in Vermont?”

The first reveals a property of the whole population, while the second reveals information about one person. We need to be able to learn about trends in the population while preventing the ability to learn anything new about a particular individual. This is the goal of many statistical analyses of data, such as the statistics published by the U.S. Census Bureau, and machine learning more broadly. In each of these settings, models are intended to reveal trends in populations, not reflect information about any single individual.

But how can we answer the first question, “How many people live in Vermont?” (which we’ll refer to as a query), while preventing the second question, “How many people named Joe Near live in Vermont?”, from being answered? The most widely used solution is called de-identification (or anonymization), which removes identifying information from the dataset. (We’ll generally assume a dataset contains information collected from many individuals.) Another option is to allow only aggregate queries, such as an average over the data. Unfortunately, we now understand that neither approach actually provides strong privacy protection. De-identified datasets are subject to database-linkage attacks. Aggregation only protects privacy if the groups being aggregated are sufficiently large, and even then, privacy attacks are still possible [1, 2, 3, 4].
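
To make the weakness of aggregate-only access concrete, here is a minimal sketch of a so-called differencing attack. The records, names, and salary figures are entirely hypothetical, and `total_salary` and `differencing_attack` are illustrative helpers rather than anything from the NIST series. The point is only that two innocent-looking aggregate answers can be subtracted to isolate one person.

```python
# Hypothetical toy dataset: every name and number here is made up.
records = [
    {"name": "Alice", "salary": 52_000},
    {"name": "Bob", "salary": 61_000},
    {"name": "Joe", "salary": 75_000},
    {"name": "Dana", "salary": 48_000},
]


def total_salary(rows):
    """Aggregate query: only a sum over the rows is released."""
    return sum(row["salary"] for row in rows)


def differencing_attack(rows, target):
    """Recover one person's value by subtracting two aggregate answers."""
    everyone = total_salary(rows)
    everyone_but_target = total_salary(
        [row for row in rows if row["name"] != target]
    )
    return everyone - everyone_but_target


# Neither query names Joe's salary directly, yet the difference reveals it.
print(differencing_attack(records, "Joe"))
```

Both queries cover groups of three or four people, so a rule that simply blocks very small groups would not, on its own, stop this kind of subtraction.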

Differential Privacy

Differential privacy [5, 6] is a mathematical definition of what it means to have privacy. It is not a specific process like de-identification, but a property that a process can have. For example, it is possible to prove that a specific algorithm “satisfies” differential privacy.

Informally, differential privacy guarantees the following for each individual who contributes data for analysis: the output of a differentially private analysis will be roughly the same, whether or not you contribute your data. A differentially private analysis is often called a mechanism, and we denote it ℳ.

Figure 1: Informal Definition of Differential Privacy

Figure 1 illustrates this principle. Answer “A” is computed without Joe’s data, while answer “B” is computed with Joe’s data. Differential privacy says that the two answers should be indistinguishable. This implies that whoever sees the output won’t be able to tell whether or not Joe’s data was used, or what Joe’s data contained.

We control the strength of the privacy guarantee by tuning the privacy parameter ε, also called a privacy loss or privacy budget. The lower the value of the ε parameter, the more indistinguishable the results, and therefore the more each individual’s data is protected.

Figure 2: Formal Definition of Differential Privacy
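
The figure itself is not reproduced in this excerpt. As a hedged stand-in, the formulation commonly given in the differential privacy literature is written out below. The symbols D, D′ (two datasets differing in one individual's data) and S (any set of possible outputs) are notation assumed here, not taken from the excerpt; ℳ and ε are as introduced above.

```latex
% A mechanism \mathcal{M} satisfies \varepsilon-differential privacy if,
% for all neighboring datasets D and D' and all sets of outputs S:
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

When ε is close to 0, the factor e^ε is close to 1, so the two output distributions are nearly identical. This matches the informal statement that a lower ε makes the results more indistinguishable.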

We can often answer a query with differential privacy by adding some random noise to the query’s answer. The challenge lies in determining where to add the noise and how much to add. One of the most commonly used mechanisms for adding noise is the Laplace mechanism [5, 7]. 

Queries with higher sensitivity require adding more noise in order to satisfy a particular ε quantity of differential privacy, and this extra noise has the potential to make results less useful. We will describe sensitivity and this tradeoff between privacy and usefulness in more detail in future blog posts….(More)”.
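
The relationship between sensitivity, ε, and noise can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the NIST series: the function names are hypothetical, sensitivity is taken in its usual sense (the most a query's answer can change when one individual's data is added or removed, which is 1 for a counting query), and the noise scale sensitivity/ε is the standard choice for the Laplace mechanism.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return a noisy answer; noise scale is sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical counting query: the true count of 1000 is a made-up number.
true_count = 1000
for epsilon in (0.1, 1.0):
    noisy = laplace_mechanism(true_count, sensitivity=1, epsilon=epsilon)
    print(f"epsilon={epsilon}: noisy count = {noisy:.1f}")
```

Because the scale is sensitivity/ε, halving ε or doubling the sensitivity doubles the typical size of the noise, which is the privacy and usefulness tradeoff the paragraph above describes. Production use would also need to handle issues such as floating-point subtleties and budget tracking across repeated queries, which this sketch ignores.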

What Ever Happened to Digital Contact Tracing?

Chas Kissick, Elliot Setzer, and Jacob Schulz at Lawfare: “In May of this year, Prime Minister Boris Johnson pledged the United Kingdom would develop a “world beating” track and trace system by June 1 to stop the spread of the novel coronavirus. But on June 18, the government quietly abandoned its coronavirus contact-tracing app, a key piece of the “world beating” strategy, and instead promised to switch to a model designed by Apple and Google. The delayed app will not be ready until winter, and the U.K.’s Junior Health Minister told reporters that “it isn’t a priority for us at the moment.” When Johnson came under fire in Parliament for the abrupt U-turn, he replied: “I wonder whether the right honorable and learned Gentleman can name a single country in the world that has a functional contact tracing app—there isn’t one.”

Johnson’s rebuttal is perhaps a bit reductive, but he’s not that far off.

You probably remember the idea of contact-tracing apps: the technological intervention that seemed to have the potential to save lives while enabling a hamstrung economy to safely inch back open; it was a fixation of many public health and privacy advocates; it was the thing that was going to help us get out of this mess if we could manage the risks.

Yet nearly three months after Google and Apple announced with great fanfare their partnership to build a contact-tracing API, contact-tracing apps have made an unceremonious exit from the front pages of American newspapers. Countries, states and localities continue to try to develop effective digital tracing strategies. But as Jonathan Zittrain puts it, the “bigger picture momentum appears to have waned.”

What’s behind contact-tracing apps’ departure from the spotlight? For one, there’s the onset of a larger pandemic apathy in the U.S.; many politicians and Americans seem to have thrown up their hands or put all their hopes in the speedy development of a vaccine. Yet, the apps haven’t even made much of a splash in countries that have taken the pandemic more seriously. Anxieties about privacy persist. But technical shortcomings in the apps deserve the lion’s share of the blame. Countries have struggled to get bespoke apps developed by government technicians to work on Apple phones. The functionality of some Bluetooth-enabled models varies widely depending on small changes in phone positioning. And most countries have only convinced a small fraction of their populace to use national tracing apps.

Maybe it’s still possible that contact-tracing apps will make a miraculous comeback and approach the level of efficacy observers once anticipated.

But even if technical issues implausibly subside, the apps are operating in a world of unknowns.

Most centrally, researchers still have no real idea what level of adoption is required for the apps to actually serve their function. Some estimates suggest that 80 percent of current smartphone owners in a given area would need to use an app and follow its recommendations for digital contact tracing to be effective. But other researchers have noted that the apps could slow the rate of infections even if little more than 10 percent of a population used a tracing app. It will be an uphill battle even to hit the 10 percent mark in America, though. Survey data show that fewer than three in 10 Americans intend to use contact-tracing apps if they become available…(More)”.

Why real-time economic data need to be treated with caution

The Economist: “The global downturn of 2020 is probably the most quantified on record. Economists, firms and statisticians seeking to gauge the depth of the collapse in economic activity and the pace of the recovery have seized upon a new dashboard of previously obscure indicators. Investors eagerly await the release of mobility statistics from tech companies such as Apple or Google, or restaurant-booking data from OpenTable, in a manner once reserved for official inflation and unemployment estimates. Central bankers pepper their speeches with novel barometers of consumer spending. Investment-bank analysts and journalists tout hot new measures of economic activity in the way that hipsters discuss the latest bands. Those who prefer to wait for official measures are regarded as being like fans of U2, a sanctimonious Irish rock group: stuck behind the curve as the rest of the world has moved on.

The main attraction of real-time data to policymakers and investors alike is timeliness. Whereas official, so-called hard data, such as inflation, employment or output measures, tend to be released with a lag of several weeks, or even months, real-time data, as the name suggests, can offer a window on today’s economic conditions. The depth of the downturns induced by covid-19 has put a premium on swift intelligence. The case for hard data has always been their quality, but this has suffered greatly during the pandemic. Compilers of official labour-market figures have struggled to account for furlough schemes and the like, and have plastered their releases with warnings about unusually high levels of uncertainty. Filling in statisticians’ forms has probably fallen to the bottom of firms’ to-do lists, reducing the accuracy of official output measures….

The value of real-time measures will be tested once the swings in economic activity approach a more normal magnitude. Mobility figures for March and April did predict the scale of the collapse in GDP, but that could have been estimated just as easily by stepping outside and looking around (at least in the places where that sort of thing was allowed during lockdown). Forecasters in rich countries are more used to quibbling over whether economies will grow at an annual rate of 2% or 3% than whether output will shrink by 20% or 30% in a quarter. Real-time measures have disappointed before. Immediately after Britain’s vote to leave the European Union in 2016, for instance, the indicators then watched by economists pointed to a sharp slowdown. It never came.

Real-time data, when used with care, have been a helpful supplement to official measures so far this year. With any luck the best of the new indicators will help official statisticians improve the quality and timeliness of their own figures. But, much like U2, the official measures have been around for a long time thanks to their tried and tested formula, and they are likely to stick around for a long time to come….(More)”.

Adolescent Mental Health: Using A Participatory Mapping Methodology to Identify Key Priorities for Data Collaboration

Blog by Alexandra Shaw, Andrew J. Zahuranec, Andrew Young, Stefaan G. Verhulst, Jennifer Requejo, Liliana Carvajal: “Adolescence is a unique stage of life. The brain undergoes rapid development; individuals face new experiences, relationships, and environments. These events can be exciting, but they can also be a source of instability and hardship. Half of all mental conditions manifest by early adolescence and between 10 and 20 percent of all children and adolescents report mental health conditions. Despite the increased risks and concerns for adolescents’ well-being, there remain significant gaps in availability of data at the country level for policymakers to address these issues.

In June, The GovLab partnered with colleagues at UNICEF’s Health and HIV team in the Division of Data, Analysis, Planning & Monitoring and the Data for Children Collaborative — a collaboration between UNICEF, the Scottish Government, and the University of Edinburgh — to design and apply a new methodology of participatory mapping and prioritization of key topics and issues associated with adolescent mental health that could be addressed through enhanced data collaboration….

The event led to three main takeaways. First, the topic mapping allows participants to deliberate and prioritize the various aspects of adolescent mental health in a more holistic manner. Unlike the “blind men and the elephant” parable, a topic map allows the participants to see and discuss the interrelated parts of the topic, including those which they might be less familiar with.

Second, the workshops demonstrated the importance of tapping into distributed expertise via participatory processes. While the topic map provided a starting point, the inclusion of various experts allowed the findings of the document to be reviewed in a rapid, legitimate fashion. The diverse inputs helped ensure the individual aspects could be prioritized without a perspective being ignored.

Lastly, the approach showed the importance of data initiatives being driven and validated by those individuals representing the demand. By soliciting the input of those who would actually use the data, the methodology ensured data initiatives focused on the aspects thought to be most relevant and of greatest importance….(More)”

Addressing trust in public sector data use

Centre for Data Ethics and Innovation: “Data sharing is fundamental to effective government and the running of public services. But it is not an end in itself. Data needs to be shared to drive improvements in service delivery and benefit citizens. For this to happen sustainably and effectively, public trust in the way data is shared and used is vital. Without such trust, the government and wider public sector risks losing society’s consent, setting back innovation as well as the smooth running of public services. Maximising the benefits of data driven technology therefore requires a solid foundation of societal approval.

AI and data driven technology offer extraordinary potential to improve decision making and service delivery in the public sector – from improved diagnostics to more efficient infrastructure and personalised public services. This makes effective use of data more important than it has ever been, and requires a step-change in the way data is shared and used. Yet sharing more data also poses risks and challenges to current governance arrangements.

The only way to build trust sustainably is to operate in a trustworthy way. Without adequate safeguards the collection and use of personal data risks changing power relationships between the citizen and the state. Insights derived by big data and the matching of different data sets can also undermine individual privacy or personal autonomy. Trade-offs are required which reflect democratic values, wider public acceptability and a shared vision of a data driven society. CDEI has a key role to play in exploring this challenge and setting out how it can be addressed. This report identifies barriers to data sharing, but focuses on building and sustaining the public trust which is vital if society is to maximise the benefits of data driven technology.

There are many areas where the sharing of anonymised and identifiable personal data by the public sector already improves services, prevents harm, and benefits the public. Over the last 20 years, different governments have adopted various measures to increase data sharing, including creating new legal sharing gateways. However, despite efforts to increase the amount of data sharing across the government, and significant successes in areas like open data, data sharing continues to be challenging and resource-intensive. This report identifies a range of technical, legal and cultural barriers that can inhibit data sharing.

Barriers to data sharing in the public sector

Technical barriers include limited adoption of common data standards and inconsistent security requirements across the public sector. Such inconsistency can prevent data sharing, or increase the cost and time for organisations to finalise data sharing agreements.

While there are often pre-existing legal gateways for data sharing, underpinned by data protection legislation, there is still a large amount of legal confusion on the part of public sector bodies wishing to share data which can cause them to start from scratch when determining legality and commit significant resources to legal advice. It is not unusual for the development of data sharing agreements to delay the projects for which the data is intended. While the legal scrutiny of data sharing arrangements is an important part of governance, improving the efficiency of these processes – without sacrificing their rigour – would allow data to be shared more quickly and at less expense.

Even when legal, the permissive nature of many legal gateways means significant cultural and organisational barriers to data sharing remain. Individual departments and agencies decide whether or not to share the data they hold and may be overly risk averse. Data sharing may not be prioritised by a department if it would require them to bear costs to deliver benefits that accrue elsewhere (i.e. to those gaining access to the data). Departments sharing data may need to invest significant resources to do so, as well as considering potential reputational or legal risks. This may hold up progress towards finding common agreement on data sharing. When there is an absence of incentives, even relatively small obstacles may mean data sharing is not deemed worthwhile by those who hold the data – despite the fact that other parts of the public sector might benefit significantly….(More)”.

Privacy‐Preserving Data Visualization: Reflections on the State of the Art and Research Opportunities

Paper by Kaustav Bhattacharjee, Min Chen, and Aritra Dasgupta: “Preservation of data privacy and protection of sensitive information from potential adversaries constitute a key socio‐technical challenge in the modern era of ubiquitous digital transformation. Addressing this challenge needs analysis of multiple factors: algorithmic choices for balancing privacy and loss of utility, potential attack scenarios that can be undertaken by adversaries, implications for data owners, data subjects, and data sharing policies, and access control mechanisms that need to be built into interactive data interfaces.

Visualization has a key role to play as part of the solution space, both as a medium of privacy‐aware information communication and also as a tool for understanding the link between privacy parameters and data sharing policies. The field of privacy‐preserving data visualization has witnessed progress along many of these dimensions. In this state‐of‐the‐art report, our goal is to provide a systematic analysis of the approaches, methods, and techniques used for handling data privacy in visualization. We also reflect on the road‐map ahead by analyzing the gaps and research opportunities for solving some of the pressing socio‐technical challenges involving data privacy with the help of visualization….(More)”.