Stefaan Verhulst

Courtney Linder at Popular Mechanics: “Several prominent academic mathematicians want to sever ties with police departments across the U.S., according to a letter submitted to Notices of the American Mathematical Society on June 15. The letter arrived weeks after widespread protests against police brutality, and has inspired over 1,500 other researchers to join the boycott.

These mathematicians are urging fellow researchers to stop all work related to predictive policing software, which broadly includes any data analytics tools that use historical data to help forecast future crime, potential offenders, and victims. The technology is supposed to use probability to help police departments tailor their neighborhood coverage so it puts officers in the right place at the right time….

Figure: A flow chart showing how predictive policing works (credit: RAND)

According to a 2013 research briefing from the RAND Corporation, a nonprofit think tank in Santa Monica, California, predictive policing is made up of a four-part cycle (shown above). In the first two steps, researchers collect and analyze data on crimes, incidents, and offenders to come up with predictions. From there, police intervene based on the predictions, usually taking the form of an increase in resources at certain sites at certain times. The fourth step is, ideally, reducing crime.

“Law enforcement agencies should assess the immediate effects of the intervention to ensure that there are no immediately visible problems,” the authors note. “Agencies should also track longer-term changes by examining collected data, performing additional analysis, and modifying operations as needed.”

In many cases, predictive policing software was meant to be a tool to augment police departments that are facing budget crises with fewer officers to cover a region. If cops can target certain geographical areas at certain times, then they can get ahead of the 911 calls and maybe even reduce the rate of crime.

But in practice, the accuracy of the technology has been contested—and it’s even been called racist….(More)”.

Why Hundreds of Mathematicians Are Boycotting Predictive Policing

Introduction to a Special Blog Series by NIST: “…How can we use data to learn about a population, without learning about specific individuals within the population? Consider these two questions:

  1. “How many people live in Vermont?”
  2. “How many people named Joe Near live in Vermont?”

The first reveals a property of the whole population, while the second reveals information about one person. We need to be able to learn about trends in the population while preventing the ability to learn anything new about a particular individual. This is the goal of many statistical analyses of data, such as the statistics published by the U.S. Census Bureau, and machine learning more broadly. In each of these settings, models are intended to reveal trends in populations, not reflect information about any single individual.

But how can we answer the first question, “How many people live in Vermont?” (which we’ll refer to as a query), while preventing the second question, “How many people named Joe Near live in Vermont?”, from being answered? The most widely used solution is called de-identification (or anonymization), which removes identifying information from the dataset. (We’ll generally assume a dataset contains information collected from many individuals.) Another option is to allow only aggregate queries, such as an average over the data. Unfortunately, we now understand that neither approach actually provides strong privacy protection. De-identified datasets are subject to database-linkage attacks. Aggregation only protects privacy if the groups being aggregated are sufficiently large, and even then, privacy attacks are still possible [1, 2, 3, 4].

Differential Privacy

Differential privacy [5, 6] is a mathematical definition of what it means to have privacy. It is not a specific process like de-identification, but a property that a process can have. For example, it is possible to prove that a specific algorithm “satisfies” differential privacy.

Informally, differential privacy guarantees the following for each individual who contributes data for analysis: the output of a differentially private analysis will be roughly the same, whether or not you contribute your data. A differentially private analysis is often called a mechanism, and we denote it ℳ.

Figure 1: Informal Definition of Differential Privacy

Figure 1 illustrates this principle. Answer “A” is computed without Joe’s data, while answer “B” is computed with Joe’s data. Differential privacy says that the two answers should be indistinguishable. This implies that whoever sees the output won’t be able to tell whether or not Joe’s data was used, or what Joe’s data contained.

We control the strength of the privacy guarantee by tuning the privacy parameter ε, also called a privacy loss or privacy budget. The lower the value of the ε parameter, the more indistinguishable the results, and therefore the more each individual’s data is protected.

Figure 2: Formal Definition of Differential Privacy

We can often answer a query with differential privacy by adding some random noise to the query’s answer. The challenge lies in determining where to add the noise and how much to add. One of the most commonly used mechanisms for adding noise is the Laplace mechanism [5, 7]. 

Queries with higher sensitivity require adding more noise in order to satisfy a particular ε quantity of differential privacy, and this extra noise has the potential to make results less useful. We will describe sensitivity and this tradeoff between privacy and usefulness in more detail in future blog posts….(More)”.
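To make the noise-adding idea in the excerpt concrete, here is a minimal Python sketch of the Laplace mechanism applied to a counting query. It is an illustration under stated assumptions, not code from the NIST series: the function name `laplace_mechanism`, the example count, and the chosen ε values are all hypothetical. The sketch assumes the standard formulation in which the noise is drawn from a Laplace distribution with scale equal to sensitivity divided by ε, and it assumes a counting query has sensitivity 1, since adding or removing one person changes the count by at most 1.

```python
import random


def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return true_answer plus Laplace noise with scale = sensitivity / epsilon.

    Hypothetical helper for illustration only; it is not hardened for
    production use (for example against floating-point side channels).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two independent exponential samples with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_answer + noise


if __name__ == "__main__":
    true_count = 1000  # hypothetical query answer, not real population data
    for eps in (0.1, 1.0, 10.0):
        noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps)
        print(f"epsilon={eps}: noisy count = {noisy:.1f}")
```

Running the script a few times should show the trade-off the excerpt describes: smaller ε values tend to produce answers further from the true count (more privacy, less accuracy), while larger ε values stay closer to it.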

Differential Privacy for Privacy-Preserving Data Analysis

Paper by Robert M Gonzalez, Matthew Harvey and Foteini Tzachrista: “Empirical evidence on the effectiveness of grassroots monitoring is mixed. This paper proposes a previously unexplored mechanism that may explain this result. We argue that the presence of credible and effective top-down monitoring alternatives can undermine citizen participation in grassroots monitoring efforts. Building on Olken’s (2009) road-building field experiment in Indonesia, we find a large and robust effect of the participation interventions on missing expenditures in villages without an audit in place. However, this effect vanishes as soon as an audit is simultaneously implemented in the village. We find evidence of crowding-out effects: in government audit villages, individuals are less likely to attend, talk, and actively participate in accountability meetings. They are also significantly less likely to voice general problems, corruption-related problems, and to take serious actions to address these problems. Despite policies promoting joint implementation of top-down and bottom-up interventions, this paper shows that top-down monitoring can undermine rather than complement grassroots efforts….(More)”.

Monitoring Corruption: Can Top-down Monitoring Crowd-Out Grassroots Participation?

Essay by Benjamin Kumpf: “…Here are some of the relevant trade-offs I identified. 

Rigour vs. Speed

How to best balance high-quality rigorous research and the need to gain actionable insights rapidly?  

Responding to a pandemic requires working at pace, while investing in ongoing research and the cross-fertilization of disciplines. In our response, we witness the importance of strong networks with academia and DFID’s focus on high-quality research. In parallel, we invest in supporting partners with rapid data collection through methods such as phone surveys, field visits, onsite interviews where possible as well as big data analysis and more. For example, through the International Growth Centre, DFID has supported a Sierra Leone COVID-19 dashboard, providing real-time data on current economic conditions and trends from phone-based surveys from 195 towns and villages across Sierra Leone. ….

Breadth vs. depth

How to best balance providing services to large proportions of populations in need, while addressing challenges of specific communities?  

We are seeing emerging evidence that the virus and measures to prevent spread are disproportionately impacting marginalized communities and minorities. For example, indigenous people are disproportionately affected by the virus in Brazil, and Dalits are among the worst affected in India. In development and humanitarian contexts, it is paramount to guide innovation efforts with explicit values, including on the trade-off between scale and addressing last-mile challenges to leave no one behind. For example, to facilitate behaviour-change and embed insights from behavioural science and adaptive practices, DFID is supporting the Hygiene Hub, hosted at the London School of Hygiene and Tropical Medicine. The Hub provides free-of-charge advisory services to governments and non-governmental organizations working on COVID-19 related challenges in low and medium-income countries, balancing the need to reach large audiences and to design bespoke interventions for specific communities.

Exploration vs. adaptation

How to best diversify innovation efforts and investments between searching for local solutions and adapting proven approaches?

Adaptive vs. locally-led

How to best learn and adapt, while providing ownership to local players?

Single-point solutions vs. systems-practices

How to advance specific tech and non-tech innovations that address urgent needs, while further improving existing systems? 

Supporting domestic innovators vs. strengthening local solutions and ecosystems

We need explicit conversations to ensure better transparency about this trade-off in innovation investments generally.…(More)”.

Trade-offs and considerations for the future: Innovation and the COVID-19 response

Chas Kissick, Elliot Setzer, and Jacob Schulz at Lawfare: “In May of this year, Prime Minister Boris Johnson pledged the United Kingdom would develop a “world beating” track and trace system by June 1 to stop the spread of the novel coronavirus. But on June 18, the government quietly abandoned its coronavirus contact-tracing app, a key piece of the “world beating” strategy, and instead promised to switch to a model designed by Apple and Google. The delayed app will not be ready until winter, and the U.K.’s Junior Health Minister told reporters that “it isn’t a priority for us at the moment.” When Johnson came under fire in Parliament for the abrupt U-turn, he replied: “I wonder whether the right honorable and learned Gentleman can name a single country in the world that has a functional contact tracing app—there isn’t one.”

Johnson’s rebuttal is perhaps a bit reductive, but he’s not that far off.

You probably remember the idea of contact-tracing apps: the technological intervention that seemed to have the potential to save lives while enabling a hamstrung economy to safely inch back open; it was a fixation of many public health and privacy advocates; it was the thing that was going to help us get out of this mess if we could manage the risks.

Yet nearly three months after Google and Apple announced with great fanfare their partnership to build a contact-tracing API, contact-tracing apps have made an unceremonious exit from the front pages of American newspapers. Countries, states and localities continue to try to develop effective digital tracing strategies. But as Jonathan Zittrain puts it, the “bigger picture momentum appears to have waned.”

What’s behind contact-tracing apps’ departure from the spotlight? For one, there’s the onset of a larger pandemic apathy in the U.S.; many politicians and Americans seem to have thrown up their hands or put all their hopes in the speedy development of a vaccine. Yet, the apps haven’t even made much of a splash in countries that have taken the pandemic more seriously. Anxieties about privacy persist. But technical shortcomings in the apps deserve the lion’s share of the blame. Countries have struggled to get bespoke apps developed by government technicians to work on Apple phones. The functionality of some Bluetooth-enabled models varies widely depending on small changes in phone positioning. And most countries have only convinced a small fraction of their populace to use national tracing apps.

Maybe it’s still possible that contact-tracing apps will make a miraculous comeback and approach the level of efficacy observers once anticipated.

But even if technical issues implausibly subside, the apps are operating in a world of unknowns.

Most centrally, researchers still have no real idea what level of adoption is required for the apps to actually serve their function. Some estimates suggest that 80 percent of current smartphone owners in a given area would need to use an app and follow its recommendations for digital contact tracing to be effective. But other researchers have noted that the apps could slow the rate of infections even if little more than 10 percent of a population used a tracing app. It will be an uphill battle even to hit the 10 percent mark in America, though. Survey data show that fewer than three in 10 Americans intend to use contact-tracing apps if they become available…(More)”.

What Ever Happened to Digital Contact Tracing?

Essay by Laura Robinson et al in First Monday: “Marking the 25th anniversary of the “digital divide,” we continue our metaphor of the digital inequality stack by mapping out the rapidly evolving nature of digital inequality using a broad lens. We tackle complex, and often unseen, inequalities spawned by the platform economy, automation, big data, algorithms, cybercrime, cybersafety, gaming, emotional well-being, assistive technologies, civic engagement, and mobility. These inequalities are woven throughout the digital inequality stack in many ways including differentiated access, use, consumption, literacies, skills, and production. While many users are competent prosumers who nimbly work within different layers of the stack, very few individuals are “full stack engineers” able to create or recreate digital devices, networks, and software platforms as pure producers. This new frontier of digital inequalities further differentiates digitally skilled creators from mere users. Therefore, we document emergent forms of inequality that radically diminish individuals’ agency and augment the power of technology creators, big tech, and other already powerful social actors whose dominance is increasing….(More)”

Digital inequalities 3.0: Emergent inequalities in the information age

The Economist: “The global downturn of 2020 is probably the most quantified on record. Economists, firms and statisticians seeking to gauge the depth of the collapse in economic activity and the pace of the recovery have seized upon a new dashboard of previously obscure indicators. Investors eagerly await the release of mobility statistics from tech companies such as Apple or Google, or restaurant-booking data from OpenTable, in a manner once reserved for official inflation and unemployment estimates. Central bankers pepper their speeches with novel barometers of consumer spending. Investment-bank analysts and journalists tout hot new measures of economic activity in the way that hipsters discuss the latest bands. Those who prefer to wait for official measures are regarded as being like fans of U2, a sanctimonious Irish rock group: stuck behind the curve as the rest of the world has moved on.

The main attraction of real-time data to policymakers and investors alike is timeliness. Whereas official, so-called hard data, such as inflation, employment or output measures, tend to be released with a lag of several weeks, or even months, real-time data, as the name suggests, can offer a window on today’s economic conditions. The depth of the downturns induced by covid-19 has put a premium on swift intelligence. The case for hard data has always been their quality, but this has suffered greatly during the pandemic. Compilers of official labour-market figures have struggled to account for furlough schemes and the like, and have plastered their releases with warnings about unusually high levels of uncertainty. Filling in statisticians’ forms has probably fallen to the bottom of firms’ to-do lists, reducing the accuracy of official output measures….

The value of real-time measures will be tested once the swings in economic activity approach a more normal magnitude. Mobility figures for March and April did predict the scale of the collapse in GDP, but that could have been estimated just as easily by stepping outside and looking around (at least in the places where that sort of thing was allowed during lockdown). Forecasters in rich countries are more used to quibbling over whether economies will grow at an annual rate of 2% or 3% than whether output will shrink by 20% or 30% in a quarter. Real-time measures have disappointed before. Immediately after Britain’s vote to leave the European Union in 2016, for instance, the indicators then watched by economists pointed to a sharp slowdown. It never came.

Real-time data, when used with care, have been a helpful supplement to official measures so far this year. With any luck the best of the new indicators will help official statisticians improve the quality and timeliness of their own figures. But, much like U2, the official measures have been around for a long time thanks to their tried and tested formula, and they are likely to stick around for a long time to come….(More)”.

Why real-time economic data need to be treated with caution

Blog by Alexandra Shaw, Andrew J. Zahuranec, Andrew Young, Stefaan G. Verhulst, Jennifer Requejo, Liliana Carvajal: “Adolescence is a unique stage of life. The brain undergoes rapid development; individuals face new experiences, relationships, and environments. These events can be exciting, but they can also be a source of instability and hardship. Half of all mental conditions manifest by early adolescence and between 10 and 20 percent of all children and adolescents report mental health conditions. Despite the increased risks and concerns for adolescents’ well-being, there remain significant gaps in availability of data at the country level for policymakers to address these issues.

In June, The GovLab partnered with colleagues at UNICEF’s Health and HIV team in the Division of Data, Analysis, Planning & Monitoring and the Data for Children Collaborative — a collaboration between UNICEF, the Scottish Government, and the University of Edinburgh — to design and apply a new methodology of participatory mapping and prioritization of key topics and issues associated with adolescent mental health that could be addressed through enhanced data collaboration….

The event led to three main takeaways. First, the topic mapping allows participants to deliberate and prioritize the various aspects of adolescent mental health in a more holistic manner. Unlike the “blind men and the elephant” parable, a topic map allows the participants to see and discuss the interrelated parts of the topic, including those which they might be less familiar with.

Second, the workshops demonstrated the importance of tapping into distributed expertise via participatory processes. While the topic map provided a starting point, the inclusion of various experts allowed the findings of the document to be reviewed in a rapid, legitimate fashion. The diverse inputs helped ensure the individual aspects could be prioritized without a perspective being ignored.

Lastly, the approach showed the importance of data initiatives being driven and validated by those individuals representing the demand. By soliciting the input of those who would actually use the data, the methodology ensured data initiatives focused on the aspects thought to be most relevant and of greatest importance….(More)”

Adolescent Mental Health: Using A Participatory Mapping Methodology to Identify Key Priorities for Data Collaboration

Paper by Eric Windholz: “Emergencies require governments to govern differently. In Australia, the changes wrought by the COVID-19 pandemic have been profound. The role of lawmaker has been assumed by the executive exercising broad emergency powers. Parliaments, and the debate and scrutiny they provide, have been marginalised. The COVID-19 response also has seen the medical-scientific expert metamorphose from decision-making input into decision-maker. Extensive legislative and executive decision-making authority has been delegated to them – directly in some jurisdictions; indirectly in others. Severe restrictions on an individual’s freedom of movement, association and to earn a livelihood have been declared by them, or on their advice. Employing the analytical lens of regulatory legitimacy, this article examines and seeks to understand this shift from parliamentary sovereignty to autocratic technocracy. How has it occurred? Why has it occurred? What have been the consequences and risks of vesting significant legislative and executive power in the hands of medical-scientific experts; what might be its implications? The article concludes by distilling insights to inform the future design and deployment of public health emergency powers….(More)”.

Governing in a pandemic: from parliamentary sovereignty to autocratic technocracy

Centre for Data Ethics and Innovation: “Data sharing is fundamental to effective government and the running of public services. But it is not an end in itself. Data needs to be shared to drive improvements in service delivery and benefit citizens. For this to happen sustainably and effectively, public trust in the way data is shared and used is vital. Without such trust, the government and wider public sector risks losing society’s consent, setting back innovation as well as the smooth running of public services. Maximising the benefits of data driven technology therefore requires a solid foundation of societal approval.

AI and data driven technology offer extraordinary potential to improve decision making and service delivery in the public sector – from improved diagnostics to more efficient infrastructure and personalised public services. This makes effective use of data more important than it has ever been, and requires a step-change in the way data is shared and used. Yet sharing more data also poses risks and challenges to current governance arrangements.

The only way to build trust sustainably is to operate in a trustworthy way. Without adequate safeguards the collection and use of personal data risks changing power relationships between the citizen and the state. Insights derived by big data and the matching of different data sets can also undermine individual privacy or personal autonomy. Trade-offs are required which reflect democratic values, wider public acceptability and a shared vision of a data driven society. CDEI has a key role to play in exploring this challenge and setting out how it can be addressed. This report identifies barriers to data sharing, but focuses on building and sustaining the public trust which is vital if society is to maximise the benefits of data driven technology.

There are many areas where the sharing of anonymised and identifiable personal data by the public sector already improves services, prevents harm, and benefits the public. Over the last 20 years, different governments have adopted various measures to increase data sharing, including creating new legal sharing gateways. However, despite efforts to increase the amount of data sharing across the government, and significant successes in areas like open data, data sharing continues to be challenging and resource-intensive. This report identifies a range of technical, legal and cultural barriers that can inhibit data sharing.

Barriers to data sharing in the public sector

Technical barriers include limited adoption of common data standards and inconsistent security requirements across the public sector. Such inconsistency can prevent data sharing, or increase the cost and time for organisations to finalise data sharing agreements.

While there are often pre-existing legal gateways for data sharing, underpinned by data protection legislation, there is still a large amount of legal confusion on the part of public sector bodies wishing to share data which can cause them to start from scratch when determining legality and commit significant resources to legal advice. It is not unusual for the development of data sharing agreements to delay the projects for which the data is intended. While the legal scrutiny of data sharing arrangements is an important part of governance, improving the efficiency of these processes – without sacrificing their rigour – would allow data to be shared more quickly and at less expense.

Even when legal, the permissive nature of many legal gateways means significant cultural and organisational barriers to data sharing remain. Individual departments and agencies decide whether or not to share the data they hold and may be overly risk averse. Data sharing may not be prioritised by a department if it would require them to bear costs to deliver benefits that accrue elsewhere (i.e. to those gaining access to the data). Departments sharing data may need to invest significant resources to do so, as well as considering potential reputational or legal risks. This may hold up progress towards finding common agreement on data sharing. When there is an absence of incentives, even relatively small obstacles may mean data sharing is not deemed worthwhile by those who hold the data – despite the fact that other parts of the public sector might benefit significantly….(More)”.

Addressing trust in public sector data use
