The Stanford Open Policing Project


About: “On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.

Currently, a comprehensive, national repository detailing interactions between police and the public doesn’t exist. That’s why the Stanford Open Policing Project is collecting and standardizing data on vehicle and pedestrian stops from law enforcement departments across the country — and we’re making that information freely available. We’ve already gathered 130 million records from 31 state police agencies and have begun collecting data on stops from law enforcement agencies in major cities, as well.

We, the Stanford Open Policing Project, are an interdisciplinary team of researchers and journalists at Stanford University. We are committed to combining the academic rigor of statistical analysis with the explanatory power of data journalism….(More)”.

Algorithmic fairness: A code-based primer for public-sector data scientists


Paper by Ken Steif and Sydney Goldstein: “As the number of government algorithms grows, so does the need to evaluate algorithmic fairness. This paper has three goals. First, we ground the notion of algorithmic fairness in the context of disparate impact, arguing that for an algorithm to be fair, its predictions must generalize across different protected groups. Next, two algorithmic use cases are presented with code examples for how to evaluate fairness. Finally, we promote the concept of an open source repository of government algorithmic “scorecards,” allowing stakeholders to compare across algorithms and use cases….(More)”.
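
The paper's own code examples are not reproduced here. As a rough illustration of what "predictions must generalize across different protected groups" can mean in practice, the following is a minimal sketch that compares false positive and false negative rates between groups. The function name, the record layout, and the sample data are all assumptions made up for this illustration; they are not taken from the paper.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute error rates per group.

    records: iterable of (group, actual, predicted) tuples, where
    actual and predicted are 0 or 1. This layout is hypothetical.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual == 1:
            c["pos"] += 1
            if predicted == 0:
                c["fn"] += 1  # missed a true positive
        else:
            c["neg"] += 1
            if predicted == 1:
                c["fp"] += 1  # flagged a true negative

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # None when a group has no cases of that outcome
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Made-up records for two hypothetical groups, A and B.
sample = [
    ("A", 0, 0), ("A", 0, 1), ("A", 1, 1), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 1), ("B", 1, 0), ("B", 1, 1),
]
rates = error_rates_by_group(sample)
for group in sorted(rates):
    print(group, rates[group])
```

A large gap between groups in either rate is one possible signal that a model does not generalize equally well across them. Which metric matters most depends on the use case and is a policy judgment as much as a statistical one.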

Opening the Government of Canada: The Federal Bureaucracy in the Digital Age


Book by Amanda Clarke: “In the digital age, governments face growing calls to become more open, collaborative, and networked. But can bureaucracies abandon their closed-by-design mindsets and operations and, more importantly, should they?

Opening the Government of Canada presents a compelling case for the importance of a more open model of governance in the digital age – but a model that continues to uphold traditional democratic principles at the heart of the Westminster system. Drawing on interviews with public officials and extensive analysis of government documents and social media accounts, Clarke details the untold story of the Canadian federal bureaucracy’s efforts to adapt to new digital pressures from the mid-2000s onward. This book argues that the bureaucracy’s tradition of closed government, fuelled by today’s antagonistic political communications culture, is at odds with evolving citizen expectations and new digital policy tools, including social media, crowdsourcing, and open data. Amanda Clarke also cautions that traditional democratic principles and practices essential to resilient governance must not be abandoned in the digital age, which may justify a more restrained opening of our governing institutions than is currently proposed by many academics and governments alike.

Striking a balance between reform and tradition, Opening the Government of Canada concludes with a series of pragmatic recommendations that lay out a roadmap for building a democratically robust, digital-era federal government….(More)”.

Making Policy in a Complex World


Book by Paul Cairney, Tanya Heikkila and Matthew Wood: “This provocative Element is on the ‘state of the art’ of theories that highlight policymaking complexity. It explains complexity in a way that is simple enough to understand and use. The primary audience is policy scholars seeking a single authoritative guide to studies of ‘multi-centric policymaking’. It synthesises this literature to build a research agenda on the following questions: 1. How can we best explain the ways in which many policymaking ‘centres’ interact to produce policy? 2. How should we research multi-centric policymaking? 3. How can we hold policymakers to account in a multi-centric system? 4. How can people engage effectively to influence policy in a multi-centric system? However, by focusing on simple exposition and limiting jargon, Paul Cairney, Tanya Heikkila and Matthew Wood also speak to a far wider audience of practitioners, students, and new researchers seeking a straightforward introduction to policy theory and its practical lessons….(More)”.

Are Requirements to Deposit Data in Research Repositories Compatible With the European Union’s General Data Protection Regulation?


Paper by Deborah Mascalzoni et al: “To reproduce study findings and facilitate new discoveries, many funding bodies, publishers, and professional communities are encouraging—and increasingly requiring—investigators to deposit their data, including individual-level health information, in research repositories. For example, in some cases the National Institutes of Health (NIH) and editors of some Springer Nature journals require investigators to deposit individual-level health data via a publicly accessible repository (12). However, this requirement may conflict with the core privacy principles of European Union (EU) General Data Protection Regulation 2016/679 (GDPR), which focuses on the rights of individuals as well as researchers’ obligations regarding transparency and accountability.

The GDPR establishes legally binding rules for processing personal data in the EU, as well as outside the EU in some cases. Researchers in the EU, and often their global collaborators, must comply with the regulation. Health and genetic data are considered special categories of personal data and are subject to relatively stringent rules for processing….(More)”.

Using Data Sharing Agreements as Tools of Indigenous Data Governance: Current Uses and Future Options


Paper by Martinez, A. and Rainie, S. C.: “Indigenous communities and scholars have been influencing a shift in participation and inclusion in academic and agency research over the past two decades. As a response, Indigenous peoples are increasingly asking research questions and developing their own studies rooted in their cultural values. They use the study results to rebuild their communities and to protect their lands. This process of Indigenous-driven research has led to partnering with academic institutions, establishing research review boards, and entering into data sharing agreements to protect environmental data, community information, and local and traditional knowledges.

Data sharing agreements provide insight into how Indigenous nations are addressing the key areas of data collection, ownership, application, storage, and the potential for data reuse in the future. By understanding this mainstream data governance mechanism, how they have been applied, and how they have been used in the past, we aim to describe how Indigenous nations and communities negotiate data protection and control with researchers.

The project described here reviewed publicly available data sharing agreements that focus on research with Indigenous nations and communities in the United States. We utilized qualitative analysis methods to identify specific areas of focus in the data sharing agreements, whether or not traditional or cultural values were included in the language of the data sharing agreements, and how the agreements defined data. The results detail how Indigenous peoples currently use data sharing agreements and potential areas of expansion for language to include in data sharing agreements as Indigenous peoples address the research needs of their communities and the protection of community and cultural data….(More)”.

The new ecosystem of trust: How data trusts, collaboratives and coops can help govern data for the maximum public benefit


Paper by Geoff Mulgan and Vincent Straub: “The world is struggling to govern data. The challenge is to reduce abuses of all kinds, enhance accountability and improve ethical standards, while also ensuring that the maximum public and private value can also be derived from data.

Despite many predictions to the contrary, the world of commercial data is dominated by powerful organisations. By contrast, there are few institutions to protect the public interest and those that do exist remain relatively weak. This paper argues that new institutions—an ecosystem of trust—are needed to ensure that uses of data are trusted and trustworthy. It advocates the creation of different kinds of data trust to fill this gap. It argues:

  • That we need, but currently lack, institutions that are good at thinking through, discussing, and explaining the often complex trade-offs that need to be made about data.
  • That the task of creating trust is different in different fields. Overly generic solutions will be likely to fail.
  • That trusts need to be accountable—in some cases to individual members where there is a direct relationship with individuals giving consent, in other cases to the broader public.
  • That we should expect a variety of types of data trust to form—some sharing data; some managing synthetic data; some providing a research capability; some using commercial data and so on. The best analogy is finance which over time has developed a very wide range of types of institution and governance.

This paper builds on a series of Nesta think pieces on data and knowledge commons published over the last decade and current practical projects that explore how data can be mobilised to improve healthcare, policing, the jobs market and education. It aims to provide a framework for designing a new family of institutions under the umbrella title of data trusts, tailored to different conditions of consent, and different patterns of private and public value. It draws on the work of many others (including the work of GovLab and the Open Data Institute).

Introduction

The governance of personal data of all kinds has recently moved from being a very marginal specialist issue to one of general concern. Too much data has been misused, lost, shared, sold or combined with little involvement of the people most affected, and little ethical awareness on the part of the organisations in charge.

The most visible responses have been general ones—like the EU’s GDPR. But these now need to be complemented by new institutions that can be generically described as ‘data trusts’.

In current practice the term ‘trust’ is used to describe a very wide range of institutions. These include private trusts, a type of legal structure that holds and makes decisions about assets, such as property or investments, and involves trustors, trustees, and beneficiaries. There are also public trusts in fields like education with a duty to provide a public benefit. Examples include the Nesta Trust and the National Trust. There are trusts in business (e.g. to manage pension funds). And there are trusts in the public sector, such as the BBC Trust and NHS Foundation Trusts with remits to protect the public interest, at arm’s length from political decisions.

It’s now over a decade since the first data trusts were set up as private initiatives in response to anxieties about abuse. These were important pioneers though none achieved much scale or traction.

Now a great deal of work is underway around the world to consider what other types of trust might be relevant to data, so as to fill the governance vacuum—handling everything from transport data to personalised health, the internet of things to school records, and recognising the very different uses of data—by the state for taxation or criminal justice etc.; by academia for research; by business for use and resale; and to guide individual choices. This paper aims to feed into that debate.

1. The twin problems: trust and value

Two main clusters of problems are coming to prominence. The first cluster of problems involves misuse and overuse of data; the second set of problems involves underuse of data.

1.1. Lack of control fuels distrust

The first problem is a lack of control and agency—individuals feel unable to control data about their own lives (from Facebook links and Google searches to retail behaviour and health) and communities are unable to control their own public data (as in Sidewalk Labs and other smart city projects that attempted to privatise public data). Lack of control leads to the risk of abuses of privacy, and a wider problem of decreasing trust—which survey evidence from the Open Data Institute (ODI) shows is key in determining the likelihood that consumers will share their personal data (although this varies across countries). The lack of transparency regarding how personal data is then used to train algorithms making decisions only adds to the mistrust.

1.2. Lack of trust leads to a deficit of public value

The second, mirror cluster of problems concerns value. Flows of data promise a lot: better ways to assess problems, understand options, and make decisions. But current arrangements make it hard for individuals to realise the greatest value from their own data, and they make it even harder for communities to safely and effectively aggregate, analyse and link data to solve pressing problems, from health and crime to mobility. This is despite the fact that many consumers are prepared to make trade-offs: to share data if it benefits themselves and others—a 2018 Nesta poll found, for example, that 73 per cent of people said they would share their personal data in an effort to improve public services if there was a simple and secure way of doing it. A key reason for the failure to maximise public value is the lack of institutions that are sufficiently trusted to make judgements in the public interest.

Attempts to answer these problems sometimes point in opposite directions—the one towards less free flow, less linking of data, the other towards more linking and combination. But any credible policy responses have to address both simultaneously.

2. The current landscape

The governance field was largely empty earlier this decade. It is now full of activity, albeit at an early stage. Some is legislative—like GDPR and equivalents being considered around the world. Some is about standards—like Verify, IHAN and other standards intended to handle secure identity. Some is more entrepreneurial—like the many Personal Data Stores launched over the last decade, from Mydex to SOLID, Citizen-me to digi.me. Some are experiments like the newly launched Amsterdam Data Exchange (Amdex) and the UK government’s recently announced efforts to fund data trust pilots to tackle wildlife conservation, working with the ODI. Finally, we are now beginning to see new institutions within government to guide and shape activity, notably the new Centre for Data Ethics and Innovation.

Many organisations have done pioneering work, including the ODI in the UK and NYU GovLab with its work on data collaboratives. At Nesta, as part of the Europe-wide DECODE consortium, we are helping to develop new tools to give people control of their personal data while the Next Generation Internet (NGI) initiative is focused on creating a more inclusive, human-centric and resilient internet—with transparency and privacy as two of the guiding pillars.

The task of governing data better brings together many elements, from law and regulation to ethics and standards. We are just beginning to see more serious discussion about tax and data—from the proposals to tax digital platforms’ turnover to more targeted taxes of data harvesting in public places or infrastructures—and more serious debate around regulation. This paper deals with just one part of this broader picture: the role of institutions dedicated to curating data in the public interest….(More)”.

Whose Rules? The Quest for Digital Standards


Stephanie Segal at CSIS: “Prime Minister Shinzo Abe of Japan made news at the World Economic Forum in Davos last month when he announced Japan’s aspiration to make the G20 summit in Osaka a launch pad for “world-wide data governance.” This is not the first time in recent memory that Japan has taken a leadership role on an issue of keen economic importance. Most notably, the Trans-Pacific Partnership (TPP) lives on as the Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP), thanks in large part to Japan’s efforts to keep the trading bloc together after President Trump announced U.S. withdrawal from the TPP. But it’s in the area of data and digital governance that Japan’s efforts will perhaps be most consequential for future economic growth.

Data has famously been called “the new oil” in the global economy. A 2016 report by the McKinsey Global Institute estimated that global data flows contributed $2.8 trillion in value to the global economy back in 2014, while cross-border data flows and digital trade continue to be key drivers of global trade and economic growth. Japan’s focus on data and digital governance is therefore consistent with its recent efforts to support global growth, deepen global trade linkages, and advance regional and global standards.

Data governance refers to the rules directing the collection, processing, storage, and use of data. The proliferation of smart devices and the emergence of a data-driven Internet of Things portend an exponential growth in digital data. At the same time, recent reporting on overly aggressive commercial practices of personal data collection, as well as the separate topic of illegal data breaches, has elevated public awareness and interest in the laws and policies that govern the treatment of data, and personal data in particular. Finally, a growing appreciation of data’s central role in driving innovation and future technological and economic leadership is generating concern in many capitals that different data and digital governance standards and regimes will convey a competitive (dis)advantage to certain countries.

Bringing these various threads together—the inevitable explosion of digital data; the need to protect an individual’s right to privacy; and the appreciation that data has economic value and conveys economic advantage—is precisely why Japan’s initiative is both timely and likely to face significant challenges….(More)”.

State Capability, Policymaking and the Fourth Industrial Revolution


Demos Helsinki: “The world as we know it is built on the structures of the industrial era – and these structures are falling apart. Yet the vision of a new, sustainable and fair post-industrial society remains unclear. This discussion paper is the result of a collaboration between a group of organisations interested in the implications of the rapid technological development to policymaking processes and knowledge systems that inform policy decisions.

In the discussion paper, we set out to explore the main opportunities and concerns that accompany the Fourth Industrial Revolution for policymaking and knowledge systems, particularly in middle-income countries. Overall, middle-income countries are home to five billion of the world’s seven billion people and 73 per cent of the world’s poor people; they represent about one-third of the global Gross Domestic Product (GDP) and are major engines of global growth (World Bank 2018).

The paper is co-produced with Capability (Finland), Demos Helsinki (Finland), HELVETAS Swiss Intercooperation (Switzerland), Politics & Ideas (global), Southern Voice (global), UNESCO Montevideo (Uruguay) and Using Evidence (Canada).

The guiding questions for this paper are:

– What are the critical elements of the Fourth Industrial Revolution?

– What does the literature say about the impact of this revolution on societies and economies, and in particular on middle-income countries?

– What are the implications of the Fourth Industrial Revolution for the achievement of the Sustainable Development Goals (SDGs) in middle-income countries?

– What does the literature say about the challenges for governance and the ways knowledge can inform policy during the Fourth Industrial Revolution?…(More)”.

Full discussion paper: “State Capability, Policymaking and the Fourth Industrial Revolution: Do Knowledge Systems Matter?”

The privacy threat posed by detailed census data


Gillian Tett at the Financial Times: “Wilbur Ross suffered the political equivalent of a small(ish) black eye last month: a federal judge blocked the US commerce secretary’s attempts to insert a question about citizenship into the 2020 census and accused him of committing “egregious” legal violations.

The Supreme Court has agreed to hear the administration’s appeal in April. But while this high-profile fight unfolds, there is a second, less noticed, census issue about data privacy emerging that could have big implications for businesses (and citizens). Last weekend John Abowd, the Census Bureau’s chief scientist, told an academic gathering that statisticians had uncovered shortcomings in the protection of personal data in past censuses. There is no public evidence that anyone has actually used these weaknesses to hack records, and Mr Abowd insisted that the bureau is using cutting-edge tools to fight back. But, if nothing else, this revelation shows the mounting problem around data privacy. Or, as Mr Abowd noted: “These developments are sobering to everyone.” These flaws are “not just a challenge for statistical agencies or internet giants,” he added, but affect any institution engaged in internet commerce and “bioinformatics”, as well as commercial lenders and non-profit survey groups. Bluntly, this includes most companies and banks.

The crucial problem revolves around what is known as “re-identification” risk. When companies and government institutions amass sensitive information about individuals, they typically protect privacy in two ways: they hide the full data set from outside eyes or they release it in an “anonymous” manner, stripped of identifying details. The census bureau does both: it is required by law to publish detailed data and protect confidentiality. Since 1990, it has tried to resolve these contradictory mandates by using “household-level swapping” — moving some households from one geographic location to another to generate enough uncertainty to prevent re-identification. This used to work. But today there are so many commercially-available data sets and computers are so powerful that it is possible to re-identify “anonymous” data by combining data sets. …

Thankfully, statisticians think there is a solution. The Census Bureau now plans to use a technique known as “differential privacy” which would introduce “noise” into the public statistics, using complex algorithms. This technique is expected to create just enough statistical fog to protect personal confidentiality in published data — while also preserving information in an encrypted form that statisticians can later unscramble, as needed. Companies such as Google, Microsoft and Apple have already used variants of this technique for several years, seemingly successfully. However, nobody has employed this system on the scale that the Census Bureau needs — or in relation to such a high stakes event. And the idea has sparked some controversy because some statisticians fear that even “differential privacy” tools can be hacked — and others fret it makes data too “noisy” to be useful….(More)”.
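
The article describes "noise" only in general terms. One textbook way of adding calibrated noise to a published count is the Laplace mechanism, sketched below. This is a minimal illustration under stated assumptions, not the Census Bureau's actual system: the function names, the choice of a simple counting query (where adding or removing one person changes the result by at most 1), and the `epsilon` values are all picked for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng=None):
    """Return a count with Laplace noise added.

    Assumes a counting query with sensitivity 1. A smaller epsilon
    means more noise (stronger privacy, less accuracy).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example: a block with 100 residents, published
# under two different privacy settings.
rng = random.Random(2020)
print(noisy_count(100, epsilon=1.0, rng=rng))
print(noisy_count(100, epsilon=0.1, rng=rng))
```

The trade-off mentioned in the article shows up directly in `epsilon`: lowering it makes any single published figure less reliable, which is the "too noisy to be useful" concern, while raising it weakens the privacy protection.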