Scraping Court Records Data to Find Dirty Cops


Article by Lawsuit.org: “In the 2002 dystopian sci-fi film “Minority Report,” law enforcement can manage crime by “predicting” illegal behavior before it happens. While fiction, the plot is intriguing and contributes to the conversation on advanced crime-fighting technology. However, today’s world may not be far off.

Data’s role in our lives and more accessibility to artificial intelligence is changing the way we approach topics such as research, real estate, and law enforcement. In fact, recent investigative reporting has shown that “dozens of [American] cities” are now experimenting with predictive policing technology.

Despite the current controversy surrounding predictive policing, it seems to be a growing trend that has been met with little real resistance. We may be closer to policing that mirrors the frightening depictions in “Minority Report” than we ever thought possible. 

Fighting Fire With Fire

In its current state, predictive policing is defined as:

“The usage of mathematical, predictive analytics, and other analytical techniques in law enforcement to identify potential criminal activity. Predictive policing methods fall into four general categories: methods for predicting crimes, methods for predicting offenders, methods for predicting perpetrators’ identities, and methods for predicting victims of crime.”

While it might not be possible to prevent predictive policing from being employed by the criminal justice system, perhaps there are ways we can create a more level playing field: One where the powers of big data analysis aren’t just used to predict crime, but also are used to police law enforcement themselves.

Below, we’ve provided a detailed breakdown of what this potential reality could look like when applied to one South Florida county’s public databases, along with information on how citizens and communities can use public data to better understand the behaviors of local law enforcement and even individual police officers….(More)”.

Open Data from Authoritarian Regimes: New Opportunities, New Challenges


Paper by Ruth D. Carlitz and Rachael McLellan: “Data availability has long been a challenge for scholars of authoritarian politics. However, the promotion of open government data—through voluntary initiatives such as the Open Government Partnership and soft conditionalities tied to foreign aid—has motivated many of the world’s more closed regimes to produce and publish fine-grained data on public goods provision, taxation, and more. While this has been a boon to scholars of autocracies, we argue that the politics of data production and dissemination in these countries create new challenges.

Systematically missing or biased data may jeopardize research integrity and lead to false inferences. We provide evidence of such risks from Tanzania. The example also shows how data manipulation fits into the broader set of strategies that authoritarian leaders use to legitimate and prolong their rule. Comparing data released to the public on local tax revenues with verified internal figures, we find that the public data appear to significantly underestimate opposition performance. This can bias studies on local government capacity and risk parroting the party line in data form. We conclude by providing a framework that researchers can use to anticipate and detect manipulation in newly available data….(More)”.
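
The comparison described in this abstract (published figures set against verified internal ones) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the district names, revenue figures, and field names are all hypothetical, and the "relative gap" measure is one simple choice among many.

```python
from statistics import mean

# Hypothetical illustrative rows (NOT from the paper): local tax revenue per
# district, as published to the public vs. as verified internally.
districts = [
    {"name": "A", "opposition": True,  "public": 80.0, "internal": 100.0},
    {"name": "B", "opposition": True,  "public": 45.0, "internal": 60.0},
    {"name": "C", "opposition": False, "public": 98.0, "internal": 100.0},
    {"name": "D", "opposition": False, "public": 51.0, "internal": 50.0},
]


def relative_gap(row):
    """Return (public - internal) / internal; negative means under-reporting."""
    return (row["public"] - row["internal"]) / row["internal"]


def mean_gap_by_group(rows):
    """Average relative gap for opposition-run and ruling-party-run districts."""
    groups = {True: [], False: []}
    for row in rows:
        groups[row["opposition"]].append(relative_gap(row))
    return {
        "opposition": mean(groups[True]),
        "ruling": mean(groups[False]),
    }


gaps = mean_gap_by_group(districts)
print(gaps)
```

If the average gap for opposition-run districts is clearly more negative than for the others, that is a signal worth investigating. A real analysis would also need a significance test and controls for district characteristics.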

EU Company Data: State of the Union 2020


Report by OpenCorporates: “… on access to company data in the EU. It’s completely revised, with more detail on the impact that the lack of access to this critical dataset has – on business, on innovation, on democracy, and society.

The results are still not great, however:

  • Average score is low
    The average score across the EU in terms of access to company data is just 40 out of 100. This is better than the average score 8 years ago, which was just 23 out of 100, but still very low nevertheless.
  • Some major economies score badly
    Some of the EU’s major economies continue to score very badly indeed, with Germany, for example, scoring just 15/100, Italy 10/100, and Spain 0/100.
  • EU policies undermined
    The report identifies 15 areas where the lack of open company data frustrates, impedes or otherwise has a negative impact on EU policy.
  • Inequalities widened
    The report also identifies how inequalities are further widened by poor access to this critical dataset, and how the recovery from COVID-19 will be hampered by it too.

On the plus side, the report also identifies the EU Open Data & PSI Directive passed last year as potentially game changing – but only if it is implemented fully, and there are significant doubts whether this will happen….(More)”

Characterizing Disinformation Risk to Open Data in the Post-Truth Era


Paper by Adrienne Colborne and Michael Smit: “Curated, labeled, high-quality data is a valuable commodity for tasks such as business analytics and machine learning. Open data is a common source of such data—for example, retail analytics draws on open demographic data, and weather forecast systems draw on open atmospheric and ocean data. Open data is released openly by governments to achieve various objectives, such as transparency, informing citizen engagement, or supporting private enterprise.

Critical examination of ongoing social changes, including the post-truth phenomenon, suggests the quality, integrity, and authenticity of open data may be at risk. We introduce this risk through various lenses, describe some of the types of risk we expect using a threat model approach, identify approaches to mitigate each risk, and present real-world examples of cases where the risk has already caused harm. As an initial assessment of awareness of this disinformation risk, we compare our analysis to perspectives captured during open data stakeholder consultations in Canada…(More)”.

Why open science is critical to combatting COVID-19


Article by the OECD: “…In January 2020, 117 organisations – including journals, funding bodies, and centres for disease prevention – signed a statement titled “Sharing research data and findings relevant to the novel coronavirus outbreak”, committing to provide immediate open access for peer-reviewed publications at least for the duration of the outbreak, to make research findings available via preprint servers, and to share results immediately with the World Health Organization (WHO). This was followed in March by the Public Health Emergency COVID-19 Initiative, launched by 12 countries at the level of chief science advisors or equivalent, calling for open access to publications and machine-readable access to data related to COVID-19, which resulted in an even stronger commitment by publishers.

The Open COVID Pledge was launched in April 2020 by an international coalition of scientists, lawyers, and technology companies, and calls on authors to make all intellectual property (IP) under their control available, free of charge, and without encumbrances to help end the COVID-19 pandemic, and reduce the impact of the disease….

Remaining challenges

While clinical, epidemiological and laboratory data about COVID-19 is widely available, including genomic sequencing of the pathogen, a number of challenges remain:

  • Not all data is sufficiently findable, accessible, interoperable and reusable (FAIR).
  • Sources of data tend to be dispersed: even though many pooling initiatives are under way, curation needs to be operated “on the fly”.
  • Access to personal health records needs to be readily available, pending the patient’s consent. Legislation aimed at fostering interoperability and avoiding information blocking is yet to be passed in many OECD countries. Access across borders is even more difficult under current data protection frameworks in most OECD countries.
  • In order to achieve the dual objectives of respecting privacy while ensuring access to machine readable, interoperable and reusable clinical data, the Virus Outbreak Data Network (VODAN) proposes to create FAIR data repositories which could be used by incoming algorithms (virtual machines) to ask specific research questions.
  • In addition, many issues arise around the interpretation of data – this can be illustrated by the widely followed epidemiological statistics. Typically, the statistics concern “confirmed cases”, “deaths” and “recoveries”. Each of these items seems to be treated differently in different countries, and is sometimes subject to methodological changes within the same country.
  • Specific standards for COVID-19 data therefore need to be established, and this is one of the priorities of the UK COVID-19 Strategy. A working group within Research Data Alliance has been set up to propose such standards at an international level.
  • In some cases it could be inferred that the transparency of the statistics may have guided governments to restrict testing in order to limit the number of “confirmed cases” and avoid the rapid rise of numbers. Lower testing rates can in turn reduce the efficiency of quarantine measures, lowering the overall efficiency of combating the disease….(More)”.

Open science: after the COVID-19 pandemic there can be no return to closed working


Article by Virginia Barbour and Martin Borchert: “In the few months since the first case of COVID-19 was identified, the underlying cause has been isolated, its symptoms agreed on, its genome sequenced, diagnostic tests developed, and potential treatments and vaccines are on the horizon. The astonishingly short time frame of these discoveries has only happened through a global open science effort.

The principles and practices underpinning open science are what underpin good research—research that is reliable, reproducible, and has the broadest impact possible. It specifically requires the application of principles and practices that make research FAIR (Findable, Accessible, Interoperable, Reusable); researchers are making their data and preliminary publications openly accessible, and then publishers are making the peer-reviewed research immediately and freely available to all. The rapid dissemination of research—through preprints in particular as well as journal articles—stands in contrast to what happened in the 2003 SARS outbreak when the majority of research on the disease was published well after the outbreak had ended.

Many outside observers might reasonably assume, given the digital world we all now inhabit, that science usually works like this. Yet this is very far from the norm for most research. Science is not something that just happens in response to emergencies or specific events—it is an ongoing, largely publicly funded, national and international enterprise….

Sharing of the underlying data that journal articles are based on is not yet a universal requirement for publication, nor are researchers usually recognised for data sharing.

Figure: There are many benefits associated with an open science model. Image adapted from: Gaelen Pinnock/UCT; CC-BY-SA 4.0.

Once published, even access to research is not seamless. The majority of academic journals still require a subscription to access. Subscriptions are expensive; Australian universities alone currently spend more than $300 million per year on subscriptions to academic journals. Access to academic journals also varies between universities with varying library budgets. The main markets for subscriptions to the commercial journal literature are higher education and health, with some access to government and commercial….(More)”.

How Statistics Can Help — Going Beyond COVID-19


Blog by Walter J. Radermacher at Data & Policy: “It is rightly pointed out that in the midst of a crisis of enormous dimensions we needed high quality statistics with utmost urgency, but that instead we are in danger of drowning in an ocean of data and information. The pandemic is accompanied and exacerbated by an infodemic. At this moment, and in this confusion and search for solutions, it seems appropriate to take advice from previous initiatives and draw lessons for the current situation. More than 20 years ago in the United Kingdom, the report “Statistics — A Matter of Trust” laid the foundations for overcoming the previously spreading crisis of confidence through a solidly structured statistical system. This report does not stand alone in international comparison. Rather, it is one of a series of global, European and national measures and agreements which, since the fall of the Berlin Wall in 1989, have strengthened official statistics as the backbone of policy in democratic societies, with the UN Fundamental Statistical Principles and the EU Statistics Code of Practice being prominent representatives. So, if we want to deal with our current difficulties, we should address precisely those points that have emerged as determining factors for the quality of statistics, with the following three questions: What (statistical products, quality profile)? How (methods)? Who (institutions)? The aim must be to ensure that statistical information is suitable for facilitating the resolution of conflicts by eliminating the need to argue about the facts, so that debate is only about the conclusions to be drawn from them.

In the past, this task would have led relatively quickly to a situation where the need for information would have been directed to official statistics as the preferred provider; this has changed recently for many reasons. On the one hand, there is the danger that the much-cited data revolution and learning algorithms (so-called AI) are presented as an alternative to official statistics (which are perceived as too slow, too inflexible and too expensive), instead of emphasizing possible commonalities and cross-fertilization possibilities. On the other hand, after decades of austerity policies, official statistics are in a similarly defensive situation to that of the public health system in many respects and in many countries: There is a lack of financial reserves, personnel and know-how for the new and innovative work now so urgently needed.

It is therefore required, as in the 1990s, to ask the fundamental question again, namely, do we (still and again) really deserve official statistics as the backbone of democratic decision-making, and if so, what should their tasks be, how should they be financed and anchored in the political system?…(More)”.

Protecting Data Privacy and Rights During a Crisis are Key to Helping the Most Vulnerable in Our Community


Blog by Amen Ra Mashariki: “Governments should protect the data and privacy rights of their communities even during emergencies. It is a false trade-off to require more data without protection. We can and should do both — collect the appropriate data and protect it. Establishing and protecting the data rights and privacy of our communities’ underserved, underrepresented, disabled, and vulnerable residents is the only way we can combat the negative impact of COVID-19 or any other crisis.

Building trust is critical. Governments can strengthen data privacy protocols, beef up transparency mechanisms, and protect the public’s data rights in the name of building trust — especially with the most vulnerable populations. Otherwise, residents will opt out of engaging with government, and without their information, leaders like first responders will be blind to their existence when making decisions and responding to emergencies, as we are seeing with COVID-19.

As Chief Analytics Officer of New York City, I often remembered the words of Defense Secretary Donald Rumsfeld, especially with regard to using data during emergencies, that there are “known knowns, known unknowns, and unknown unknowns, and we will always get hurt by the unknown unknowns.” Meaning the things we didn’t know — the data that we didn’t have — were always going to be what hurt us during times of emergencies….

There are three key steps that governments can take right now to use data most effectively to respond to emergencies — both for COVID-19 and in the future.

Seek Open Data First

In times of crisis and emergencies, many believe that government and private entities, either purposefully or inadvertently, are willing to trample on the data rights of the public in the name of appropriate crisis response. This should not be a trade-off. We can respond to crises while keeping data privacy and data rights in the forefront of our minds. Rather than dismissing data rights, governments can start using data that is already openly available. This seems like a simple step, but it does two very important things. First, it forces you to understand the data that is already available in your jurisdiction. Second, it grows your ability to fill the gaps with respect to what you know about the city by looking outside of city government. …(More)”.

Responsible Data Toolkit


Andrew Young at The GovLab: “The GovLab and UNICEF, as part of the Responsible Data for Children initiative (RD4C), are pleased to share a set of user-friendly tools to support organizations and practitioners seeking to operationalize the RD4C Principles. These principles—Purpose-Driven, People-Centric, Participatory, Protective of Children’s Rights, Proportional, Professionally Accountable, and Prevention of Harms Across the Data Lifecycle—are especially important in the current moment, as actors around the world are taking a data-driven approach to the fight against COVID-19.

The initial components of the RD4C Toolkit are:

The RD4C Data Ecosystem Mapping Tool intends to help users to identify the systems generating data about children and the key components of those systems. After using this tool, users will be positioned to understand the breadth of data they generate and hold about children; assess data systems’ redundancies or gaps; identify opportunities for responsible data use; and achieve other insights.

The RD4C Decision Provenance Mapping methodology provides a way for actors designing or assessing data investments for children to identify key decision points and determine which internal and external parties influence those decision points. This distillation can help users to pinpoint any gaps and develop strategies for improving decision-making processes and advancing more professionally accountable data practices.

The RD4C Opportunity and Risk Diagnostic provides organizations with a way to take stock of the RD4C principles and how they might be realized as an organization reviews a data project or system. The high-level questions and prompts below are intended to help users identify areas in need of attention and to strategize next steps for ensuring more responsible handling of data for and about children across their organization.

Finally, the Data for Children Collaborative with UNICEF developed an Ethical Assessment that “forms part of [their] safe data ecosystem, alongside data management and data protection policies and practices.” The tool reflects the RD4C Principles and aims to “provide an opportunity for project teams to reflect on the material consequences of their actions, and how their work will have real impacts on children’s lives.”

RD4C launched in October 2019 with the release of the RD4C Synthesis Report, Selected Readings, and the RD4C Principles. Last month we published the RD4C Case Studies, which analyze data systems deployed in diverse country environments, with a focus on their alignment with the RD4C Principles. The case studies are: Romania’s The Aurora Project, Childline Kenya, and Afghanistan’s Nutrition Online Database.

To learn more about Responsible Data for Children, visit rd4c.org or contact rd4c [at] thegovlab.org. To join the RD4C conversation and be alerted to future releases, subscribe at this link.”

Coronavirus shows how badly we need consensus on collective data rights and needs


Blogpost by Ania Calderon: “The rapid spread of this disease is exposing fault lines in our political and social balance — most visibly in the lack of protection for the poorest or investment in healthcare systems. It’s also forcing us to think about how we can work across jurisdictions and political contexts to foster better collaboration, build trust in institutions, and save lives.

As we said recently in a call for Open COVID-19 Data, governments need data from other countries to model and flatten the curve, but there is little consistency in how they gather it. Meanwhile, the consequences of different approaches show the balance required in effectively implementing open data policies. For example, Singapore has published detailed personal data about every coronavirus patient, including where they work and live and whether they had contact with others. This helped the city-state keep its infection and death rates extremely low in the early stages of the epidemic, but also led to proportionality concerns as people might be targeted and harmed.

Overall, few governments are publishing the information on which they are basing these huge decisions. This makes it hard to collaborate, scrutinise, and build trust. For example, the models can only be as good as the data that feed them, and we need to understand their limitations. Opening up the data and the source code behind them would give citizens confidence that officials were making decisions in the public’s interest rather than their own political interests. It would also foster the international joined-up action needed to meet this challenge. And it would allow non-state actors into the process to plug gaps and deliver and scale effective solutions quickly.

At the same time, legitimate concerns have been raised about how this data is used, both now and in the future.

As we say in our strategy, openness needs to be balanced with both individual and collective data rights, and policies need to account for context.

People may be ok to give up some of their privacy — like having their movements tracked by government smartphone apps — if that can help combat a global health crisis, but that would seem an unthinkable invasion of privacy to many in less exceptional times. We rightly worry how this data might be used later on, and by whom. Which shows that data systems need to be able to respond to changing times, while holding fundamental human rights and civil liberties in check.

As with so many things, this crisis is forcing the world to question orthodoxies around individual and collective data rights and needs. It shines a light on policies and approaches which might help avoid future disasters and build a fairer, healthier, more collaborative society overall….(More)”.