Counting Crimes: An Obsolete Paradigm

Paul Wormeli at The Criminologist: “To the extent that a paradigm is defined as the way we view things, the crime statistics paradigm in the United States is inadequate and requires reinvention…. The statement—“not all crime is reported to the police”—lies at the very heart of why our current crime data are inherently incomplete. It is a direct reference to the fact that not all “street crime” is reported and that state and local law enforcement are not the only entities responsible for overseeing violations of societally established norms (“street crime” or otherwise). Two significant gaps exist, in that: 1) official reporting of crime from state and local law enforcement agencies cannot provide insight into unreported incidents, and 2) state and local law enforcement may not have or acknowledge jurisdiction over certain types of matters, such as cybercrime, corruption, environmental crime, or terrorism, and therefore cannot or do not report on those incidents…

All of these gaps in crime reporting mask the portrait of crime in the U.S. If there were a complete accounting of crime that could serve as the basis of policy formulation, including the distribution of federal funds to state and local agencies, there could be a substantial impact across the nation. Such a calculation would move the country toward a more rational basis for determining federal support for communities based on a comprehensive measure of community wellness.

In its deliberations, the NAS Panel recognized that it is essential to consider both the concepts of classification and the rules of counting as we seek a better and more practical path to describing crime in the U.S. and its consequences. The panel postulated that a meaningful classification of incidents found to be crimes would go beyond the traditional emphasis on street crime and include all crime categories.

The NAS study identified the missing elements of a national crime report as including more complete data on crimes involving drug-related offenses, criminal acts where juveniles are involved, so-called white-collar crimes such as fraud and corruption, cybercrime, crime against businesses, environmental crimes, and crimes against animals. Just as one example, it is highly unlikely that we will know the full extent of fraudulent claims against all federal, state, and local governments in the face of the massive influx of funding from recent and forthcoming Congressional action.

In proposing a set of crime classifications, the NAS panel recommended 11 major categories, 5 of which are not addressed in our current crime data collection systems. While there are parallel data systems that collect some of the missing data within these five crime categories, it remains unclear which federal agency, if any, has the authority to gather the information and aggregate it to give us anywhere near a complete estimate of crime in the United States. No federal or national entity has the assignment of estimating the total amount of crime that takes place in the United States. Without such leadership, we are left with an uninformed understanding of the health and wellness of communities throughout the country…(More)”

Trove of unique health data sets could help AI predict medical conditions earlier

Madhumita Murgia at the Financial Times: “…Ziad Obermeyer, a physician and machine learning scientist at the University of California, Berkeley, launched Nightingale Open Science last month — a treasure trove of unique medical data sets, each curated around an unsolved medical mystery that artificial intelligence could help to solve.

The data sets, released after the project received $2m of funding from former Google chief executive Eric Schmidt, could help to train computer algorithms to predict medical conditions earlier, triage better and save lives.

The data include 40 terabytes of medical imagery, such as X-rays, electrocardiogram waveforms and pathology specimens, from patients with a range of conditions, including high-risk breast cancer, sudden cardiac arrest, fractures and Covid-19. Each image is labelled with the patient’s medical outcomes, such as the stage of breast cancer and whether it resulted in death, or whether a Covid patient needed a ventilator.

Obermeyer has made the data sets free to use and mainly worked with hospitals in the US and Taiwan to build them over two years. He plans to expand this to Kenya and Lebanon in the coming months to reflect as much medical diversity as possible.

“Nothing exists like it,” said Obermeyer, who announced the new project in December alongside colleagues at NeurIPS, the global academic conference for artificial intelligence. “What sets this apart from anything available online is the data sets are labelled with the ‘ground truth’, which means with what really happened to a patient and not just a doctor’s opinion.”…

The Nightingale data sets were among dozens proposed this year at NeurIPS.

Other projects included a speech data set of Mandarin and eight subdialects recorded by 27,000 speakers in 34 cities in China; the largest audio data set of Covid respiratory sounds, such as breathing, coughing and voice recordings, from more than 36,000 participants to help screen for the disease; and a data set of satellite images covering the entire country of South Africa from 2006 to 2017, divided and labelled by neighbourhood, to study the social effects of spatial apartheid.

Elaine Nsoesie, a computational epidemiologist at the Boston University School of Public Health, said new types of data could also help with studying the spread of diseases in diverse locations, as people from different cultures react differently to illnesses.

She said her grandmother in Cameroon, for example, might think differently than Americans do about health. “If someone had an influenza-like illness in Cameroon, they may be looking for traditional, herbal treatments or home remedies, compared to drugs or different home remedies in the US.”

Computer scientists Serena Yeung and Joaquin Vanschoren, who proposed that research to build new data sets should be exchanged at NeurIPS, pointed out that the vast majority of the AI community still cannot find good data sets to evaluate their algorithms. This meant that AI researchers were still turning to data that were potentially “plagued with bias”, they said. “There are no good models without good data.”…(More)”.

Cities and the Climate-Data Gap

Article by Robert Muggah and Carlo Ratti: “With cities facing disastrous climate stresses and shocks in the coming years, one would think they would be rushing to implement mitigation and adaptation strategies. Yet most urban residents are only dimly aware of the risks, because their cities’ mayors, managers, and councils are not collecting or analyzing the right kinds of information.

With more governments adopting strategies to reduce greenhouse-gas (GHG) emissions, cities everywhere need to get better at collecting and interpreting climate data. More than 11,000 cities have already signed up to a global covenant to tackle climate change and manage the transition to clean energy, and many aim to achieve net-zero emissions before their national counterparts do. Yet virtually all of them still lack the basic tools for measuring progress.

Closing this gap has become urgent, because climate change is already disrupting cities around the world. Cities on almost every continent are being ravaged by heat waves, fires, typhoons, and hurricanes. Coastal cities are being battered by severe flooding connected to sea-level rise. And some megacities and their sprawling peripheries are being reconsidered altogether, as in the case of Indonesia’s $34 billion plan to move its capital from Jakarta to Borneo by 2024.

Worse, while many subnational governments are setting ambitious new green targets, over 40% of cities (home to some 400 million people) still have no meaningful climate-preparedness strategy. And in Africa and Asia, where an estimated 90% of all urbanization over the next three decades is expected to occur, the share of cities with such strategies is even lower.

We know that climate-preparedness plans are closely correlated with investment in climate action, including nature-based solutions and systemic resilience. But strategies alone are not enough. We also need to scale up data-driven monitoring platforms. Powered by satellites and sensors, these systems can track temperatures inside and outside buildings, alert city dwellers to air-quality issues, and provide high-resolution information on concentrations of specific GHGs (carbon dioxide and nitrogen dioxide) and particulate matter…(More)”.
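The monitoring platforms described above can be sketched in simplified form as a routine that aggregates sensor readings by district and pollutant and flags exceedances against configured thresholds. This is a minimal illustration, not any city's actual system; the class, the pollutant names, and the threshold values are assumptions for the example.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Reading:
    district: str    # e.g. a neighbourhood or sensor zone (illustrative)
    pollutant: str   # e.g. "CO2", "NO2", "PM2.5"
    value: float     # measured concentration from a sensor

def summarise(readings, thresholds):
    """Average readings per (district, pollutant) and flag threshold exceedances."""
    grouped = {}
    for r in readings:
        grouped.setdefault((r.district, r.pollutant), []).append(r.value)
    return {
        key: {
            "mean": mean(values),
            # pollutants without a configured threshold are never flagged
            "alert": mean(values) > thresholds.get(key[1], float("inf")),
        }
        for key, values in grouped.items()
    }
```

A summary like this could feed public air-quality alerts or track progress against a city's emission targets.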

Surveillance Publishing

Working paper by Jefferson D. Pooley: “…This essay lingers on a prediction too: Clarivate’s business model is coming for scholarly publishing. Google is one peer, but the company’s real competitors are Elsevier, Springer Nature, Wiley, Taylor & Francis, and SAGE. Elsevier, in particular, has been moving into predictive analytics for years now. Of course the publishing giants have long profited off of academics and our university employers—by packaging scholars’ unpaid writing-and-editing labor only to sell it back to us as usuriously priced subscriptions or APCs. That’s a lucrative business that Elsevier and the others won’t give up. But they’re layering another business on top of their legacy publishing operations, in the Clarivate mold. The data trove that publishers are sitting on is, if anything, far richer than the citation graph alone. Why worry about surveillance publishing? One reason is the balance-sheet, since the companies’ trading in academic futures will further pad profits at the expense of taxpayers and students. The bigger reason is that our behavior—once alienated from us and abstracted into predictive metrics—will double back onto our work lives. Existing biases, like male academics’ propensity for self-citation, will receive a fresh coat of algorithmic legitimacy. More broadly, the academic reward system is already distorted by metrics. To the extent that publishers’ tallies and indices get folded into grant-making, tenure-and-promotion, and other evaluative decisions, the metric tide will gain power. The biggest risk is that scholars will internalize an analytics mindset, one already encouraged by citation counts and impact factors….(More)”.

‘In Situ’ Data Rights

Essay by Marshall W Van Alstyne, Georgios Petropoulos, Geoffrey Parker, and Bertin Martens: “…Data portability sounds good in theory—number portability improved telephony—but this theory has its flaws.

  • Context: The value of data depends on context. Removing data from that context removes value. A portability exercise by experts at the ProgrammableWeb succeeded in downloading basic Facebook data but failed on a re-upload. Individual posts shed the prompts that preceded them and the replies that followed them. After all, that data concerns others.
  • Stagnation: Without a flow of updates, a captured stock depreciates. Data must be refreshed to stay current, and potential users must see those data updates to stay informed.
  • Impotence: Facts removed from their place of residence become less actionable. We cannot use them to make a purchase when removed from their markets or reach a friend when they are removed from their social networks. Data must be reconnected to be reanimated.
  • Market Failure: Innovation is slowed. Consider how markets for business analytics and B2B services develop. Lacking complete context, third parties can only offer incomplete benchmarking and analysis. Platforms that do offer market overview services can charge monopoly prices because they have context that partners and competitors do not.
  • Moral Hazard: Proposed laws seek to give merchants data portability rights, but these entail a problem that competition authorities have not anticipated. Regulators seek to help merchants “multihome,” to affiliate with more than one platform. Merchants can take their earned ratings from one platform to another and foster competition. But, when a merchant gains control over its ratings data, magically, low reviews can disappear! Consumers fraudulently edited their personal records under early U.K. open banking rules. With data editing capability, either side can increase fraud, surely not the goal of data portability.

Evidence suggests that following GDPR, E.U. ad effectiveness fell, E.U. Web revenues fell, investment in E.U. startups fell, the stock and flow of apps available in the E.U. fell, while Google and Facebook, who already had user data, gained rather than lost market share as small firms faced new hurdles the incumbents managed to avoid. To date, the results are far from regulators’ intentions.

We propose a new in situ data right for individuals and firms, and a new theory of benefits. Rather than take data from the platform, or ex situ as portability implies, let us grant users the right to use their data in the location where it resides. Bring the algorithms to the data instead of bringing the data to the algorithms. Users determine when and under what conditions third parties access their in situ data in exchange for new kinds of benefits. Users can revoke access at any time and third parties must respect that. This patches and repairs the portability problems…(More).”
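The in situ pattern can be sketched roughly as follows: the data never leaves its store; third parties with an active grant submit algorithms that run where the data resides, and the grant can be revoked at any time. The class and method names below are purely illustrative, not part of the authors' proposal.

```python
class InSituStore:
    """Illustrative in situ access: algorithms come to the data, not vice versa."""

    def __init__(self, data):
        self._data = data        # the data itself is never handed out
        self._grants = set()     # parties currently permitted to compute on it

    def grant(self, party):
        """User authorizes a third party to compute on the data in place."""
        self._grants.add(party)

    def revoke(self, party):
        """User withdraws access at any time; future runs are refused."""
        self._grants.discard(party)

    def run(self, party, algorithm):
        """Run a third-party algorithm where the data resides; only derived
        results leave the store."""
        if party not in self._grants:
            raise PermissionError(f"{party!r} has no active in situ grant")
        return algorithm(self._data)
```

For example, an analytics firm granted access could compute a merchant's average rating in place, while revocation immediately blocks any further computation, with no copy of the underlying ratings ever leaving the platform.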

The unmet potential of open data

Essay by Jane Bambauer: “Open Data holds great promise — and more than thought leaders appreciate. 

Open access to data can lead to a much richer and more diverse range of research and development, hastening innovation. That’s why scientific journals are asking authors to make their data available, why governments are making publicly held records open by default, and why even private companies provide subsets of their data for general research use. Facebook, for example, launched an effort to provide research data that could be used to study the impact of social networks on election outcomes. 

Yet none of these moves have significantly changed the landscape. Because of lingering skepticism and some legitimate anxieties, we have not yet democratized access to Big Data.

There are a few well-trodden explanations for this failure — or this tragedy of the anti-commons — but none should dissuade us from pushing forward….

Finally, creating the infrastructure required to clean data, link it to other data sources, and make it useful for the most valuable research questions will not happen without a significant investment from somebody, be it the government or a private foundation. As Stefaan Verhulst, Andrew Zahuranec, and Andrew Young have explained, creating a useful data commons requires much more infrastructure and cultural buy-in than one might think. 

From my perspective, however, the greatest impediment to the open data movement has been a lack of vision within the intelligentsia. Outside a few domains like public health, intellectuals continue to traffic in and thrive on anecdotes and narratives. They have not perceived or fully embraced how access to broad and highly diverse data could radically change newsgathering (we could observe purchasing or social media data in real time), market competition (imagine designing a new robot using data collected from Uber’s autonomous cars), and responsive government (we could directly test claims of cause and effect related to highly salient issues during election time). 

With a quiet accumulation of use cases and increasing competence in handling and digesting data, we will eventually reach a tipping point where the appetite for more useful research data will outweigh the concerns and inertia that have bogged down progress in the open data movement…(More)”.

Quarantined Data? The impact, scope & challenges of open data during COVID

Chapter by Álvaro V. Ramírez-Alujas: “How do rates of COVID-19 infection increase? How do populations respond to lockdown measures? How is the pandemic affecting the economic and social activity of communities beyond health? What can we do to mitigate risks and support families in this context? The answer to these and other key questions is part of the intense global public debate on the management of the health crisis and how appropriate public policy measures have been taken in order to combat the impact and effects of COVID-19 around the world. The common ground to all of them? The availability and use of public data and information. This chapter reflects on the relevance of public information and the availability, processing and use of open data as the primary hub and key ingredient in the responsiveness of governments and public institutions to the COVID-19 pandemic and its multiple impacts on society. Discussions are underway concerning the scope, paradoxes, lessons learned, and visible challenges with respect to the available evidence and comparative analysis of government strategies in the region, incorporating the urgent need to shift towards a more robust, sustainable data infrastructure anchored in a logic of strengthening the ecosystem of actors (public and private sectors, civil society and the scientific community) to shape a framework of governance, and a strong, emerging institutional architecture based on data management for sustainable development on a human scale…(More)”.

Incentivising research data sharing: a scoping review

Paper by Helen Buckley Woods and Stephen Pinfield: “Numerous mechanisms exist to incentivise researchers to share their data. This scoping review aims to identify and summarise evidence of the efficacy of different interventions to promote open data practices and provide an overview of current research….Seven major themes in the literature were identified: publisher/journal data sharing policies, metrics, software solutions, research data sharing agreements in general, open science ‘badges’, funder mandates, and initiatives….

A number of key messages for data sharing include: the need to build on existing cultures and practices, meeting people where they are and tailoring interventions to support them; the importance of publicising and explaining the policy/service widely; the need to have disciplinary data champions to model good practice and drive cultural change; the requirement to resource interventions properly; and the imperative to provide robust technical infrastructure and protocols, such as labelling of data sets, use of DOIs, data standards and use of data repositories….(More)”.
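The labelling and identifier practices listed above (dataset labels, DOIs, repository deposit) can be made concrete with a minimal dataset descriptor. The field names below are illustrative, loosely inspired by DataCite-style metadata rather than any mandated schema, and the validation rule is an assumption for the sketch.

```python
def make_descriptor(title, doi, creators, license_id, repository):
    """Assemble a minimal dataset descriptor, refusing incomplete records."""
    fields = {
        "title": title,
        "doi": doi,
        "creators": list(creators),
        "license": license_id,
        "repository": repository,
    }
    # a shareable record should carry every field; reject gaps up front
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"incomplete descriptor, missing: {missing}")
    return {
        "title": title,
        "identifier": {"type": "DOI", "value": doi},
        "creators": fields["creators"],
        "license": license_id,
        "repository": repository,
    }
```

Enforcing even a small checklist like this at deposit time is one way repositories turn the "robust protocols" message into routine practice.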

Mapping data portability initiatives, opportunities and challenges

OECD Report: “Data portability has become an essential tool for enhancing access to and sharing of data across digital services and platforms. This report explores to what extent data portability can empower users (natural and legal persons) to play a more active role in the re-use of their data across digital services and platforms. It also examines how data portability can help increase interoperability and data flows and thus enhance competition and innovation by reducing switching costs and lock-in effects….(More)”.

The argument against property rights in data

Report by Open Future: “25 years after the adoption of the Database Directive, there is mounting evidence that the introduction of the sui generis right did not lead to increased data access and use; instead, an additional intellectual property layer became one more obstacle.

Today, the European Commission, as it drafts the new Data Act, faces a fundamental choice regarding both the existing sui generis database right and the introduction of a similar right to raw, machine-generated data. There is a risk that an approach that treats data as property will be further strengthened through a new data producer’s right. The idea of such a new exclusive right was introduced by the European Commission in 2017. This proposed right was to be based on the same template as the sui generis database right. 

A new property right will not secure the goals defined in the European data strategy: those of ensuring access and use of data, in a data economy built around common data spaces. Instead, it will strengthen existing monopolies in the data economy. 

Instead of introducing new property rights, greater access to and use of data should be achieved by introducing–in the Data Act, and in other currently debated legal acts–access rights that treat data as a commons. 

In this policy brief, we present the current policy debate on access and use of data, as well as the history of proposals for property rights in data – including the sui generis database right. We present arguments against the introduction of new property rights, and in favor of strengthening data access rights….(More)”.