How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals


Paper by Eric Wu et al: “Medical artificial-intelligence (AI) algorithms are increasingly being proposed for the assessment and care of patients. Although the academic community has started to develop reporting guidelines for AI clinical trials, there are no established best practices for evaluating commercially available algorithms to ensure their reliability and safety. The path to safe and robust clinical AI requires that important regulatory questions be addressed. Are medical devices able to demonstrate performance that can be generalized to the entire intended population? Are commonly faced shortcomings of AI (overfitting to training data, vulnerability to data shifts, and bias against underrepresented patient subgroups) adequately quantified and addressed?

In the USA, the US Food and Drug Administration (FDA) is responsible for approving commercially marketed medical AI devices. The FDA releases publicly available information on approved devices in the form of a summary document that generally contains information about the device description, indications for use, and performance data of the device’s evaluation study. The FDA has recently called for improvement of test-data quality, improvement of trust and transparency with users, monitoring of algorithmic performance and bias on the intended population, and testing with clinicians in the loop. To understand the extent to which these concerns are addressed in practice, we have created an annotated database of FDA-approved medical AI devices and systematically analyzed how these devices were evaluated before approval. Additionally, we have conducted a case study of pneumothorax-triage devices and found that evaluating deep-learning models at a single site alone, which is often done, can mask weaknesses in the models and lead to worse performance across sites.

Fig. 1: Breakdown of 130 FDA-approved medical AI devices by body area.

Devices are categorized by risk level (square, high risk; circle, low risk). Blue indicates that a multi-site evaluation was reported; otherwise, symbols are gray. Red outline indicates a prospective study (key, right margin). Numbers in key indicate the number of devices with each characteristic….(More)”.
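
The single-site concern raised in the pneumothorax case study above can be made concrete with a small sketch. The snippet below is illustrative only and is not the authors' code: assuming a scikit-learn-style classifier and hypothetical site datasets, it reports a model's AUROC separately at its development site and at an external site, the kind of multi-site check the analysis argues for.

```python
# Illustrative sketch (not the paper's code): compare a model's discrimination
# performance at its development site against an external site. A large gap
# between sites is exactly the weakness a single-site evaluation can mask.
# Site names, data, and the model are hypothetical placeholders.
from sklearn.metrics import roc_auc_score

def evaluate_by_site(model, site_datasets):
    """Return AUROC for each evaluation site, computed separately."""
    results = {}
    for site_name, (features, labels) in site_datasets.items():
        scores = model.predict_proba(features)[:, 1]  # predicted probability of pneumothorax
        results[site_name] = roc_auc_score(labels, scores)
    return results

# Hypothetical usage, with a model trained only on Site A data:
# per_site_auc = evaluate_by_site(model, {"site_A": (X_a, y_a),
#                                         "site_B": (X_b, y_b)})
# A sizeable drop from site_A to site_B would not be visible in a
# single-site evaluation.
```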

Between surveillance and recognition: Rethinking digital identity in aid


Paper by Keren Weitzberg et al: “Identification technologies like biometrics have long been associated with securitisation, coercion and surveillance but have also, in recent years, become constitutive of a politics of empowerment, particularly in contexts of international aid. Aid organisations tend to see digital identification technologies as tools of recognition and inclusion rather than oppressive forms of monitoring, tracking and top-down control. In addition, practices that many critical scholars describe as aiding surveillance are often experienced differently by humanitarian subjects. This commentary examines the fraught questions this raises for scholars of international aid, surveillance studies and critical data studies. We put forward a research agenda that tackles head-on how critical theories of data and society can better account for the ambivalent dynamics of ‘power over’ and ‘power to’ that digital aid interventions instantiate….(More)”.

Using Open Data to Monitor the Status of a Metropolitan Area: The Case of the Metropolitan Area of Turin


Paper by Filippo Candela and Paolo Mulassano: “The paper presents and discusses the method adopted by Compagnia di San Paolo, one of the largest European philanthropic institutions, to monitor the status of the metropolitan area despite the COVID-19 situation and to provide specific input to the decision-making process for dedicated projects. An innovative approach based on the use of daily open data was adopted to monitor the metropolitan area with a multidimensional perspective. Several open data indicators related to the economy, society, culture, environment, and climate were identified and incorporated into the decision support system dashboard. Indicators are presented and discussed to highlight how open data could be integrated into the foundation’s strategic approach and potentially replicated on a large scale by local institutions. Moreover, starting from the lessons learned from this experience, the paper analyzes the opportunities and critical issues surrounding the use of open data, not only to improve the quality of life during the COVID-19 epidemic but also for the effective regulation of society, the participation of citizens, and their well-being….(More)”

More than a number: The telephone and the history of digital identification


Article by Jennifer Holt and Michael Palm: “This article examines the telephone’s entangled history within contemporary infrastructural systems of ‘big data’, identity and, ultimately, surveillance. It explores the use of telephone numbers, keypads and wires to offer a new perspective on the imbrication of telephonic information, interface and infrastructure within contemporary surveillance regimes. The article explores telephone exchanges as arbiters of cultural identities, keypads as the foundation of digital transactions and wireline networks as enacting the transformation of citizens and consumers into digital subjects ripe for commodification and surveillance. Ultimately, this article argues that telephone history – specifically the histories of telephone numbers and keypads as well as infrastructure and policy in the United States – continues to inform contemporary practices of social and economic exchange as they relate to consumer identity, as well as to current discourses about surveillance and privacy in a digital age…(More)”.

The Norms of Algorithmic Credit Scoring


Paper by Nikita Aggarwal: “This article examines the growth of algorithmic credit scoring and its implications for the regulation of consumer credit markets in the UK. It constructs a frame of analysis for the regulation of algorithmic credit scoring, bound by the core norms underpinning UK consumer credit and data protection regulation: allocative efficiency, distributional fairness and consumer privacy (as autonomy). Examining the normative trade-offs that arise within this frame, the article argues that existing data protection and consumer credit frameworks do not achieve an appropriate normative balance in the regulation of algorithmic credit scoring. In particular, the growing reliance on consumers’ personal data by lenders due to algorithmic credit scoring, coupled with the ineffectiveness of existing data protection remedies, has created a data protection gap in consumer credit markets that presents a significant threat to consumer privacy and autonomy. The article makes recommendations for filling this gap through institutional and substantive regulatory reforms….(More)”.

How video conferencing reduces vocal synchrony and collective intelligence


Paper by Maria Tomprou et al: “Collective intelligence (CI) is the ability of a group to solve a wide range of problems. Synchrony in nonverbal cues is critically important to the development of CI; however, extant findings are mostly based on studies conducted face-to-face. Given how much collaboration takes place via the internet, does nonverbal synchrony still matter and can it be achieved when collaborators are physically separated? Here, we hypothesize and test the effect of nonverbal synchrony on CI that develops through visual and audio cues in physically separated teammates. We show that, contrary to popular belief, the presence of visual cues surprisingly has no effect on CI; furthermore, teams without visual cues are more successful in synchronizing their vocal cues and speaking turns, and when they do so, they have higher CI. Our findings show that nonverbal synchrony is important in distributed collaboration and call into question the necessity of video support….(More)”.

The (Im)possibility of Fairness: Different Value Systems Require Different Mechanisms For Fair Decision Making


Sorelle A. Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian at Communications of the ACM: “Automated decision-making systems (often machine learning-based) now commonly determine criminal sentences, hiring choices, and loan applications. This widespread deployment is concerning, since these systems have the potential to discriminate against people based on their demographic characteristics. Current sentencing risk assessments are racially biased, and job advertisements discriminate on the basis of gender. These concerns have led to explosive growth in fairness-aware machine learning, a field that aims to enable algorithmic systems that are fair by design.

To design fair systems, we must agree precisely on what it means to be fair. One such definition is individual fairness: individuals who are similar (with respect to some task) should be treated similarly (with respect to that task). Simultaneously, a different definition states that demographic groups should, on the whole, receive similar decisions. This group fairness definition is inspired by civil rights law in the U.S. and U.K. Other definitions state that fair systems should err evenly across demographic groups. Many of these definitions have been incorporated into machine learning pipelines.

In this article, we introduce a framework for understanding these different definitions of fairness and how they relate to each other. Crucially, our framework shows these definitions and their implementations correspond to different axiomatic beliefs about the world. We present two such worldviews and will show they are fundamentally incompatible. First, one can believe the observation processes that generate data for machine learning are structurally biased. This belief provides a justification for seeking non-discrimination. When one believes that demographic groups are, on the whole, fundamentally similar, group fairness mechanisms successfully guarantee the top-level goal of non-discrimination: similar groups receiving similar treatment. Alternatively, one can assume the observed data generally reflects the true underlying reality about differences between people. These worldviews are in conflict; a single algorithm cannot satisfy either definition of fairness under both worldviews. Thus, researchers and practitioners ought to be intentional and explicit about worldviews and value assumptions: the systems they design will always encode some belief about the world….(More)”.
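
To make the group-level definitions summarized above concrete, the following is a minimal sketch, using hypothetical toy decisions and group labels rather than anything from the article: it computes a demographic-parity gap (difference in positive-decision rates across groups) and an equalized-odds-style gap (difference in error rates across groups).

```python
# A minimal sketch (not from the article) of two group-fairness checks named
# above: demographic parity (similar positive-decision rates across groups)
# and an equalized-odds-style gap (similar error rates across groups).
# The toy decisions and group labels below are hypothetical.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-decision rate between demographic groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap in false-positive or true-positive rate between groups."""
    gaps = []
    for label in (0, 1):  # label 0 -> false-positive rate, label 1 -> true-positive rate
        rates = [y_pred[(group == g) & (y_true == label)].mean()
                 for g in np.unique(group)]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

# Toy example: eight decisions split across two demographic groups.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_gap(y_pred, group))      # 0.5
print(equalized_odds_gap(y_true, y_pred, group))  # 0.5
```

In the article's framing, driving such group-level gaps to zero is the right goal only under the first worldview (structurally biased observation processes and fundamentally similar groups); under the second worldview, an individual-fairness criterion is the more natural fit.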

Lawmakers’ use of scientific evidence can be improved


Paper by D. Max Crowley et al: “This study is an experimental trial that demonstrates the potential for formal outreach strategies to change congressional use of research. Our results show that collaboration between policy and research communities can change policymakers’ value of science and result in legislation that appears to be more inclusive of research evidence. The findings of this study also demonstrated changes in researchers’ knowledge and motivation to engage with policymakers as well as their actual policy engagement behavior. Together, the observed changes in both policymakers and researchers randomized to receive an intervention for supporting legislative use of research evidence (i.e., the Research-to-Policy Collaboration model) provide support for the underlying theories around the social nature of research translation and evidence use….(More)”.

An early warning approach to monitor COVID-19 activity with multiple digital traces in near real time


Paper by Nicole E. Kogan et al: “We propose that several digital data sources may provide earlier indication of epidemic spread than traditional COVID-19 metrics such as confirmed cases or deaths. Six such sources are examined here: (i) Google Trends patterns for a suite of COVID-19–related terms; (ii) COVID-19–related Twitter activity; (iii) COVID-19–related clinician searches from UpToDate; (iv) predictions by the global epidemic and mobility model (GLEAM), a state-of-the-art metapopulation mechanistic model; (v) anonymized and aggregated human mobility data from smartphones; and (vi) Kinsa smart thermometer measurements.

We first evaluate each of these “proxies” of COVID-19 activity for their lead or lag relative to traditional measures of COVID-19 activity: confirmed cases, attributed deaths, and influenza-like illness (ILI). We then propose the use of a metric combining these data sources into a multiproxy estimate of the probability of an impending COVID-19 outbreak. Last, we develop probabilistic estimates of when such a COVID-19 outbreak will occur on the basis of multiproxy variability. These outbreak-timing predictions are made for two separate time periods: the first, a “training” period, from 1 March to 31 May 2020, and the second, a “validation” period, from 1 June to 30 September 2020. Consistent predictive behavior among proxies in both of these subsequent and nonoverlapping time periods would increase the confidence that they may capture future changes in the trajectory of COVID-19 activity….(More)”.
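
As a rough illustration of the lead/lag evaluation described above, and a sketch under assumptions rather than the authors' pipeline, one can shift a daily proxy series against confirmed-case counts and keep the lag with the highest correlation; a positive best lag means the proxy moves before reported cases. The series names in the usage note are hypothetical.

```python
# Illustrative sketch (not the authors' pipeline): estimate how far a digital
# proxy leads or lags confirmed COVID-19 cases by shifting the proxy series
# and keeping the lag with the highest correlation. A positive best lag means
# the proxy moves before cases do. Series names and data are hypothetical.
import numpy as np
import pandas as pd

def best_lead_lag(proxy: pd.Series, cases: pd.Series, max_lag_days: int = 21):
    """Return (lag_in_days, correlation) for the best-aligning shift of the proxy."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag_days, max_lag_days + 1):
        corr = proxy.shift(lag).corr(cases)  # NaNs introduced by the shift are dropped by .corr
        if not np.isnan(corr) and corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Hypothetical usage with two daily series sharing a date index:
# lag, corr = best_lead_lag(uptodate_search_volume, confirmed_cases)
# lag > 0 would suggest the clinician-search proxy peaks before reported cases.
```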

Female Victims of Gendered Violence, Their Human Rights and the Innovative Use of Data Technology to Predict, Prevent and Pursue Harms


Paper by Jamie Grace: “This short paper has the objective of making the case for more investment to explore the use of data-driven technology to predict, prevent and pursue criminal harms against women. The paper begins with an overview of the contemporary scale of the issues, and the current problem of recording data on serious violent and sexual offending against women, before moving on to consider the current status and strength of positive obligations under UK human rights law to protect victims of intimate partner violence. The paper then looks at some examples of how data tech can augment policing of serious criminal harms against women, before turning to consider some of the legal problems concerning potential bias, inaccuracies and transparency that can dog ‘predictive policing’ in particular. Finally, a conclusion is offered that explores the degree to which investment in and exploration of the predictive policing of intimate partner violence must be pursued at the same time as better oversight mechanisms are also developed for the use of machine learning technology in public protection roles, since the two emphases go hand in hand…(More)”.