How to Argue with an Algorithm: Lessons from the COMPAS ProPublica Debate


Paper by Anne L. Washington: “The United States optimizes the efficiency of its growing criminal justice system with algorithms; however, legal scholars have overlooked how to frame courtroom debates about algorithmic predictions. In State v. Loomis, the defense argued that the court’s consideration of risk assessments during sentencing was a violation of due process because the accuracy of the algorithmic prediction could not be verified. The Wisconsin Supreme Court upheld the consideration of predictive risk at sentencing because the assessment was disclosed and the defendant could challenge the prediction by verifying the accuracy of the data fed into the algorithm.

Was the court correct about how to argue with an algorithm?

The Loomis court ignored the computational procedures that processed the data within the algorithm. How algorithms calculate data is just as important as the quality of the data calculated. The arguments in Loomis revealed a need for new forms of reasoning to justify the logic of evidence-based tools. A “data science reasoning” could provide ways to dispute the integrity of predictive algorithms with arguments grounded in how the technology works.
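Washington’s point is easy to make concrete. In the minimal sketch below (Python; COMPAS’s actual procedure is proprietary, and both scoring functions, their weights, and their cutoffs are invented for illustration), two different computational procedures receive the same fully verified inputs yet assign different risk labels. Checking the accuracy of the input data, the remedy Loomis endorsed, would never surface this disagreement.

```python
# Illustrative only: two hypothetical scoring procedures, not COMPAS.
# Both consume identical, fully verified inputs (age, prior offenses).

def score_weighted(age: int, priors: int) -> str:
    """Hypothetical procedure A: linear weights with a fixed cutoff."""
    s = 0.6 * priors - 0.03 * (age - 18)
    return "high" if s >= 2 else "low"

def score_bucketed(age: int, priors: int) -> str:
    """Hypothetical procedure B: coarse rule-based buckets."""
    if priors >= 3 or (age < 25 and priors >= 1):
        return "high"
    return "low"

defendant = {"age": 24, "priors": 1}    # the same verified data record
print(score_weighted(**defendant))      # -> low
print(score_bucketed(**defendant))      # -> high
```

The divergence comes entirely from the computation, which is exactly the layer of argument the article contends was missing in Loomis.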

This article’s contribution is a series of arguments that could support due process claims concerning predictive algorithms, specifically the Correctional Offender Management Profiling for Alternative Sanctions (“COMPAS”) risk assessment. As a comprehensive treatment, this article outlines the due process arguments in Loomis, analyzes arguments in an ongoing academic debate about COMPAS, and proposes alternative arguments based on the algorithm’s organizational context….(More)”

Group decisions: When more information isn’t necessarily better


News Release from the Santa Fe Institute: “In nature, group decisions are often a matter of life or death. At first glance, the way certain groups of animals like minnows branch off into smaller sub-groups might seem counterproductive to their survival. After all, information about, say, where to find some tasty fish roe or which waters harbor more of their predators, would flow more freely and seem to benefit more minnows if the school of fish behaved as a whole. However, new research published in Philosophical Transactions of the Royal Society B sheds light on the complexity of collective decision-making and uncovers new insights into the benefits of the internal structure of animal groups.

In their paper, Albert Kao, a Baird Scholar and Omidyar Fellow at the Santa Fe Institute, and Iain Couzin, Director of the Max Planck Institute for Ornithology and Chair of Biodiversity and Collective Behavior at the University of Konstanz, simulate the information-sharing patterns of animals that prefer to interact with certain individuals over others. The authors’ modeling of such animal groups upends previously held assumptions about internal group structure and improves upon our understanding of the influence of group organization and environment on both the collective decision-making process and its accuracy.

Modular — or cliquey — group structure isolates the flow of communication between individuals, so that only certain animals are privy to certain pieces of information. “A feature of modular structure is that there’s always information loss,” says Kao, “but the effect of that information loss on accuracy depends on the environment.”

In simple environments, the impact of these modular groups is detrimental to accuracy, but when animals face many different sources of information, the effect is actually the opposite. “Surprisingly,” says Kao, “in complex environments, the information loss even helps accuracy in a lot of situations.” More information, in this case, is not necessarily better.
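The mechanism can be explored with a toy Monte Carlo sketch (this is not Kao and Couzin’s actual model; the block structure, parameters, and voting rules below are all invented for illustration). Information arrives in correlated blocks: a fully mixed group pools every cue with equal weight, so a heavily duplicated cue dominates the vote, while a modular group collapses each clique’s reading into a single vote, discarding the duplication, losing information but gaining accuracy in this setting.

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_cues(block_sizes, p):
    """Correlated environment: cues come in blocks, and every cue in a
    block repeats one noisy reading of the true state (+1). Each block
    reads +1 with probability p, otherwise -1."""
    block_vals = rng.choice([1, -1], size=len(block_sizes), p=[p, 1 - p])
    return np.repeat(block_vals, block_sizes)

def accuracy(block_sizes, p, modular, n_trials=20_000):
    """Fraction of trials in which the group picks the true state (+1)."""
    split_at = np.cumsum(block_sizes)[:-1]
    correct = 0
    for _ in range(n_trials):
        cues = draw_cues(block_sizes, p)
        if modular:
            # One clique per block; each clique reports a single vote,
            # so duplicated (correlated) cues are lost before pooling.
            votes = [np.sign(chunk.sum()) for chunk in np.split(cues, split_at)]
            decision = np.sign(sum(votes))
        else:
            # Fully mixed group: every cue pooled with equal weight.
            decision = np.sign(cues.sum())
        correct += decision == 1
    return correct / n_trials

# One highly salient cue duplicated 7 times, plus six independent cues.
blocks = [7, 1, 1, 1, 1, 1, 1]
print("fully mixed:", accuracy(blocks, p=0.65, modular=False))  # ~0.65
print("modular:    ", accuracy(blocks, p=0.65, modular=True))   # ~0.80
```

In this toy setup the fully mixed group can never do better than the dominant duplicated cue, while the modular group’s vote over de-duplicated blocks does substantially better: a crude analogue of information loss improving collective accuracy in a complex environment.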

“Modular structure can have a profound — and unexpected — impact on the collective intelligence of groups,” says Couzin. “This may indeed be one of the reasons that we see internal structure in so many group-living species, from schooling fish and flocking birds to wild primate groups.”

Potentially, these new observations could be applied to many different kinds of social networks, from the migration patterns of birds to the navigation of social media landscapes to the organization of new companies, deepening our grasp of complex organization and collective behavior….(More)”.

(The paper, “Modular structure within groups causes information loss but can improve decision accuracy,” is part of a theme issue in the Philosophical Transactions of the Royal Society B entitled “Liquid Brains, Solid Brains: How distributed cognitive architectures process information.” The issue was inspired by a Santa Fe Institute working group and edited by Ricard Solé (Universitat Pompeu Fabra), Melanie Moses (University of New Mexico), and Stephanie Forrest (Arizona State University).)

Technology-facilitated Societal Consensus


Paper by Timotheus Kampik and Amro Najjar: “The spread of radical opinions, facilitated by homophilic Internet communities (echo chambers), has become a threat to the stability of societies around the globe. The concept of choice architecture, the design of choice information for consumers with the goal of facilitating societally beneficial decisions, provides a promising (although not uncontroversial) general concept to address this problem.

The choice architecture approach is reflected in recent proposals advocating for recommender systems that consider the societal impact of their recommendations and not only strive to optimize revenue streams.

However, the precise nature of the goal state such systems should work towards remains an open question. In this paper, we suggest that this goal state can be defined by considering the target opinion spread in a society on different topics of interest as a multivariate normal distribution; i.e., while there is a diversity of opinions, most people have similar opinions on most topics. We explain why this approach is promising, and list a set of cross-disciplinary research challenges that need to be solved to advance the idea….(More)”.
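One way to make that goal state operational, sketched below under assumptions that are ours rather than the authors’ (three topics, a zero-mean unit-variance target, and KL divergence as the distance measure), is to fit a Gaussian to the observed opinion spread and score its divergence from the target multivariate normal; a recommender system could then try to drive that score down over time.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target: diverse but centrist opinions on 3 topics.
target_mean = np.zeros(3)
target_cov = np.eye(3)  # unit variance, topics uncorrelated

def kl_to_target(opinions, mean_t, cov_t):
    """KL divergence from a Gaussian fitted to observed opinions
    (rows = people, columns = topics) to the target normal."""
    mean_o = opinions.mean(axis=0)
    cov_o = np.cov(opinions, rowvar=False)
    k = len(mean_t)
    inv_t = np.linalg.inv(cov_t)
    diff = mean_t - mean_o
    return 0.5 * (np.trace(inv_t @ cov_o) + diff @ inv_t @ diff - k
                  + np.log(np.linalg.det(cov_t) / np.linalg.det(cov_o)))

# A society matching the target scores near zero ...
aligned = rng.normal(0.0, 1.0, size=(500, 3))
print(kl_to_target(aligned, target_mean, target_cov))    # ~0

# ... while a society polarized into two camps on topic 0 scores high.
polarized = rng.normal(0.0, 1.0, size=(500, 3))
polarized[:, 0] += rng.choice([-3.0, 3.0], size=500)
print(kl_to_target(polarized, target_mean, target_cov))  # large
```

The score itself is only a sketch of the paper’s idea; choosing the target parameters and the divergence measure is precisely the kind of open, cross-disciplinary question the authors raise.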

Characterizing the Biomedical Data-Sharing Landscape


Paper by Angela G. Villanueva et al: “Advances in technologies and biomedical informatics have expanded capacity to generate and share biomedical data. With a lens on genomic data, we present a typology characterizing the data-sharing landscape in biomedical research to advance understanding of the key stakeholders and existing data-sharing practices. The typology highlights the diversity of data-sharing efforts and facilitators and reveals how novel data-sharing efforts are challenging existing norms regarding the role of individuals whom the data describe.

Technologies such as next-generation sequencing have dramatically expanded capacity to generate genomic data at a reasonable cost, while advances in biomedical informatics have created new tools for linking and analyzing diverse data types from multiple sources. Further, many research-funding agencies now mandate that grantees share data. The National Institutes of Health’s (NIH) Genomic Data Sharing (GDS) Policy, for example, requires NIH-funded research projects generating large-scale human genomic data to share those data via an NIH-designated data repository such as the Database of Genotypes and Phenotypes (dbGaP). Another example is the Parent Project Muscular Dystrophy, a non-profit organization that requires applicants to propose a data-sharing plan and takes an applicant’s history of data sharing into account.

The flow of data to and from different projects, institutions, and sectors is creating a medical information commons (MIC), a data-sharing ecosystem consisting of networked resources sharing diverse health-related data from multiple sources for research and clinical uses. This concept aligns with the 2018 NIH Strategic Plan for Data Science, which uses the term “data ecosystem” to describe “a distributed, adaptive, open system with properties of self-organization, scalability and sustainability” and proposes to “modernize the biomedical research data ecosystem” by funding projects such as the NIH Data Commons. Consistent with Elinor Ostrom’s discussion of nested institutional arrangements, an MIC is both singular and plural and may describe the ecosystem as a whole or individual components contributing to the ecosystem. Thus, resources like the NIH Data Commons with its associated institutional arrangements are MICs, and also form part of the larger MIC that encompasses all such resources and arrangements.

Although many research funders incentivize data sharing, in practice, progress in making biomedical data broadly available to maximize its utility is often hampered by a broad range of technical, legal, cultural, normative, and policy challenges that include achieving interoperability, changing the standards for academic promotion, and addressing data privacy and security concerns. Addressing these challenges requires multi-stakeholder involvement. To identify relevant stakeholders and advance understanding of the contributors to an MIC, we conducted a landscape analysis of existing data-sharing efforts and facilitators. Our work builds on typologies describing various aspects of data sharing that focused on biobanks, research consortia, or where data reside (e.g., degree of data centralization). While these works are informative, we aimed to capture the biomedical data-sharing ecosystem with a wider scope. Understanding the components of an MIC ecosystem and how they interact, and identifying emerging trends that test existing norms (such as norms respecting the role of individuals whom the data describe), is essential to fostering effective practices, policies, and governance structures, guiding resource allocation, and promoting the overall sustainability of the MIC….(More)”

Leveraging Big Data for Social Responsibility


Paper by Cynthia Ann Peterson: “Big data has the potential to revolutionize the way social risks are managed by providing enhanced insight to enable more informed actions to be taken. The objective of this paper is to share the approach taken by PETRONAS to leverage big data to enhance its social performance practice, specifically in social risk assessments and grievance mechanism.

The paper will discuss the benefits, challenges, and opportunities of improving the management of social risk through analytics, and how PETRONAS has taken those factors into consideration in enhancing its social risk assessment and grievance mechanism tools. Key considerations, such as the disaggregation of data, the choice of appropriate leading and lagging indicators, and the application of a human rights lens to data, will also be discussed.

Leveraging big data is still in its early stages in the social risk space, as it is in other areas of the oil and gas industry, according to research by Wood Mackenzie. Even so, there are several concerns: the aggregation of data may prevent risks to minority or vulnerable groups from surfacing; privacy breaches may violate human rights; and prescriptive analysis, such as predictions of a community’s propensity to pose certain social risks to projects or operations, may lead to discrimination. Certainly, there are many challenges ahead that need to be considered, including how best to take a human rights approach to using big data.

Nevertheless, harnessing the power of big data will help social risk practitioners turn a high volume of disparate pieces of raw data from grievance mechanisms and social risk assessments into information that can be used to avoid or mitigate risks now and in the future through predictive technology. Consumer and other industries are benefiting from this leverage now, and social performance practitioners in the oil and gas industry can emulate these proven models….(More)”.

The Importance of Data Access Regimes for Artificial Intelligence and Machine Learning


JRC Digital Economy Working Paper by Bertin Martens: “Digitization triggered a steep drop in the cost of information. The resulting data glut created a bottleneck because human cognitive capacity is unable to cope with large amounts of information. Artificial intelligence and machine learning (AI/ML) triggered a similar drop in the cost of machine-based decision-making and helps in overcoming this bottleneck. A substantial change in the relative price of resources puts pressure on ownership and access rights to these resources. This explains the pressure on access rights to data. ML thrives on access to big and varied datasets. We discuss the implications of access regimes for the development of AI in its current form, ML. The economic characteristics of data (non-rivalry, economies of scale and scope) favour data aggregation in big datasets. Non-rivalry implies the need for exclusive rights in order to incentivise data production when it is costly. The balance between access and exclusion is at the centre of the debate on data regimes. We explore the economic implications of several modalities for access to data, ranging from exclusive monopolistic control to monopolistic competition and free access. Regulatory intervention may push the market beyond voluntary exchanges, either towards more openness or reduced access. This may generate private costs for firms and individuals. Society can choose to do so if the social benefits of this intervention outweigh the private costs.

We briefly discuss the main EU legal instruments that are relevant for data access and ownership, including the General Data Protection Regulation (GDPR) that defines the rights of data subjects with respect to their personal data and the Database Directive (DBD) that grants ownership rights to database producers. These two instruments leave a wide legal no-man’s land where data access is ruled by bilateral contracts and Technical Protection Measures that give exclusive control to de facto data holders, and by market forces that drive access, trade and pricing of data. The absence of exclusive rights might facilitate data sharing and access or it may result in a segmented data landscape where data aggregation for ML purposes is hard to achieve. It is unclear if incompletely specified ownership and access rights maximize the welfare of society and facilitate the development of AI/ML…(More)”

Data Trusts: More Data than Trust? The Perspective of the Data Subject in the Face of a Growing Problem


Paper by Christine Rinik: “In the recent report, Growing the Artificial Intelligence Industry in the UK, Hall and Pesenti suggest the use of a ‘data trust’ to facilitate data sharing. Whilst government and corporations are focusing on their need to facilitate data sharing, the perspective of many individuals is that too much data is being shared. The issue is not only about data, but about power. The individual does not often have a voice when issues relating to data sharing are tackled. Regulators can cite the ‘public interest’ when data governance is discussed, but the individual’s interests may diverge from that of the public.

This paper considers the data subject’s position with respect to data collection, leading to considerations of surveillance and datafication. Proposals for data trusts will be considered, applying principles of English trust law, to see whether they might mitigate the imbalance of power between large data users and individual data subjects. Finally, the paper explores the possibility of a workable remedy in the form of a class action lawsuit, which could give data subjects some collective power in the event of a data breach. Despite regulatory efforts to protect personal data, there is a lack of public trust in the current data-sharing system….(More)”.

Crowdsourcing in medical research: concepts and applications


Paper by Joseph D. Tucker, Suzanne Day, Weiming Tang, and Barry Bayus: “Crowdsourcing shifts medical research from a closed environment to an open collaboration between the public and researchers. We define crowdsourcing as an approach to problem solving which involves an organization having a large group attempt to solve a problem or part of a problem, then sharing solutions. Crowdsourcing allows large groups of individuals to participate in medical research through innovation challenges, hackathons, and related activities. The purpose of this literature review is to examine the definition, concepts, and applications of crowdsourcing in medicine.

This multi-disciplinary review defines crowdsourcing for medicine, identifies conceptual antecedents (collective intelligence and open source models), and explores implications of the approach. Several critiques of crowdsourcing are also examined. Although several crowdsourcing definitions exist, there are two essential elements: (1) having a large group of individuals, including those with skills and those without skills, propose potential solutions; (2) sharing solutions through implementation or open access materials. The public can be a central force in contributing to formative, pre-clinical, and clinical research. A growing evidence base suggests that crowdsourcing in medicine can result in high-quality outcomes, broad community engagement, and more open science….(More)”

Data Cultures, Culture as Data


Introduction to Special Issue of Cultural Analytics by Amelia Acker and Tanya Clement: “Data have become pervasive in research in the humanities and the social sciences. New areas, objects, and situations for study have developed; and new methods for working with data are shepherded by new epistemologies and (potential) paradigm shifts. But data didn’t just happen to us. We have happened to data. In every field, scholars are drawing boundaries between data and humans as if making meaning with data is innocent work. But these boundaries are never innocent. Questions are emerging about the relationships of culture to data—urgent questions that focus on the codification (or code-ification) of social and cultural bias and the erosion of human agency, subjectivity, and identity.

For this special issue of Cultural Analytics we invited submissions to respond to these concerns as they relate to the proximity and distance between the creation of data and its collection; the nature of data as object or content; modes and contexts of data circulation, dissemination and preservation; histories and imaginary data futures; data expertise; data and technological progressivism; the cultivation and standardization of data; and the cultures, communities, and consciousness of data production. The contributions we received ranged in type from research or theory articles to data reviews and opinion pieces responding to the theme of “data cultures”. Each contribution asks questions we should all be asking: What is the role we play in the data cultures/culture as data we form around sociomaterial practices? How can we better understand how these practices effect, and affect, the materialization of subjects, objects, and the relations between them? How can we engage our data culture(s) in practical, critical, and generative ways? As Karen Barad writes, “We are responsible for the world in which we live not because it is an arbitrary construction of our choosing, but because it is sedimented out of particular practices that we have a role in shaping.” Ultimately, our contributors are focused on this central concern: where is our agency in the responsibility of shaping data cultures? What role can scholarship play in better understanding our culture as data?…(More)”.

Digital Health Data And Information Sharing: A New Frontier For Health Care Competition?


Paper by Lucia Savage, Martin Gaynor and Julie Adler-Milstein: “There are obvious benefits to having patients’ health information flow across health providers. Providers will have more complete information about patients’ health and treatment histories, allowing them to make better treatment recommendations and avoid unnecessary and duplicative testing or treatment. This should result in better and more efficient treatment, and better health outcomes. Moreover, the federal government has provided substantial incentives for the exchange of health information. Since 2009, the federal government has spent more than $40 billion to ensure that most physicians and hospitals use electronic health records and to incentivize the use of electronic health information and health information exchange (the enabling statute is the Health Information Technology for Economic and Clinical Health (HITECH) Act), and in 2016 it authorized substantial fines for failing to share appropriate information.

Yet, in spite of these incentives and the clear benefits to patients, the exchange of health information remains limited. There is evidence that this limited exchange is due in part to providers and platforms attempting to retain, rather than share, information (“information blocking”). In this article we examine legal and business reasons why health information may not be flowing. In particular, we discuss the incentives providers and platforms can have to engage in information blocking as a means to maintain or enhance their market position and thwart competition. Finally, we recommend steps to better understand whether the absence of information exchange is due to information blocking that harms competition and consumers….(More)”