The Importance of Data Access Regimes for Artificial Intelligence and Machine Learning


JRC Digital Economy Working Paper by Bertin Martens: “Digitization triggered a steep drop in the cost of information. The resulting data glut created a bottleneck because human cognitive capacity is unable to cope with large amounts of information. Artificial intelligence and machine learning (AI/ML) triggered a similar drop in the cost of machine-based decision-making and helps in overcoming this bottleneck. Substantial change in the relative price of resources puts pressure on ownership and access rights to these resources. This explains pressure on access rights to data. ML thrives on access to big and varied datasets. We discuss the implications of access regimes for the development of AI in its current form of ML. The economic characteristics of data (non-rivalry, economies of scale and scope) favour data aggregation in big datasets. Non-rivalry implies the need for exclusive rights in order to incentivise data production when it is costly. The balance between access and exclusion is at the centre of the debate on data regimes. We explore the economic implications of several modalities for access to data, ranging from exclusive monopolistic control to monopolistic competition and free access. Regulatory intervention may push the market beyond voluntary exchanges, either towards more openness or reduced access. This may generate private costs for firms and individuals. Society can choose to do so if the social benefits of this intervention outweigh the private costs.

We briefly discuss the main EU legal instruments that are relevant for data access and ownership, including the General Data Protection Regulation (GDPR) that defines the rights of data subjects with respect to their personal data and the Database Directive (DBD) that grants ownership rights to database producers. These two instruments leave a wide legal no-man’s land where data access is ruled by bilateral contracts and Technical Protection Measures that give exclusive control to de facto data holders, and by market forces that drive access, trade and pricing of data. The absence of exclusive rights might facilitate data sharing and access or it may result in a segmented data landscape where data aggregation for ML purposes is hard to achieve. It is unclear if incompletely specified ownership and access rights maximize the welfare of society and facilitate the development of AI/ML…(More)”

Data Trusts: More Data than Trust? The Perspective of the Data Subject in the Face of a Growing Problem


Paper by Christine Rinik: “In the recent report, Growing the Artificial Intelligence Industry in the UK, Hall and Pesenti suggest the use of a ‘data trust’ to facilitate data sharing. Whilst government and corporations are focusing on their need to facilitate data sharing, the perspective of many individuals is that too much data is being shared. The issue is not only about data, but about power. The individual does not often have a voice when issues relating to data sharing are tackled. Regulators can cite the ‘public interest’ when data governance is discussed, but the individual’s interests may diverge from those of the public.

This paper considers the data subject’s position with respect to data collection leading to considerations about surveillance and datafication. Proposals for data trusts will be considered applying principles of English trust law to possibly mitigate the imbalance of power between large data users and individual data subjects. Finally, the possibility of a workable remedy in the form of a class action lawsuit which could give the data subjects some collective power in the event of a data breach will be explored. Despite regulatory efforts to protect personal data, there is a lack of public trust in the current data sharing system….(More)”.

Crowdsourcing in medical research: concepts and applications


Paper by Joseph D. Tucker, Suzanne Day, Weiming Tang, and Barry Bayus: “Crowdsourcing shifts medical research from a closed environment to an open collaboration between the public and researchers. We define crowdsourcing as an approach to problem solving which involves an organization having a large group attempt to solve a problem or part of a problem, then sharing solutions. Crowdsourcing allows large groups of individuals to participate in medical research through innovation challenges, hackathons, and related activities. The purpose of this literature review is to examine the definition, concepts, and applications of crowdsourcing in medicine.

This multi-disciplinary review defines crowdsourcing for medicine, identifies conceptual antecedents (collective intelligence and open source models), and explores implications of the approach. Several critiques of crowdsourcing are also examined. Although several crowdsourcing definitions exist, there are two essential elements: (1) having a large group of individuals, including those with skills and those without skills, propose potential solutions; (2) sharing solutions through implementation or open access materials. The public can be a central force in contributing to formative, pre-clinical, and clinical research. A growing evidence base suggests that crowdsourcing in medicine can result in high-quality outcomes, broad community engagement, and more open science….(More)”

Data Cultures, Culture as Data


Introduction to Special Issue of Cultural Analytics by Amelia Acker and Tanya Clement: “Data have become pervasive in research in the humanities and the social sciences. New areas, objects, and situations for study have developed; and new methods for working with data are shepherded by new epistemologies and (potential) paradigm shifts. But data didn’t just happen to us. We have happened to data. In every field, scholars are drawing boundaries between data and humans as if making meaning with data is innocent work. But these boundaries are never innocent. Questions are emerging about the relationships of culture to data—urgent questions that focus on the codification (or code-ification) of social and cultural bias and the erosion of human agency, subjectivity, and identity.

For this special issue of Cultural Analytics we invited submissions to respond to these concerns as they relate to the proximity and distance between the creation of data and its collection; the nature of data as object or content; modes and contexts of data circulation, dissemination and preservation; histories and imaginary data futures; data expertise; data and technological progressivism; the cultivation and standardization of data; and the cultures, communities, and consciousness of data production. The contributions we received ranged in type from research or theory articles to data reviews and opinion pieces responding to the theme of “data cultures”. Each contribution asks questions we should all be asking: What is the role we play in the data cultures/culture as data we form around sociomaterial practices? How can we better understand how these practices effect, and affect, the materialization of subjects, objects, and the relations between them? How can we engage our data culture(s) in practical, critical, and generative ways? As Karen Barad writes, “We are responsible for the world in which we live not because it is an arbitrary construction of our choosing, but because it is sedimented out of particular practices that we have a role in shaping.” Ultimately, our contributors are focused on this central concern: where is our agency in the responsibility of shaping data cultures? What role can scholarship play in better understanding our culture as data?…(More)”.

Digital Health Data And Information Sharing: A New Frontier For Health Care Competition?


Paper by Lucia Savage, Martin Gaynor and Julie Adler-Milstein: “There are obvious benefits to having patients’ health information flow across health providers. Providers will have more complete information about patients’ health and treatment histories, allowing them to make better treatment recommendations, and avoid unnecessary and duplicative testing or treatment. This should result in better and more efficient treatment, and better health outcomes. Moreover, the federal government has provided substantial incentives for the exchange of health information. Since 2009, the federal government has spent more than $40 billion to ensure that most physicians and hospitals use electronic health records, and to incentivize the use of electronic health information and health information exchange (the enabling statute is the Health Information Technology for Clinical Health Act), and in 2016 authorized substantial fines for failing to share appropriate information.

Yet, in spite of these incentives and the clear benefits to patients, the exchange of health information remains limited. There is evidence that this limited exchange is due in part to providers and platforms attempting to retain, rather than share, information (“information blocking”). In this article we examine legal and business reasons why health information may not be flowing. In particular, we discuss incentives providers and platforms can have for information blocking as a means to maintain or enhance their market position and thwart competition. Finally, we recommend steps to better understand whether the absence of information exchange is due to information blocking that harms competition and consumers….(More)”

Characterizing the cultural niches of North American birds


Justin G. Schuetz and Alison Johnston at PNAS: “Efforts to mitigate the current biodiversity crisis require a better understanding of how and why humans value other species. We use Internet query data and citizen science data to characterize public interest in 621 bird species across the United States. We estimate the relative popularity of different birds by quantifying how frequently people use Google to search for species, relative to the rates at which they are encountered in the environment.

In intraspecific analyses, we also quantify the degree to which Google searches are limited to, or extend beyond, the places in which people encounter each species. The resulting metrics of popularity and geographic specificity of interest allow us to define aspects of relationships between people and birds within a cultural niche space. We then estimate the influence of species traits and socially constructed labels on niche positions to assess the importance of observations and ideas in shaping public interest in birds.

Our analyses show clear effects of migratory strategy, color, degree of association with bird feeders, and, especially, body size on niche position. They also indicate that cultural labels, including “endangered,” “introduced,” and, especially, “team mascot,” are strongly associated with the magnitude and geographic specificity of public interest in birds. Our results provide a framework for exploring complex relationships between humans and other species and enable more informed decision-making across diverse bird conservation strategies and goals….(More)”.
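The popularity metric described in this abstract (search frequency relative to encounter rate) can be illustrated with a minimal sketch. This is not the authors' model: the function name `relative_popularity`, the log-ratio form, and all counts below are assumptions made up for illustration.

```python
import math

def relative_popularity(search_counts, encounter_counts):
    """Return log2(search share / encounter share) for each species.

    Positive scores mean a species is searched for more often than its
    encounter rate alone would suggest. Assumes every count is positive.
    """
    total_searches = sum(search_counts.values())
    total_encounters = sum(encounter_counts.values())
    scores = {}
    for species, searches in search_counts.items():
        search_share = searches / total_searches
        encounter_share = encounter_counts[species] / total_encounters
        scores[species] = math.log2(search_share / encounter_share)
    return scores

# Hypothetical counts, invented for this example only.
searches = {"bald eagle": 600, "house sparrow": 200, "american robin": 200}
encounters = {"bald eagle": 100, "house sparrow": 500, "american robin": 400}

scores = relative_popularity(searches, encounters)
for species, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{species}: {score:+.2f}")
```

In this toy data the rarely encountered but heavily searched species receives the highest score, which is the kind of contrast such a ratio is meant to surface.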

Open government for all? Co-creating digital public services for older adults through data walks


Paper by Juliane Jarke: “The purpose of this paper is to review interventions/methods for engaging older adults in meaningful digital public service design by enabling them to engage critically and productively with open data and civic tech.

The paper evaluates data walks as a method for engaging non-tech-savvy citizens in co-design work. These were evaluated along a framework considering how such interventions allow for sharing control (e.g. over design decisions), sharing expertise and enabling change.

Within a co-creation project, different types of data walks may be conducted, including ideation walks, data co-creation walks or user test walks. These complement each other with respect to how they facilitate the sharing of control and expertise, and enable change for a variety of older citizens.

Data walks are a low-threshold method, potentially enabling a variety of citizens to engage in co-design activities relating to open government and civic tech.

Such methods address the digital divide and further social participation of non-tech-savvy citizens. They value the resources and expertise of older adults as co-designers and partners, and counter stereotypical ideas about age and ageing….(More)”.

Predictive Big Data Analytics using the UK Biobank Data


Paper by Ivo D Dinov et al: “The UK Biobank is a rich national health resource that provides enormous opportunities for international researchers to examine, model, and analyze census-like multisource healthcare data. The archive presents several challenges related to aggregation and harmonization of complex data elements, feature heterogeneity and salience, and health analytics. Using 7,614 imaging, clinical, and phenotypic features of 9,914 subjects we performed deep computed phenotyping using unsupervised clustering and derived two distinct sub-cohorts. Using parametric and nonparametric tests, we determined the top 20 most salient features contributing to the cluster separation. Our approach generated decision rules to predict the presence and progression of depression or other mental illnesses by jointly representing and modeling the significant clinical and demographic variables along with the derived salient neuroimaging features. We reported consistency and reliability measures of the derived computed phenotypes and the top salient imaging biomarkers that contributed to the unsupervised clustering. This clinical decision support system identified and utilized holistically the most critical biomarkers for predicting mental health, e.g., depression. External validation of this technique on different populations may lead to reducing healthcare expenses and improving the processes of diagnosis, forecasting, and tracking of normal and pathological aging….(More)”.
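The two-step workflow in this abstract (cluster subjects without labels, then test which features separate the clusters) can be sketched roughly as follows. The abstract does not name the clustering algorithm or the specific tests, so k-means with k=2 and Welch's t statistic are stand-ins chosen here. The helper names and the tiny dataset are hypothetical.

```python
import math
import statistics

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(rows, iterations=20):
    """Plain k-means with k=2; returns a 0/1 cluster label per row.

    Deterministic start: the first row and the row farthest from it.
    """
    centers = [rows[0], max(rows, key=lambda r: sq_dist(r, rows[0]))]
    labels = []
    for _ in range(iterations):
        labels = [min((0, 1), key=lambda k: sq_dist(row, centers[k])) for row in rows]
        for k in (0, 1):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # keep the previous center if a cluster is empty
                centers[k] = [statistics.mean(col) for col in zip(*members)]
    return labels

def welch_t(a, b):
    """Welch's t statistic; assumes each group has >= 2 values and some variance."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def rank_features(rows, labels):
    """Return feature indices ordered from most to least cluster-separating."""
    stats = []
    for j in range(len(rows[0])):
        group0 = [row[j] for row, lab in zip(rows, labels) if lab == 0]
        group1 = [row[j] for row, lab in zip(rows, labels) if lab == 1]
        stats.append((abs(welch_t(group0, group1)), j))
    return [j for _, j in sorted(stats, reverse=True)]

# Hypothetical data: 8 subjects x 3 features, invented for illustration.
rows = [
    [1.0, 5.0, 0.2], [1.2, 5.5, 0.9], [0.8, 4.8, 0.5], [1.1, 5.2, 0.1],
    [9.0, 6.0, 0.3], [9.3, 6.4, 0.8], [8.7, 5.9, 0.6], [9.1, 6.2, 0.2],
]
labels = two_means(rows)
ranking = rank_features(rows, labels)
print(labels, ranking)
```

With thousands of features, as in the study, the same ranking step would be truncated to the top entries (the abstract reports the top 20) and would call for multiple-comparison control, which this sketch omits.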

Access to Algorithms


Paper by Hannah Bloch-Wehba: “Federal, state, and local governments increasingly depend on automated systems — often procured from the private sector — to make key decisions about civil rights and civil liberties. When individuals affected by these decisions seek access to information about the algorithmic methodologies that produced them, governments frequently assert that this information is proprietary and cannot be disclosed. 

Recognizing that opaque algorithmic governance poses a threat to civil rights and liberties, scholars have called for a renewed focus on transparency and accountability for automated decision making. But scholars have neglected a critical avenue for promoting public accountability and transparency for automated decision making: the law of access to government records and proceedings. This Article fills this gap in the literature, recognizing that the Freedom of Information Act, its state equivalents, and the First Amendment provide unappreciated legal support for algorithmic transparency.

The law of access performs three critical functions in promoting algorithmic accountability and transparency. First, by enabling any individual to challenge algorithmic opacity in government records and proceedings, the law of access can relieve some of the burden otherwise borne by parties who are often poor and under-resourced. Second, access law calls into question government’s procurement of algorithmic decision making technologies from private vendors, subject to contracts that include sweeping protections for trade secrets and intellectual property rights. Finally, the law of access can promote an urgently needed public debate on algorithmic governance in the public sector….(More)”.

Big Data Applications in Governance and Policy


Introduction to Special Issue of Politics and Governance by Sarah Giest and Reuben Ng: “Recent literature has been trying to grasp the extent to which big data applications affect the governance and policymaking of countries and regions (Boyd & Crawford, 2012; Giest, 2017; Höchtl, Parycek, & Schöllhammer, 2015; Poel, Meyer, & Schroeder, 2018). The discussion includes the comparison to e-government and evidence-based policymaking developments that existed long before the idea of big data entered the policy realm. The theoretical extent of this discussion however lacks some of the more practical consequences that come with the active use of data-driven applications. In fact, much of the work focuses on the input-side of policymaking, looking at which data and technology enters the policy process; however, very little is dedicated to the output side.

In short, how has big data shaped data governance and policymaking? The contributions to this thematic issue shed light on this question by looking at a range of factors, such as campaigning in the US election (Trish, 2018) or local government data projects (Durrant, Barnett, & Rempel, 2018). The goal is to unpack the mixture of big data applications and existing policy processes in order to understand whether these new tools and applications enhance or hinder policymaking….(More)”.