Market Power in Artificial Intelligence


Paper by Joshua S. Gans: “This paper surveys the relevant existing literature that can help researchers and policy makers understand the drivers of competition in markets that constitute the provision of artificial intelligence products. The focus is on three broad markets: training data, input data, and AI predictions. It is shown that a key factor in determining the emergence and persistence of market power will be the operation of markets for data that would allow for trading data across firm boundaries…(More)”.

Citizen silence: Missed opportunities in citizen science


Paper by Damon M Hall et al: “Citizen science is personal. Participation is contingent on the citizens’ connection to a topic or to interpersonal relationships meaningful to them. But from the peer-reviewed literature, scientists appear to have an acquisitive data-centered relationship with citizens. This has spurred ethical and pragmatic criticisms of extractive relationships with citizen scientists. We suggest five practical steps to shift citizen-science research from extractive to relational, reorienting the research process and providing reciprocal benefits to researchers and citizen scientists. By virtue of their interests and experience within their local environments, citizen scientists have expertise that, if engaged, can improve research methods and product design decisions. To boost the value of scientific outputs to society and participants, citizen-science research teams should rethink how they engage and value volunteers…(More)”.

Using online search activity for earlier detection of gynaecological malignancy


Paper by Jennifer F. Barcroft et al: Ovarian cancer is the most lethal and endometrial cancer the most common gynaecological cancer in the UK, yet neither have a screening program in place to facilitate early disease detection. The aim is to evaluate whether online search data can be used to differentiate between individuals with malignant and benign gynaecological diagnoses.

This is a prospective cohort study evaluating online search data in symptomatic individuals (Google user) referred from primary care (GP) with a suspected cancer to a London Hospital (UK) between December 2020 and June 2022. Informed written consent was obtained and online search data was extracted via Google takeout and anonymised. A health filter was applied to extract health-related terms for 24 months prior to GP referral. A predictive model (outcome: malignancy) was developed using (1) search queries (terms model) and (2) categorised search queries (categories model). Area under the ROC curve (AUC) was used to evaluate model performance. 844 women were approached, 652 were eligible to participate and 392 were recruited. Of those recruited, 108 did not complete enrollment, 12 withdrew and 37 were excluded as they did not track Google searches or had an empty search history, leaving a cohort of 235.s

The cohort had a median age of 53 years old (range 20–81) and a malignancy rate of 26.0%. There was a difference in online search data between those with a benign and malignant diagnosis, noted as early as 360 days in advance of GP referral, when search queries were used directly, but only 60 days in advance, when queries were divided into health categories. A model using online search data from patients (n = 153) who performed health-related search and corrected for sample size, achieved its highest sample-corrected AUC of 0.82, 60 days prior to GP referral.

Online search data appears to be different between individuals with malignant and benign gynaecological conditions, with a signal observed in advance of GP referral date. Online search data needs to be evaluated in a larger dataset to determine its value as an early disease detection tool and whether its use leads to improved clinical outcomes…(More)”.

Influence of public innovation laboratories on the development of public sector ambidexterity


Article by Christophe Favoreu et al: “Ambidexterity has become a major issue for public organizations as they manage increasingly strong contradictory pressures to optimize existing processes while innovating. Moreover, although public innovation laboratories are emerging, their influence on the development of ambidexterity remains largely unexplored. Our research aims to understand how innovation laboratories contribute to the formation of individual ambidexterity within the public sector. Drawing from three case studies, this research underscores the influence of these labs on public ambidexterity through the development of innovations by non-specialized actors and the deployment and reuse of innovative managerial practices and techniques outside the i-labs…(More)”.

A typology of artificial intelligence data work


Article by James Muldoon et al: “This article provides a new typology for understanding human labour integrated into the production of artificial intelligence systems through data preparation and model evaluation. We call these forms of labour ‘AI data work’ and show how they are an important and necessary element of the artificial intelligence production process. We draw on fieldwork with an artificial intelligence data business process outsourcing centre specialising in computer vision data, alongside a decade of fieldwork with microwork platforms, business process outsourcing, and artificial intelligence companies to help dispel confusion around the multiple concepts and frames that encompass artificial intelligence data work including ‘ghost work’, ‘microwork’, ‘crowdwork’ and ‘cloudwork’. We argue that these different frames of reference obscure important differences between how this labour is organised in different contexts. The article provides a conceptual division between the different types of artificial intelligence data work institutions and the different stages of what we call the artificial intelligence data pipeline. This article thus contributes to our understanding of how the practices of workers become a valuable commodity integrated into global artificial intelligence production networks…(More)”.

AI-enhanced Collective Intelligence: The State of the Art and Prospects


Paper by Hao Cui and Taha Yasseri: “The current societal challenges exceed the capacity of human individual or collective effort alone. As AI evolves, its role within human collectives is poised to vary from an assistive tool to a participatory member. Humans and AI possess complementary capabilities that, when synergized, can achieve a level of collective intelligence that surpasses the collective capabilities of either humans or AI in isolation. However, the interactions in human-AI systems are inherently complex, involving intricate processes and interdependencies. This review incorporates perspectives from network science to conceptualize a multilayer representation of human-AI collective intelligence, comprising a cognition layer, a physical layer, and an information layer. Within this multilayer network, humans and AI agents exhibit varying characteristics; humans differ in diversity from surface-level to deep-level attributes, while AI agents range in degrees of functionality and anthropomorphism. The interplay among these agents shapes the overall structure and dynamics of the system. We explore how agents’ diversity and interactions influence the system’s collective intelligence. Furthermore, we present an analysis of real-world instances of AI-enhanced collective intelligence. We conclude by addressing the potential challenges in AI-enhanced collective intelligence and offer perspectives on future developments in this field…(More)”.

Algorithmic attention rents: A theory of digital platform market power


Paper by Tim O’Reilly, Ilan Strauss and Mariana Mazzucato: “We outline a theory of algorithmic attention rents in digital aggregator platforms. We explore the way that as platforms grow, they become increasingly capable of extracting rents from a variety of actors in their ecosystems—users, suppliers, and advertisers—through their algorithmic control over user attention. We focus our analysis on advertising business models, in which attention harvested from users is monetized by reselling the attention to suppliers or other advertisers, though we believe the theory has relevance to other online business models as well. We argue that regulations should mandate the disclosure of the operating metrics that platforms use to allocate user attention and shape the “free” side of their marketplace, as well as details on how that attention is monetized…(More)”.

Making Sense of Citizens’ Input through Artificial Intelligence: A Review of Methods for Computational Text Analysis to Support the Evaluation of Contributions in Public Participation


Paper by Julia Romberg and Tobias Escher: “Public sector institutions that consult citizens to inform decision-making face the challenge of evaluating the contributions made by citizens. This evaluation has important democratic implications but at the same time, consumes substantial human resources. However, until now the use of artificial intelligence such as computer-supported text analysis has remained an under-studied solution to this problem. We identify three generic tasks in the evaluation process that could benefit from natural language processing (NLP). Based on a systematic literature search in two databases on computational linguistics and digital government, we provide a detailed review of existing methods and their performance. While some promising approaches exist, for instance to group data thematically and to detect arguments and opinions, we show that there remain important challenges before these could offer any reliable support in practice. These include the quality of results, the applicability to non-English language corpuses and making algorithmic models available to practitioners through software. We discuss a number of avenues that future research should pursue that can ultimately lead to solutions for practice. The most promising of these bring in the expertise of human evaluators, for example through active learning approaches or interactive topic modeling…(More)”.

Synthetic Data and the Future of AI


Paper by Peter Lee: “The future of artificial intelligence (AI) is synthetic. Several of the most prominent technical and legal challenges of AI derive from the need to amass huge amounts of real-world data to train machine learning (ML) models. Collecting such real-world data can be highly difficult and can threaten privacy, introduce bias in automated decision making, and infringe copyrights on a massive scale. This Article explores the emergence of a seemingly paradoxical technical creation that can mitigate—though not completely eliminate—these concerns: synthetic data. Increasingly, data scientists are using simulated driving environments, fabricated medical records, fake images, and other forms of synthetic data to train ML models. Artificial data, in other words, is being used to train artificial intelligence. Synthetic data offers a host of technical and legal benefits; it promises to radically decrease the cost of obtaining data, sidestep privacy issues, reduce automated discrimination, and avoid copyright infringement. Alongside such promise, however, synthetic data offers perils as well. Deficiencies in the development and deployment of synthetic data can exacerbate the dangers of AI and cause significant social harm.

In light of the enormous value and importance of synthetic data, this Article sketches the contours of an innovation ecosystem to promote its robust and responsible development. It identifies three objectives that should guide legal and policy measures shaping the creation of synthetic data: provisioning, disclosure, and democratization. Ideally, such an ecosystem should incentivize the generation of high-quality synthetic data, encourage disclosure of both synthetic data and processes for generating it, and promote multiple sources of innovation. This Article then examines a suite of “innovation mechanisms” that can advance these objectives, ranging from open source production to proprietary approaches based on patents, trade secrets, and copyrights. Throughout, it suggests policy and doctrinal reforms to enhance innovation, transparency, and democratic access to synthetic data. Just as AI will have enormous legal implications, law and policy can play a central role in shaping the future of AI…(More)”.

Prompting Diverse Ideas: Increasing AI Idea Variance


Paper by Lennart Meincke, Ethan Mollick, and Christian Terwiesch: “Unlike routine tasks where consistency is prized, in creativity and innovation the goal is to create a diverse set of ideas. This paper delves into the burgeoning interest in employing Artificial Intelligence (AI) to enhance the productivity and quality of the idea generation process. While previous studies have found that the average quality of AI ideas is quite high, prior research also has pointed to the inability of AI-based brainstorming to create sufficient dispersion of ideas, which limits novelty and the quality of the overall best idea. Our research investigates methods to increase the dispersion in AI-generated ideas. Using GPT-4, we explore the effect of different prompting methods on Cosine Similarity, the number of unique ideas, and the speed with which the idea space gets exhausted. We do this in the domain of developing a new product development for college students, priced under $50. In this context, we find that (1) pools of ideas generated by GPT-4 with various plausible prompts are less diverse than ideas generated by groups of human subjects (2) the diversity of AI generated ideas can be substantially improved using prompt engineering (3) Chain-of-Thought (CoT) prompting leads to the highest diversity of ideas of all prompts we evaluated and was able to come close to what is achieved by groups of human subjects. It also was capable of generating the highest number of unique ideas of any prompt we studied…(More)”