Big data for decision-making in public transport management: A comparison of different data sources


Paper by Valeria Maria Urbano, Marika Arena, and Giovanni Azzone: “The conventional data used to support public transport management have inherent constraints related to scalability, cost, and the limited ability to capture spatial and temporal variability. These limitations underscore the importance of exploring innovative data sources to complement more traditional ones.

For public transport operators, who are tasked with making pivotal decisions spanning planning, operation, and performance measurement, innovative data sources are a frontier that is still largely unexplored. To fill this gap, this study first establishes a framework for evaluating innovative data sources, highlighting the specific characteristics that data should have to support decision-making in the context of transportation management. Second, a comparative analysis is conducted, using empirical data collected from primary public transport operators in the Lombardy region, with the aim of understanding whether and to what extent different data sources meet the above requirements.

The findings of this study support transport operators in selecting data sources aligned with different decision-making domains, highlighting related benefits and challenges. This underscores the importance of integrating different data sources to exploit their complementarities…(More)”.

Overcoming challenges associated with broad sharing of human genomic data


Paper by Jonathan E. LoTempio Jr & Jonathan D. Moreno: “Since the Human Genome Project, the consensus position in genomics has been that data should be shared widely to achieve the greatest societal benefit. This position relies on imprecise definitions of the concept of ‘broad data sharing’. Accordingly, the implementation of data sharing varies among landmark genomic studies. In this Perspective, we identify definitions of ‘broad’ that have been used interchangeably, despite their distinct implications. We further offer a framework with clarified concepts for genomic data sharing and probe six examples in genomics that produced public data. Finally, we articulate three challenges. First, we explore the need to reinterpret the limits of general research use data. Second, we consider the governance of public data deposition from extant samples. Third, we ask whether, in light of changing concepts of ‘broad’, participants should be encouraged to share their status as participants publicly or not. Each of these challenges is followed by recommendations…(More)”.

Smart cities: the data to decisions process


Paper by Eve Tsybina et al: “Smart cities improve citizen services by converting data into data-driven decisions. This conversion is not coincidental and depends on the underlying movement of information through four layers: devices, data communication and handling, operations, and planning and economics. Here we examine how this flow of information enables smartness in five major infrastructure sectors: transportation, energy, health, governance and municipal utilities. We show how success or failure within and between layers results in disparities in city smartness across different regions and sectors. Regions such as Europe and Asia exhibit higher levels of smartness compared to Africa and the USA. Furthermore, within one region, such as the USA or the Middle East, smarter cities manage the flow of information more efficiently. Sectors such as transportation and municipal utilities, characterized by extensive data, strong analytics and efficient information flow, tend to be smarter than healthcare and energy. The flow of information, however, generates risks associated with data collection and artificial intelligence deployment at each layer. We underscore the importance of seamless data transformation in achieving cost-effective and sustainable urban improvements and identify both supportive and impeding factors in the journey towards smarter cities…(More)”.

Towards Best Practices for Open Datasets for LLM Training


Paper by Stefan Baack et al: “Many AI companies are training their large language models (LLMs) on data without the permission of the copyright owners. The permissibility of doing so varies by jurisdiction: in jurisdictions such as the EU and Japan, this is allowed under certain restrictions, while in the United States, the legal landscape is more ambiguous. Regardless of the legal status, concerns from creative producers have led to several high-profile copyright lawsuits, and the threat of litigation is commonly cited as a reason for the recent trend towards minimizing the information shared about training datasets by both corporate and public interest actors. This trend of limiting information about training data causes harm: it hinders transparency, accountability, and innovation in the broader ecosystem by denying researchers, auditors, and impacted individuals access to the information needed to understand AI models.
While this could be mitigated by training language models on open access and public domain data, at the time of writing, there are no such models (trained at a meaningful scale) due to the substantial technical and sociological challenges in assembling the necessary corpus. These challenges include incomplete and unreliable metadata, the cost and complexity of digitizing physical records, and the diverse set of legal and technical skills required to ensure relevance and responsibility in a quickly changing landscape. Building towards a future where AI systems can be trained on openly licensed data that is responsibly curated and governed requires collaboration across legal, technical, and policy domains, along with investments in metadata standards, digitization, and fostering a culture of openness…(More)”.
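
The metadata problem the authors highlight can be illustrated with a small sketch (a hypothetical example, not drawn from the paper): candidate documents are partitioned by licence metadata so that only permissively licensed or public-domain records enter an open training corpus, while records with missing or unreliable metadata are routed to manual review. The licence identifiers and record fields below are illustrative assumptions.

```python
# Hypothetical sketch of licence-aware corpus filtering; identifiers and
# fields are illustrative, not taken from the paper or any real dataset.
from dataclasses import dataclass
from typing import List, Optional

OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "public-domain"}  # assumed allow-list


@dataclass
class Record:
    doc_id: str
    text: str
    license: Optional[str]  # None models the "incomplete metadata" case


def partition_corpus(records: List[Record]):
    """Split records into openly licensed, excluded, and needs-review sets."""
    open_set, excluded, needs_review = [], [], []
    for rec in records:
        if rec.license is None:
            needs_review.append(rec)   # missing or unreliable metadata
        elif rec.license in OPEN_LICENSES:
            open_set.append(rec)       # eligible for an openly licensed corpus
        else:
            excluded.append(rec)       # restrictively licensed material
    return open_set, excluded, needs_review


if __name__ == "__main__":
    corpus = [
        Record("a", "…", "CC0-1.0"),
        Record("b", "…", "all-rights-reserved"),
        Record("c", "…", None),
    ]
    keep, drop, review = partition_corpus(corpus)
    print(len(keep), len(drop), len(review))  # 1 1 1
```

In practice such a filtering step would sit alongside the metadata repair and digitization efforts the authors call for, since an allow-list is only as reliable as the metadata it consumes.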

Beware the Intention Economy: Collection and Commodification of Intent via Large Language Models


Article by Yaqub Chaudhary and Jonnie Penn: “The rapid proliferation of large language models (LLMs) invites the possibility of a new marketplace for behavioral and psychological data that signals intent. This brief article introduces some initial features of that emerging marketplace. We survey recent efforts by tech executives to position the capture, manipulation, and commodification of human intentionality as a lucrative parallel to—and viable extension of—the now-dominant attention economy, which has bent consumer, civic, and media norms around users’ finite attention spans since the 1990s. We call this follow-on the intention economy. We characterize it in two ways. First, as a competition, initially, between established tech players armed with the infrastructural and data capacities needed to vie for first-mover advantage on a new frontier of persuasive technologies. Second, as a commodification of hitherto unreachable levels of explicit and implicit data that signal intent, namely those signals borne of combining (a) hyper-personalized manipulation via LLM-based sycophancy, ingratiation, and emotional infiltration and (b) increasingly detailed categorization of online activity elicited through natural language.

This new dimension of automated persuasion draws on the unique capabilities of LLMs and generative AI more broadly, which intervene not only on what users want, but also, to cite Williams, “what they want to want” (Williams, 2018, p. 122). We demonstrate through a close reading of recent technical and critical literature (including unpublished papers from arXiv) that such tools are already being explored to elicit, infer, collect, record, understand, forecast, and ultimately manipulate, modulate, and commodify human plans and purposes, both mundane (e.g., selecting a hotel) and profound (e.g., selecting a political candidate)…(More)”.

Governing artificial intelligence means governing data: (re)setting the agenda for data justice


Paper by Linnet Taylor, Siddharth Peter de Souza, Aaron Martin, and Joan López Solano: “The field of data justice has been evolving to take into account the role of data in powering the field of artificial intelligence (AI). In this paper we review the main conceptual bases for governing data and AI: the market-based approach, the personal–non-personal data distinction and strategic sovereignty. We then analyse how these are being operationalised into practical models for governance, including public data trusts, data cooperatives, personal data sovereignty, data collaboratives, data commons approaches and indigenous data sovereignty. We interrogate these models’ potential for just governance based on four benchmarks which we propose as a reformulation of the Data Justice governance agenda identified by Taylor in her 2017 framework. Re-situating data justice at the intersection of data and AI, these benchmarks focus on preserving and strengthening public infrastructures and public goods; inclusiveness; contestability and accountability; and global responsibility. We demonstrate how they can be used to test whether a governance approach will succeed in redistributing power, engaging with public concerns and creating a plural politics of AI…(More)”.

Boosting: Empowering Citizens with Behavioral Science


Paper by Stefan M. Herzog and Ralph Hertwig: “…Behavioral public policy came to the fore with the introduction of nudging, which aims to steer behavior while maintaining freedom of choice. Responding to critiques of nudging (e.g., that it does not promote agency and relies on benevolent choice architects), other behavioral policy approaches focus on empowering citizens. Here we review boosting, a behavioral policy approach that aims to foster people’s agency, self-control, and ability to make informed decisions. It is grounded in evidence from behavioral science showing that human decision making is not as notoriously flawed as the nudging approach assumes. We argue that addressing the challenges of our time—such as climate change, pandemics, and the threats to liberal democracies and human autonomy posed by digital technologies and choice architectures—calls for fostering capable and engaged citizens as a first line of response to complement slower, systemic approaches…(More)”.

Data sharing restrictions are hampering precision health in the European Union


Paper by Cristina Legido-Quigley et al: “Contemporary healthcare is undergoing a transition, shifting from a population-based approach to personalized medicine on an individual level. In October 2023, the European Partnership for Personalized Medicine was officially launched to communicate the benefits of this approach to citizens and healthcare systems in member countries. The main debate revolves around inconsistent regulation of personal data access and its potential commercialization. Moreover, the lack of consensus among European Union (EU) countries is creating problems with the data sharing needed to advance personalized medicine. Here we discuss the integration of biological data with personal information on a European scale for the advancement of personalized medicine, raising legal considerations of data protection under the EU General Data Protection Regulation (GDPR)…(More)”.

Governance of Indigenous data in open earth systems science


Paper by Lydia Jennings et al: “In the age of big data and open science, what processes are needed to follow open science protocols while upholding Indigenous Peoples’ rights? The Earth Data Relations Working Group (EDRWG) convened to address this question and to envision a research landscape that acknowledges the legacy of extractive practices and embraces new norms across Earth science institutions and open science research. Using the National Ecological Observatory Network (NEON) as an example, the EDRWG recommends actions, applicable across all phases of the data lifecycle, that recognize the sovereign rights of Indigenous Peoples and support better research across all Earth Sciences…(More)”.

Facing & mitigating common challenges when working with real-world data: The Data Learning Paradigm


Paper by Jake Lever et al: “The rapid growth of data-driven applications is ubiquitous across virtually all scientific domains and has led to an increasing demand for effective methods to handle data deficiencies and mitigate the effects of imperfect data. This paper presents a guide for researchers encountering real-world data-driven applications and the challenges associated with them. It proposes the concept of the Data Learning Paradigm, which combines the principles of machine learning, data science and data assimilation to tackle real-world challenges in data-driven applications. Models are a product of the data upon which they are trained, and no data collected from real-world scenarios is perfect, owing to the natural limitations of sensing and collection. Thus, computational modelling of real-world systems is intrinsically limited by the various deficiencies encountered in real data. The Data Learning Paradigm aims to leverage the strengths of data improvement to enhance the accuracy, reliability, and interpretability of data-driven models. We outline a range of methods currently being implemented in the field of Data Learning, involving machine learning and data science techniques, and discuss how these mitigate the various problems associated with data-driven models, illustrating improved results in a multitude of real-world applications. We highlight examples where these methods have led to significant advancements in fields such as environmental monitoring, planetary exploration, healthcare analytics, linguistic analysis, social networks, and smart manufacturing. We offer a guide to how these methods may be implemented to deal with general types of limitations in data, alongside their current and potential applications…(More)”.
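
To make the idea of “data improvement” concrete, here is a minimal sketch (an illustrative assumption on synthetic data, not a method taken from the paper) in which missing observations in a noisy series are either dropped or imputed by interpolation before fitting a simple model. Imputation keeps the series complete and regularly sampled, which downstream techniques the paper draws on, such as data assimilation, typically require.

```python
# Minimal illustrative sketch (synthetic data, assumed example): handling a
# common data deficiency, missing observations, before fitting a simple model.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "sensor" series: a linear signal with noise, where some readings
# were lost during collection (represented as NaN).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
y[rng.choice(y.size, size=10, replace=False)] = np.nan

mask = ~np.isnan(y)

# Option 1: drop incomplete observations and fit on what remains.
slope_drop, intercept_drop = np.polyfit(x[mask], y[mask], deg=1)

# Option 2: a simple "data improvement" step, imputing missing values by
# interpolating from neighbouring observations, then fitting on the full series.
y_imputed = y.copy()
y_imputed[~mask] = np.interp(x[~mask], x[mask], y[mask])
slope_imp, intercept_imp = np.polyfit(x, y_imputed, deg=1)

print(f"fit after dropping gaps: y ~ {slope_drop:.2f}x + {intercept_drop:.2f}")
print(f"fit after imputing gaps: y ~ {slope_imp:.2f}x + {intercept_imp:.2f}")
```

Real applications would replace the interpolation with the richer machine learning and data assimilation methods the paper surveys, but the overall workflow (diagnose the deficiency, repair the data, then model) is the same.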