Big data for decision-making in public transport management: A comparison of different data sources


Paper by Valeria Maria Urbano, Marika Arena, and Giovanni Azzone: “The conventional data used to support public transport management have inherent constraints related to scalability, cost, and the limited ability to capture spatial and temporal variability. These limitations underscore the importance of exploring innovative data sources to complement more traditional ones.

For public transport operators, who are tasked with making pivotal decisions spanning planning, operation, and performance measurement, innovative data sources are a frontier that is still largely unexplored. To fill this gap, this study first establishes a framework for evaluating innovative data sources, highlighting the specific characteristics that data should have to support decision-making in the context of transportation management. Second, a comparative analysis is conducted, using empirical data collected from primary public transport operators in the Lombardy region, with the aim of understanding whether and to what extent different data sources meet the above requirements.

The findings of this study support transport operators in selecting data sources aligned with different decision-making domains, highlighting related benefits and challenges. This underscores the importance of integrating different data sources to exploit their complementarities…(More)”.

Developing a Framework for Collective Data Rights


Report by Jeni Tennison: “Are collective data rights really necessary? Or, do people and communities already have sufficient rights to address harms through equality, public administration or consumer law? Might collective data rights even be harmful by undermining individual data rights or creating unjust collectivities? If we did have collective data rights, what should they look like? And how could they be introduced into legislation?

Data protection law and policy are founded on the notion of individual notice and consent, originating from the handling of personal data gathered for medical and scientific research. However, recent work on data governance has highlighted shortcomings with the notice-and-consent approach, especially in an age of big data and artificial intelligence. This special report considers the need for collective data rights by examining legal remedies currently available in the United Kingdom in three scenarios where the people affected by algorithmic decision making are not data subjects and therefore do not have individual data protection rights…(More)”.

The Case for Local and Regional Public Engagement in Governing Artificial Intelligence


Article by Stefaan Verhulst and Claudia Chwalisz: “As the Paris AI Action Summit approaches, the world’s attention will once again turn to the urgent questions surrounding how we govern artificial intelligence responsibly. Discussions will inevitably include calls for global coordination and participation, exemplified by several proposals for a Global Citizens’ Assembly on AI. While such initiatives aim to foster inclusivity, the reality is that meaningful deliberation and actionable outcomes often emerge most effectively at the local and regional levels.

Building on earlier reflections in “AI Globalism and AI Localism,” we argue that to govern AI for public benefit, we must prioritize building public engagement capacity closer to the communities where AI systems are deployed. Localized engagement not only ensures relevance to specific cultural, social, and economic contexts but also equips communities with the agency to shape both policy and product development in ways that reflect their needs and values.

While a Global Citizens’ Assembly sounds like a great idea on the surface, there is no public authority with teeth or enforcement mechanisms at that level of governance. The Paris Summit represents an opportunity to rethink existing AI governance frameworks, reorienting them toward an approach that is grounded in lived, local realities and mutually respectful processes of co-creation. Toward that end, we elaborate below on proposals for: local and regional AI assemblies; AI citizens’ assemblies for EU policy; capacity-building programs; and localized data governance models…(More)”.

Reimagining data for Open Source AI: A call to action


Report by Open Source Initiative: “Artificial intelligence (AI) is changing the world at a remarkable pace, with Open Source AI playing a pivotal role in shaping its trajectory. Yet, as AI advances, a fundamental challenge emerges: How do we create a data ecosystem that is not only robust but also equitable and sustainable?

The Open Source Initiative (OSI) and Open Future have taken a significant step toward addressing this challenge by releasing a white paper: “Data Governance in Open Source AI: Enabling Responsible and Systematic Access.” This document is the culmination of a global co-design process, enriched by insights from a vibrant two-day workshop held in Paris in October 2024….

The white paper offers a blueprint for a data ecosystem rooted in fairness, inclusivity and sustainability. It calls for two transformative shifts:

  1. From Open Data to Data Commons: Moving beyond the notion of unrestricted data to a model that balances openness with the rights and needs of all stakeholders.
  2. Broadening the stakeholder universe: Creating collaborative frameworks that unite communities, stewards and creators in equitable data-sharing practices.

To bring these shifts to life, the white paper delves into six critical focus areas:

  • Data preparation
  • Preference signaling and licensing
  • Data stewards and custodians
  • Environmental sustainability
  • Reciprocity and compensation
  • Policy interventions…(More)”.

To Bot or Not to Bot? How AI Companions Are Reshaping Human Services and Connection


Essay by Julia Freeland Fisher: “Last year, a Harvard study on chatbots drew a startling conclusion: AI companions significantly reduce loneliness. The researchers found that “synthetic conversation partners,” or bots engineered to be caring and friendly, curbed loneliness on par with interacting with a fellow human. The study was silent, however, on the irony behind these findings: synthetic interaction is not a real, lasting connection. Should the price of curing loneliness really be more isolation?

Missing that subtext is emblematic of our times. Near-term upsides often overshadow long-term consequences. Even with important lessons learned about the harms of social media and big tech over the past two decades, optimism about AI’s potential is soaring today, at least in some circles.

Bots present an especially tempting fix to long-standing capacity constraints across education, health care, and other social services. AI coaches, tutors, navigators, caseworkers, and assistants could overcome the very real challenges—like cost, recruitment, training, and retention—that have made access to vital forms of high-quality human support perennially hard to scale.

But scaling bots that simulate human support presents new risks. What happens if, across a wide range of “human” services, we trade access to more services for fewer human connections?…(More)”.

Overcoming challenges associated with broad sharing of human genomic data


Paper by Jonathan E. LoTempio Jr & Jonathan D. Moreno: “Since the Human Genome Project, the consensus position in genomics has been that data should be shared widely to achieve the greatest societal benefit. This position relies on imprecise definitions of the concept of ‘broad data sharing’. Accordingly, the implementation of data sharing varies among landmark genomic studies. In this Perspective, we identify definitions of broad that have been used interchangeably, despite their distinct implications. We further offer a framework with clarified concepts for genomic data sharing and probe six examples in genomics that produced public data. Finally, we articulate three challenges. First, we explore the need to reinterpret the limits of general research use data. Second, we consider the governance of public data deposition from extant samples. Third, we ask whether, in light of changing concepts of broad, participants should be encouraged to share their status as participants publicly or not. Each of these challenges is followed with recommendations…(More)”.

Smart cities: the data to decisions process


Paper by Eve Tsybina et al: “Smart cities improve citizen services by converting data into data-driven decisions. This conversion is not coincidental and depends on the underlying movement of information through four layers: devices, data communication and handling, operations, and planning and economics. Here we examine how this flow of information enables smartness in five major infrastructure sectors: transportation, energy, health, governance and municipal utilities. We show how success or failure within and between layers results in disparities in city smartness across different regions and sectors. Regions such as Europe and Asia exhibit higher levels of smartness compared to Africa and the USA. Furthermore, within one region, such as the USA or the Middle East, smarter cities manage the flow of information more efficiently. Sectors such as transportation and municipal utilities, characterized by extensive data, strong analytics and efficient information flow, tend to be smarter than healthcare and energy. The flow of information, however, generates risks associated with data collection and artificial intelligence deployment at each layer. We underscore the importance of seamless data transformation in achieving cost-effective and sustainable urban improvements and identify both supportive and impeding factors in the journey towards smarter cities…(More)”.
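
The paper’s central idea – that smartness hinges on how information moves across the four layers – can be pictured as a simple pipeline. Below is a minimal, hypothetical Python sketch (not taken from the paper; the class, function names, and thresholds are illustrative assumptions) showing data flowing from devices through communication and handling, into operations, and on to planning:

```python
# Hypothetical sketch of the four-layer data-to-decisions flow described above.
# All names and thresholds are illustrative assumptions, not the paper's model.

from dataclasses import dataclass
from statistics import mean


@dataclass
class SensorReading:  # Layer 1: devices
    stop_id: str
    passengers_waiting: int


def communicate_and_clean(readings):  # Layer 2: data communication and handling
    """Drop invalid records before they reach the operations layer."""
    return [r for r in readings if r.passengers_waiting >= 0]


def operational_decisions(readings, crowding_threshold=30):  # Layer 3: operations
    """Flag stops that need extra service right now."""
    return [r.stop_id for r in readings if r.passengers_waiting > crowding_threshold]


def planning_metric(readings):  # Layer 4: planning and economics
    """Aggregate demand as an input to longer-term service planning."""
    return mean(r.passengers_waiting for r in readings) if readings else 0.0


if __name__ == "__main__":
    raw = [SensorReading("A", 42), SensorReading("B", 7), SensorReading("C", -1)]
    clean = communicate_and_clean(raw)
    print("Dispatch extra service to:", operational_decisions(clean))
    print("Average waiting passengers (planning input):", planning_metric(clean))
```

A break at any layer – a failed sensor, a lost transmission, an analysis no one acts on – interrupts the flow that the authors identify as the source of disparities in city smartness.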

Towards Best Practices for Open Datasets for LLM Training


Paper by Stefan Baack et al: “Many AI companies are training their large language models (LLMs) on data without the permission of the copyright owners. The permissibility of doing so varies by jurisdiction: in countries like the EU and Japan, this is allowed under certain restrictions, while in the United States, the legal landscape is more ambiguous. Regardless of the legal status, concerns from creative producers have led to several high-profile copyright lawsuits, and the threat of litigation is commonly cited as a reason for the recent trend towards minimizing the information shared about training datasets by both corporate and public interest actors. This trend towards limiting information about training data hinders transparency, accountability, and innovation in the broader ecosystem, denying researchers, auditors, and impacted individuals access to the information needed to understand AI models.
While this could be mitigated by training language models on open access and public domain data, at the time of writing, there are no such models (trained at a meaningful scale) due to the substantial technical and sociological challenges in assembling the necessary corpus. These challenges include incomplete and unreliable metadata, the cost and complexity of digitizing physical records, and the diverse set of legal and technical skills required to ensure relevance and responsibility in a quickly changing landscape. Building towards a future where AI systems can be trained on openly licensed data that is responsibly curated and governed requires collaboration across legal, technical, and policy domains, along with investments in metadata standards, digitization, and fostering a culture of openness…(More)”.

Beware the Intention Economy: Collection and Commodification of Intent via Large Language Models


Article by Yaqub Chaudhary and Jonnie Penn: “The rapid proliferation of large language models (LLMs) invites the possibility of a new marketplace for behavioral and psychological data that signals intent. This brief article introduces some initial features of that emerging marketplace. We survey recent efforts by tech executives to position the capture, manipulation, and commodification of human intentionality as a lucrative parallel to—and viable extension of—the now-dominant attention economy, which has bent consumer, civic, and media norms around users’ finite attention spans since the 1990s. We call this follow-on the intention economy. We characterize it in two ways. First, as a competition, initially, between established tech players armed with the infrastructural and data capacities needed to vie for first-mover advantage on a new frontier of persuasive technologies. Second, as a commodification of hitherto unreachable levels of explicit and implicit data that signal intent, namely those signals borne of combining (a) hyper-personalized manipulation via LLM-based sycophancy, ingratiation, and emotional infiltration and (b) increasingly detailed categorization of online activity elicited through natural language.

This new dimension of automated persuasion draws on the unique capabilities of LLMs and generative AI more broadly, which intervene not only on what users want, but also, to cite Williams, “what they want to want” (Williams, 2018, p. 122). We demonstrate through a close reading of recent technical and critical literature (including unpublished papers from ArXiv) that such tools are already being explored to elicit, infer, collect, record, understand, forecast, and ultimately manipulate, modulate, and commodify human plans and purposes, both mundane (e.g., selecting a hotel) and profound (e.g., selecting a political candidate)…(More)”.

Good government data requires good statistics officials – but how motivated and competent are they?


World Bank Blog: “Government data is only as reliable as the statistics officials who produce it. Yet, surprisingly little is known about these officials themselves. For decades, they have diligently collected data on others – such as households and firms – to generate official statistics, from poverty rates to inflation figures. Yet, data about statistics officials themselves is missing. How competent are they at analyzing statistical data? How motivated are they to excel in their roles? Do they uphold integrity when producing official statistics, even in the face of opposing career incentives or political pressures? And what can National Statistical Offices (NSOs) do to cultivate a workforce that is competent, motivated, and ethical?

We surveyed 13,300 statistics officials in 14 countries in Latin America and the Caribbean to find out. Five results stand out. For further insights, consult our Inter-American Development Bank (IDB) report, Making National Statistical Offices Work Better.

1. The competence and management of statistics officials shape the quality of statistical data

Our survey included a short exam assessing basic statistical competencies, such as descriptive statistics and probability. Statistical competence correlates with data quality: NSOs with higher exam scores among employees tend to achieve better results in the World Bank’s Statistical Performance Indicators (r = 0.36).

NSOs with better management practices also have better statistical performance. For instance, NSOs with more robust recruitment and selection processes have better statistical performance (r = 0.62)…(More)”.
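
For readers curious how such figures are computed, here is a minimal sketch assuming hypothetical per-NSO data (the numbers below are invented for illustration and will not reproduce the reported values; only the method, Pearson’s correlation coefficient, mirrors the statistic quoted in the post):

```python
# Hypothetical illustration of the correlations reported above (e.g. r = 0.36
# between staff exam scores and Statistical Performance Indicator scores).
# The data points are made up; only the method (Pearson's r) reflects the post.

import numpy as np

# Invented per-NSO values: mean staff exam score and SPI score for 14 offices.
exam_scores = np.array([58, 62, 70, 45, 66, 74, 51, 60, 68, 55, 63, 72, 49, 65])
spi_scores = np.array([61, 59, 75, 50, 70, 78, 48, 64, 71, 57, 60, 76, 52, 69])

r = np.corrcoef(exam_scores, spi_scores)[0, 1]
print(f"Pearson correlation between exam scores and SPI: r = {r:.2f}")
```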