Data Governance Meets the EU AI Act


Article by Axel Schwanke: “…The EU AI Act emphasizes sustainable AI through robust data governance, promoting principles like data minimization, purpose limitation, and data quality to ensure responsible data collection and processing. It mandates measures such as data protection impact assessments and retention policies. Article 10 underscores the importance of effective data management in fostering ethical and sustainable AI development…This article states that high-risk AI systems must be developed using high-quality data sets for training, validation, and testing. These data sets should be managed properly, considering factors like data collection processes, data preparation, potential biases, and data gaps. The data sets should be relevant, representative, and as error-free and complete as possible. They should also reflect the specific context in which the AI system will be used. In some cases, providers may process special categories of personal data to detect and correct biases, but they must follow strict conditions to protect individuals’ rights and freedoms…

However, achieving compliance presents several significant challenges:

  • Ensuring Dataset Quality and Relevance: Organizations must establish robust data and AI platforms to prepare and manage datasets that are error-free, representative, and contextually relevant for their intended use cases. This requires rigorous data preparation and validation processes (a minimal sketch of such checks follows this list).
  • Bias and Contextual Sensitivity: Continuous monitoring for biases in data is critical. Organizations must implement corrective actions to address gaps while ensuring compliance with privacy regulations, especially when processing personal data to detect and reduce bias.
  • End-to-End Traceability: A comprehensive data governance framework is essential to track and document data flow from its origin to its final use in AI models. This ensures transparency, accountability, and compliance with regulatory requirements.
  • Evolving Data Requirements: Dynamic applications and changing schemas, particularly in industries like real estate, necessitate ongoing updates to data preparation processes to maintain relevance and accuracy.
  • Secure Data Processing: Compliance demands strict adherence to secure processing practices for personal data, ensuring privacy and security while enabling bias detection and mitigation.
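
To make the first two challenges above more concrete, here is a minimal, purely illustrative Python sketch of automated dataset quality and representation checks. The column names, reference shares, and thresholds are hypothetical placeholders, not requirements taken from the Act or from any specific platform.

```python
# Minimal sketch (hypothetical schema and thresholds): automated checks for
# completeness, duplicate records, and subgroup representation in a training set.
import pandas as pd

def quality_report(df: pd.DataFrame, group_col: str, reference_shares: dict,
                   max_missing: float = 0.01, max_share_gap: float = 0.05) -> dict:
    """Return simple quality and representation indicators for a tabular dataset."""
    report = {
        # Share of missing values per column (a rough completeness proxy).
        "missing_share": df.isna().mean().to_dict(),
        # Exact duplicates give some records outsized influence on the model.
        "duplicate_rows": int(df.duplicated().sum()),
    }
    # Compare observed subgroup shares with external reference shares
    # (e.g. census or market data) to flag under-represented groups.
    observed = df[group_col].value_counts(normalize=True)
    report["representation_gap"] = {
        group: round(float(observed.get(group, 0.0) - ref), 3)
        for group, ref in reference_shares.items()
    }
    report["flags"] = {
        "too_many_missing": [c for c, v in report["missing_share"].items()
                             if v > max_missing],
        "under_represented": [g for g, gap in report["representation_gap"].items()
                              if gap < -max_share_gap],
    }
    return report

# Usage (hypothetical real-estate listings table):
# listings = pd.read_parquet("listings.parquet")
# print(quality_report(listings, "region",
#                      {"north": 0.25, "south": 0.25, "east": 0.25, "west": 0.25}))
```

Checks like these only flag issues; the corrective actions, documentation, and lineage tracking needed for end-to-end traceability still have to sit in the surrounding governance process.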

Example: Real Estate Data
Immowelt’s real estate price map, rated the top performer in a 2022 comparison of real estate price maps, exemplifies the challenges of achieving high-quality datasets. The prepared data powers numerous services and applications, including data analysis, price predictions, personalization, recommendations, and market research…(More)”

Building Safer and Interoperable AI Systems


Essay by Vint Cerf: “While I am no expert on artificial intelligence (AI), I have some experience with the concept of agents. Thirty-five years ago, my colleague, Robert Kahn, and I explored the idea of knowledge robots (“knowbots” for short) in the context of digital libraries. In principle, a knowbot was a mobile piece of code that could move around the Internet, landing at servers where it could execute tasks on behalf of users. The concept was mostly about finding information and processing it on behalf of a user. We imagined that the knowbot code would land at a serving “knowbot hotel” where it would be given access to content and computing capability. The knowbots would be able to clone themselves to execute their objectives in parallel and would return to their origins bearing the results of their work. Modest prototypes were built in the pre-Web era.

In today’s world, artificially intelligent agents are now contemplated that can interact with each other and with information sources found on the Internet. For this to work, it’s my conjecture that a syntax and semantics will need to be developed and perhaps standardized to facilitate inter-agent interaction, agreements, and commitments for work to be performed, as well as a means for conveying results in reliable and unambiguous ways. A primary question for all such concepts starts with “What could possibly go wrong?”

In the context of AI applications and agents, work is underway to answer that question. I recently found one answer to that in the MLCommons AI Safety Working Group and its tool, AILuminate. My coarse sense of this is that AILuminate poses a large and widely varying collection of prompts—not unlike the notion of testing software by fuzzing—looking for inappropriate responses. Large language models (LLMs) can be tested and graded (that’s the hard part) on responses to a wide range of prompts. Some kind of overall safety metric might be established to compare one LLM to another. One might imagine query collections oriented toward exposing particular contextual weaknesses in LLMs. If these ideas prove useful, one could even imagine using them in testing services such as those at Underwriters Laboratories, now called UL Solutions. UL Solutions already offers software testing among its many other services.
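
To illustrate the fuzzing analogy only (not AILuminate’s actual methodology, which treats grading as the hard problem the essay says it is), a toy Python sketch of such a prompt-battery loop might look like the following; `query_model` and the keyword-based grader are hypothetical stand-ins.

```python
# Toy sketch of prompt-battery safety testing: run a model over many prompts,
# grade each response, and aggregate into a single score. The grader here is a
# deliberately crude keyword check; real grading is the hard part.
from typing import Callable

UNSAFE_MARKERS = ["here is how to build", "step-by-step instructions to harm"]

def looks_safe(response: str) -> bool:
    """Crude grader: flag responses containing known unsafe phrasings."""
    lowered = response.lower()
    return not any(marker in lowered for marker in UNSAFE_MARKERS)

def safety_score(prompts: list[str], query_model: Callable[[str], str]) -> float:
    """Fraction of prompts whose responses are graded as safe."""
    if not prompts:
        return 1.0
    safe = sum(looks_safe(query_model(p)) for p in prompts)
    return safe / len(prompts)

# Usage (hypothetical): compare two models on the same hazard-oriented prompts.
# score_a = safety_score(hazard_prompts, model_a.generate)
# score_b = safety_score(hazard_prompts, model_b.generate)
```

A score like this is only meaningful relative to a shared prompt collection, which is why a common benchmark is needed to compare one LLM with another.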

LLMs as agents seem naturally attractive…(More)”.

Grant Guardian


About: “In the philanthropic sector, limited time and resources can make it challenging to thoroughly assess a nonprofit’s financial stability. Grant Guardian transforms weeks of financial analysis into hours of strategic insight–creating space for deep, meaningful engagement with partners while maintaining high grantmaking standards.

Introducing Grant Guardian

Grant Guardian is an AI-powered financial due diligence tool that streamlines the assessment process for both foundations and nonprofits. Foundations receive sophisticated financial health analyses and risk assessments, while nonprofits can simply submit their existing financial documents without the task of filling out multiple custom forms. This streamlined approach helps both parties focus on what matters most–their shared mission of creating impact.

How Does It Work?

Advanced AI Analyses: Grant Guardian harnesses the power of AI to analyze financial documents like 990s and audits, offering a comprehensive view of a nonprofit’s financial stability. With rapid data extraction and analysis based on modifiable criteria, Grant Guardian bolsters strategic funding with financial insights.

Customized Risk Reports: Grant Guardian’s risk reports and dashboards are customizable, allowing you to tailor metrics specifically to your organization’s funding priorities. This flexibility enables you to present clear, relevant data to stakeholders while maintaining a transparent audit trail for compliance.

Automated Data Extraction: As an enterprise-grade solution, Grant Guardian automates the extraction and analysis of data from financial reports, identifies potential risks, standardizes assessments, and minimizes user error stemming from bias. This standardization is crucial, as nonprofits often vary in the financial documents they provide, making the due diligence process more complex and error-prone for funders…(More)”.
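
As a purely hypothetical sketch of the general pattern described above (standardized figures extracted from filings such as Form 990s, then consistent risk criteria applied to every grantee), the final scoring step might look like the following; the field names, thresholds, and rules are illustrative, not Grant Guardian’s own.

```python
# Hypothetical sketch: apply the same simple risk checks to figures extracted
# from any nonprofit's financial filing. Field names and thresholds are
# illustrative only.
from dataclasses import dataclass

@dataclass
class Financials:
    total_revenue: float
    total_expenses: float
    net_assets: float
    total_liabilities: float

def risk_flags(f: Financials) -> dict:
    """Standardized checks applied identically to every submitted filing."""
    flags = {"operating_deficit": f.total_expenses > f.total_revenue}
    # Months of expenses covered by net assets (a common solvency heuristic).
    monthly_burn = f.total_expenses / 12 if f.total_expenses else 0.0
    flags["low_reserves"] = monthly_burn > 0 and (f.net_assets / monthly_burn) < 3
    flags["high_leverage"] = f.net_assets > 0 and (f.total_liabilities / f.net_assets) > 2
    return flags

# Usage with figures an upstream parser might pull from a Form 990:
# print(risk_flags(Financials(1_200_000, 1_350_000, 250_000, 600_000)))
```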

From social media to artificial intelligence: improving research on digital harms in youth


Article by Karen Mansfield, Sakshi Ghai, Thomas Hakman, Nick Ballou, Matti Vuorre, and Andrew K Przybylski: “…we critically evaluate the limitations and underlying challenges of existing research into the negative mental health consequences of internet-mediated technologies on young people. We argue that identifying and proactively addressing consistent shortcomings is the most effective method for building an accurate evidence base for the forthcoming influx of research on the effects of artificial intelligence (AI) on children and adolescents. Basic research, advice for caregivers, and evidence for policy makers should tackle the challenges that led to the misunderstanding of social media harms. The Personal View has four sections: first, we conduct a critical appraisal of recent reviews regarding effects of technology on children and adolescents’ mental health, aimed at identifying limitations in the evidence base; second, we discuss what we think are the most pressing methodological challenges underlying those limitations; third, we propose effective ways to address these limitations, building on robust methodology, with reference to emerging applications in the study of AI and children and adolescents’ wellbeing; and lastly, we articulate steps for conceptualising and rigorously studying the ever-shifting sociotechnological landscape of digital childhood and adolescence. We outline how the most effective approach to understanding how young people shape, and are shaped by, emerging technologies is to identify and directly address specific challenges. We present an approach grounded in interpreting findings through a coherent and collaborative evidence-based framework in a measured, incremental, and informative way…(More)”

The Case for Local and Regional Public Engagement in Governing Artificial Intelligence


Article by Stefaan Verhulst and Claudia Chwalisz: “As the Paris AI Action Summit approaches, the world’s attention will once again turn to the urgent questions surrounding how we govern artificial intelligence responsibly. Discussions will inevitably include calls for global coordination and participation, exemplified by several proposals for a Global Citizens’ Assembly on AI. While such initiatives aim to foster inclusivity, the reality is that meaningful deliberation and actionable outcomes often emerge most effectively at the local and regional levels.

Building on earlier reflections in “AI Globalism and AI Localism,” we argue that to govern AI for public benefit, we must prioritize building public engagement capacity closer to the communities where AI systems are deployed. Localized engagement not only ensures relevance to specific cultural, social, and economic contexts but also equips communities with the agency to shape both policy and product development in ways that reflect their needs and values.

While a Global Citizens’ Assembly sounds like a great idea on the surface, there is no public authority with teeth or enforcement mechanisms at that level of governance. The Paris Summit represents an opportunity to rethink existing AI governance frameworks, reorienting them toward an approach that is grounded in lived, local realities and mutually respectful processes of co-creation. Toward that end, we elaborate below on proposals for: local and regional AI assemblies; AI citizens’ assemblies for EU policy; capacity-building programs; and localized data governance models…(More)”.

Reimagining data for Open Source AI: A call to action


Report by Open Source Initiative: “Artificial intelligence (AI) is changing the world at a remarkable pace, with Open Source AI playing a pivotal role in shaping its trajectory. Yet, as AI advances, a fundamental challenge emerges: How do we create a data ecosystem that is not only robust but also equitable and sustainable?

The Open Source Initiative (OSI) and Open Future have taken a significant step toward addressing this challenge by releasing a white paper: “Data Governance in Open Source AI: Enabling Responsible and Systematic Access.” This document is the culmination of a global co-design process, enriched by insights from a vibrant two-day workshop held in Paris in October 2024….

The white paper offers a blueprint for a data ecosystem rooted in fairness, inclusivity and sustainability. It calls for two transformative shifts:

  1. From Open Data to Data Commons: Moving beyond the notion of unrestricted data to a model that balances openness with the rights and needs of all stakeholders.
  2. Broadening the stakeholder universe: Creating collaborative frameworks that unite communities, stewards and creators in equitable data-sharing practices.

To bring these shifts to life, the white paper delves into six critical focus areas:

  • Data preparation
  • Preference signaling and licensing
  • Data stewards and custodians
  • Environmental sustainability
  • Reciprocity and compensation
  • Policy interventions…(More)”

To Bot or Not to Bot? How AI Companions Are Reshaping Human Services and Connection


Essay by Julia Freeland Fisher: “Last year, a Harvard study on chatbots drew a startling conclusion: AI companions significantly reduce loneliness. The researchers found that “synthetic conversation partners,” or bots engineered to be caring and friendly, curbed loneliness on par with interacting with a fellow human. The study was silent, however, on the irony behind these findings: synthetic interaction is not a real, lasting connection. Should the price of curing loneliness really be more isolation?

Missing that subtext is emblematic of our times. Near-term upsides often overshadow long-term consequences. Even with important lessons learned about the harms of social media and big tech over the past two decades, today, optimism about AI’s potential is soaring, at least in some circles.

Bots present an especially tempting fix to long-standing capacity constraints across education, health care, and other social services. AI coaches, tutors, navigators, caseworkers, and assistants could overcome the very real challenges—like cost, recruitment, training, and retention—that have made access to vital forms of high-quality human support perennially hard to scale.

But scaling bots that simulate human support presents new risks. What happens if, across a wide range of “human” services, we trade access to more services for fewer human connections?…(More)”.

Towards Best Practices for Open Datasets for LLM Training


Paper by Stefan Baack et al: “Many AI companies are training their large language models (LLMs) on data without the permission of the copyright owners. The permissibility of doing so varies by jurisdiction: in countries like the EU and Japan, this is allowed under certain restrictions, while in the United States, the legal landscape is more ambiguous. Regardless of the legal status, concerns from creative producers have led to several high-profile copyright lawsuits, and the threat of litigation is commonly cited as a reason for the recent trend towards minimizing the information shared about training datasets by both corporate and public interest actors. This trend toward limiting information about training data causes harm by hindering transparency, accountability, and innovation in the broader ecosystem, denying researchers, auditors, and impacted individuals access to the information needed to understand AI models.

While this could be mitigated by training language models on open access and public domain data, at the time of writing, there are no such models (trained at a meaningful scale) due to the substantial technical and sociological challenges in assembling the necessary corpus. These challenges include incomplete and unreliable metadata, the cost and complexity of digitizing physical records, and the diverse set of legal and technical skills required to ensure relevance and responsibility in a quickly changing landscape. Building towards a future where AI systems can be trained on openly licensed data that is responsibly curated and governed requires collaboration across legal, technical, and policy domains, along with investments in metadata standards, digitization, and fostering a culture of openness…(More)”.

Beware the Intention Economy: Collection and Commodification of Intent via Large Language Models


Article by Yaqub Chaudhary and Jonnie Penn: “The rapid proliferation of large language models (LLMs) invites the possibility of a new marketplace for behavioral and psychological data that signals intent. This brief article introduces some initial features of that emerging marketplace. We survey recent efforts by tech executives to position the capture, manipulation, and commodification of human intentionality as a lucrative parallel to—and viable extension of—the now-dominant attention economy, which has bent consumer, civic, and media norms around users’ finite attention spans since the 1990s. We call this follow-on the intention economy. We characterize it in two ways. First, as a competition, initially, between established tech players armed with the infrastructural and data capacities needed to vie for first-mover advantage on a new frontier of persuasive technologies. Second, as a commodification of hitherto unreachable levels of explicit and implicit data that signal intent, namely those signals borne of combining (a) hyper-personalized manipulation via LLM-based sycophancy, ingratiation, and emotional infiltration and (b) increasingly detailed categorization of online activity elicited through natural language.

This new dimension of automated persuasion draws on the unique capabilities of LLMs and generative AI more broadly, which intervene not only on what users want, but also, to cite Williams, “what they want to want” (Williams, 2018, p. 122). We demonstrate through a close reading of recent technical and critical literature (including unpublished papers from ArXiv) that such tools are already being explored to elicit, infer, collect, record, understand, forecast, and ultimately manipulate, modulate, and commodify human plans and purposes, both mundane (e.g., selecting a hotel) and profound (e.g., selecting a political candidate)…(More)”.

Nearly all Americans use AI, though most dislike it, poll shows


Axios: “The vast majority of Americans use products that involve AI, but their views of the technology remain overwhelmingly negative, according to a Gallup-Telescope survey published Wednesday.

Why it matters: The rapid advancement of generative AI threatens to have far-reaching consequences for Americans’ everyday lives, including reshaping the job market, impacting elections, and affecting the health care industry.

The big picture: An estimated 99% of Americans used at least one AI-enabled product in the past week, but nearly two-thirds didn’t realize they were doing so, according to the poll’s findings.

  • These products included navigation apps, personal virtual assistants, weather forecasting apps, streaming services, shopping websites and social media platforms.
  • Ellyn Maese, a senior research consultant at Gallup, told Axios that the disconnect is because there is “a lot of confusion when it comes to what is just a computer program versus what is truly AI and intelligent.”

Zoom in: Despite its prevalent use, Americans’ views of AI remain overwhelmingly bleak, the survey found.

  • 72% of those surveyed had a “somewhat” or “very” negative opinion of how AI would impact the spread of false information, while 64% said the same about how it affects social connections.
  • The only area where a majority of Americans (61%) had a positive view of AI’s impact was regarding how it might help medical diagnosis and treatment…

State of play: The survey found that 68% of Americans believe the government and businesses equally bear responsibility for addressing the spread of false information related to AI.

  • 63% said the same about personal data privacy violations.
  • Majorities of those surveyed felt the same about combatting the unauthorized use of individuals’ likenesses (62%) and AI’s impact on job losses (52%).
  • In fact, the only area where Americans felt differently was when it came to national security threats; 62% of those surveyed said the government bore primary responsibility for reducing such threats…(More).”