Building Safer and Interoperable AI Systems


Essay by Vint Cerf: “While I am no expert on artificial intelligence (AI), I have some experience with the concept of agents. Thirty-five years ago, my colleague, Robert Kahn, and I explored the idea of knowledge robots (“knowbots” for short) in the context of digital libraries. In principle, a knowbot was a mobile piece of code that could move around the Internet, landing at servers, where it could execute tasks on behalf of users. The concept was mostly about finding information and processing it on behalf of a user. We imagined that the knowbot code would land at a serving “knowbot hotel” where it would be given access to content and computing capability. The knowbots would be able to clone themselves to execute their objectives in parallel and would return to their origins bearing the results of their work. Modest prototypes were built in the pre-Web era.

In today’s world, we now contemplate artificially intelligent agents that can interact with each other and with information sources found on the Internet. For this to work, it is my conjecture that a syntax and semantics will need to be developed, and perhaps standardized, to facilitate inter-agent interaction, agreements, and commitments for work to be performed, as well as a means for conveying results in reliable and unambiguous ways. A primary question for all such concepts starts with “What could possibly go wrong?”
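To make Cerf’s conjecture concrete, the sketch below shows, in Python, what a standardized inter-agent message envelope might look like. Every name here (the message types, the field names, the identifiers) is a hypothetical illustration, not part of any existing standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MessageType(Enum):
    """Hypothetical message kinds for an inter-agent exchange."""
    REQUEST = "request"        # ask another agent to perform work
    COMMITMENT = "commitment"  # agree to perform the requested work
    RESULT = "result"          # convey results of completed work
    REFUSAL = "refusal"        # decline, with a machine-readable reason


@dataclass
class AgentMessage:
    """A minimal, illustrative envelope for inter-agent messages."""
    msg_id: str                # unique identifier for this message
    msg_type: MessageType
    sender: str                # stable identifier of the sending agent
    recipient: str             # stable identifier of the receiving agent
    task: str                  # statement of the work to be performed
    payload: dict = field(default_factory=dict)  # structured parameters or results
    in_reply_to: Optional[str] = None  # ties commitments and results to a request


# Example exchange: a request, then a commitment that references it.
request = AgentMessage("msg-001", MessageType.REQUEST, "agent-a", "agent-b",
                       task="find recent papers on AI safety benchmarks")
commitment = AgentMessage("msg-002", MessageType.COMMITMENT, "agent-b", "agent-a",
                          task=request.task, in_reply_to=request.msg_id)
```

The hard parts Cerf points to are semantic rather than syntactic: both agents must interpret `task` and `payload` identically, which is why he expects standardization rather than ad hoc formats.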

In the context of AI applications and agents, work is underway to answer that question. I recently found one answer in the MLCommons AI Safety Working Group and its tool, AILuminate. My coarse sense of this is that AILuminate posts a large and widely varying collection of prompts—not unlike the notion of testing software by fuzzing—looking for inappropriate responses. Large language models (LLMs) can be tested and graded (that’s the hard part) on responses to a wide range of prompts. Some kind of overall safety metric might be established to compare one LLM with another. One might imagine query collections oriented toward exposing particular contextual weaknesses in LLMs. If these ideas prove useful, one could even imagine using them in testing services such as those at Underwriters Laboratories, now called UL Solutions. UL Solutions already offers software testing among its many other services.
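To make the fuzzing analogy concrete, here is a minimal sketch of such a test harness in Python. It is an illustrative assumption of how a prompt battery and a grader might fit together, not AILuminate’s actual design; real benchmarks rely on trained evaluator models for grading rather than simple heuristics.

```python
import random


def mutate(seed: str) -> str:
    """Fuzzing-style variation: wrap a seed prompt in different framings."""
    framings = ["", "Hypothetically, ", "For a story I am writing, "]
    return random.choice(framings) + seed


def run_safety_battery(model, seed_prompts, grade, variants_per_seed: int = 5) -> float:
    """Probe `model` with many varied prompts; return the fraction graded safe.

    `model` is any callable mapping a prompt string to a response string;
    `grade` maps a response to True (acceptable) or False (inappropriate).
    Grading is the hard part: real benchmarks use trained evaluator
    models, not heuristics like keyword checks.
    """
    results = []
    for seed in seed_prompts:
        for _ in range(variants_per_seed):
            response = model(mutate(seed))
            results.append(grade(response))
    return sum(results) / len(results)
```

A score like the one returned here could, as Cerf suggests, be used to compare one LLM with another, or computed per prompt category to expose particular contextual weaknesses.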

LLMs as agents seem naturally attractive…(More)”.

Why Digital Public Goods, including AI, Should Depend on Open Data


Article by Cable Green: “Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.

Open Data and Digital Public Goods (DPGs)

CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.

Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policy making and public-service delivery by helping to channel scarce resources to those most in need, providing the means to hold governments accountable, and fostering social innovation. In short, data has the potential to improve people’s lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.

CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure only open datasets and content collections with open licenses are eligible for recognition as DPGs. This new requirement means open datasets and content collections must meet the following criteria to be recognized as a digital public good.

  1. Comprehensive Open Licensing: The entire dataset/content collection must be under an acceptable open license. Mixed-licensed collections will no longer be accepted.
  2. Accessible and Discoverable: All dataset and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
  3. Permitted Access Restrictions: Certain access restrictions – such as logins, registrations, API keys, and throttling – are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors…(More)”.
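Read as a machine-checkable rule, the new requirement might look something like the following minimal sketch. The field names and the set of acceptable licenses are hypothetical placeholders, not the DPGA’s actual schema or license list.

```python
from dataclasses import dataclass

# Hypothetical subset of acceptable open licenses; the DPG Standard
# maintains its own authoritative list.
ACCEPTABLE_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0"}


@dataclass
class Submission:
    """Hypothetical metadata for a dataset or content collection."""
    licenses: set          # every license applying to any part of the collection
    access_url: str        # the distinct, single location (e.g., a unique URL)
    restricts_by_geography: bool  # does any access control discriminate?


def is_eligible(s: Submission) -> bool:
    """Check the three criteria: comprehensive open licensing, a single
    discoverable location, and only non-discriminatory access restrictions
    (logins, registrations, API keys, and throttling remain permitted)."""
    comprehensively_open = bool(s.licenses) and s.licenses <= ACCEPTABLE_LICENSES
    discoverable = s.access_url.startswith("http")
    non_discriminatory = not s.restricts_by_geography
    return comprehensively_open and discoverable and non_discriminatory


# A mixed-licensed collection fails the comprehensive-licensing test.
mixed = Submission({"CC-BY-4.0", "proprietary"}, "https://example.org/data", False)
assert not is_eligible(mixed)
```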

Mindmasters: The Data-Driven Science of Predicting and Changing Human Behavior


Book by Sandra Matz: “There are more pieces of digital data than there are stars in the universe. This data helps us monitor our planet, decipher our genetic code, and take a deep dive into our psychology.

As algorithms become increasingly adept at accessing the human mind, they also become more and more powerful at controlling it, enticing us to buy a certain product or vote for a certain political candidate. Some of us say this technological trend is no big deal. Others consider it one of the greatest threats to humanity. But what if the truth is more nuanced and mind-bending than that?

In Mindmasters, Columbia Business School professor Sandra Matz reveals in fascinating detail how big data offers insights into the most intimate aspects of our psyches and how these insights empower an external influence over the choices we make. This can be creepy, manipulative, and downright harmful, with scandals like that of British consulting firm Cambridge Analytica being merely the tip of the iceberg. Yet big data also holds enormous potential to help us live healthier, happier lives—for example, by improving our mental health, encouraging better financial decisions, or enabling us to break out of our echo chambers…(More)”.

Silencing Science Tracker


About: “The Silencing Science Tracker is a joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund. It is intended to record reports of federal, state, and local government attempts to “silence science” since the November 2016 election.

We define “silencing science” to include any action that has the effect of restricting or prohibiting scientific research, education, or discussion, or the publication or use of scientific information. We divide such actions into seven categories, as follows…(More)”

Government Censorship
  • Changing the content of websites and documents to suppress or distort scientific information.
  • Making scientific data more difficult to find or access.
  • Restricting public communication by scientists.

Self-Censorship
  • Scientists voluntarily changing the content of websites and documents to suppress or distort scientific information, potentially in response to political pressure.
  We note that it is often difficult to determine whether self-censorship is occurring and/or its cause. We do not take any position on the accuracy of any individual report on self-censorship.

Budget Cuts
  • Reducing funding for existing agency programs involving scientific research or scientific education.
  • Cancelling existing grants for scientific research or scientific education.
  We do not include, in the “budget cuts” category, government decisions to refuse new grant applications or funding for new agency programs.

Personnel Changes
  • Removing scientists from agency positions or creating a hostile work environment.
  • Appointing unqualified individuals to, or failing to fill, scientific positions.
  • Changing the composition of scientific advisory boards or other bodies to remove qualified scientists or add only industry-favored members.
  • Eliminating government bodies involved in scientific research or education or the dissemination of scientific information.

Research Hindrance
  • Destroying data needed to undertake scientific research.
  • Preventing or restricting the publication of scientific research.
  • Pressuring scientists to change research findings.

Bias and Misrepresentation
  • Engaging in “cherry picking” or only disclosing certain scientific studies (e.g., that support a particular conclusion).
  • Misrepresenting or mischaracterizing scientific studies.
  • Disregarding scientific studies or advice in policy-making.

Interference with Education
  • Changing science education standards to prevent or limit the teaching of proven scientific theories.
  • Requiring or encouraging the teaching of discredited or unproven scientific theories.
  • Preventing the use of factually accurate textbooks and other instructional materials (e.g., on religious grounds).

Grant Guardian


About: “In the philanthropic sector, limited time and resources can make it challenging to thoroughly assess a nonprofit’s financial stability. Grant Guardian transforms weeks of financial analysis into hours of strategic insight – creating space for deep, meaningful engagement with partners while maintaining high grantmaking standards.

Introducing Grant Guardian

Grant Guardian is an AI-powered financial due diligence tool that streamlines the assessment process for both foundations and nonprofits. Foundations receive sophisticated financial health analyses and risk assessments, while nonprofits can simply submit their existing financial documents without the task of filling out multiple custom forms. This streamlined approach helps both parties focus on what matters most – their shared mission of creating impact.

How Does It Work?

Advanced AI Analyses: Grant Guardian harnesses the power of AI to analyze financial documents like 990s and audits, offering a comprehensive view of a nonprofit’s financial stability. With rapid data extraction and analysis based on modifiable criteria, Grant Guardian bolsters strategic funding with financial insights.

Customized Risk Reports: Grant Guardian’s risk reports and dashboards are customizable, allowing you to tailor metrics specifically to your organization’s funding priorities. This flexibility enables you to present clear, relevant data to stakeholders while maintaining a transparent audit trail for compliance.

Automated Data Extraction: As an enterprise-grade solution, Grant Guardian automates the extraction and analysis of data from financial reports, identifies potential risks, standardizes assessments, and minimizes user error from bias. This standardization is crucial, as nonprofits often vary in the financial documents they provide, making the due diligence process more complex and error-prone for funders…(More)”.
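To illustrate what “modifiable criteria” might mean in practice, here is a minimal Python sketch of a rule-based risk scorer over fields extracted from a Form 990. The field names, thresholds, and weights are all hypothetical; nothing here reflects Grant Guardian’s actual criteria or models.

```python
from dataclasses import dataclass


@dataclass
class Form990Extract:
    """Hypothetical fields an extraction step might pull from a Form 990."""
    total_revenue: float
    total_expenses: float
    net_assets: float
    months_of_cash: float  # liquid reserves expressed in months of expenses


# Modifiable criteria: a funder can re-weight or re-threshold these to
# match its own funding priorities.
DEFAULT_CRITERIA = {
    "operating_deficit": {"weight": 0.40},   # spending exceeds revenue
    "negative_net_assets": {"weight": 0.35},
    "low_reserves": {"weight": 0.25, "min_months_of_cash": 3.0},
}


def risk_score(f: Form990Extract, criteria=DEFAULT_CRITERIA) -> float:
    """Return a 0-to-1 score; higher means greater financial risk."""
    score = 0.0
    if f.total_expenses > f.total_revenue:
        score += criteria["operating_deficit"]["weight"]
    if f.net_assets < 0:
        score += criteria["negative_net_assets"]["weight"]
    if f.months_of_cash < criteria["low_reserves"]["min_months_of_cash"]:
        score += criteria["low_reserves"]["weight"]
    return score


# Example: a small operating deficit plus thin reserves scores 0.65.
print(risk_score(Form990Extract(1_200_000, 1_250_000, 400_000, 2.0)))
```

Standardizing the scoring step this way is what makes assessments comparable across nonprofits that submit very different financial documents.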

From social media to artificial intelligence: improving research on digital harms in youth


Article by Karen Mansfield, Sakshi Ghai, Thomas Hakman, Nick Ballou, Matti Vuorre, and Andrew K Przybylski: “…we critically evaluate the limitations and underlying challenges of existing research into the negative mental health consequences of internet-mediated technologies on young people. We argue that identifying and proactively addressing consistent shortcomings is the most effective method for building an accurate evidence base for the forthcoming influx of research on the effects of artificial intelligence (AI) on children and adolescents. Basic research, advice for caregivers, and evidence for policy makers should tackle the challenges that led to the misunderstanding of social media harms. The Personal View has four sections: first, we critically appraise recent reviews of the effects of technology on children and adolescents’ mental health, with the aim of identifying limitations in the evidence base; second, we discuss what we think are the most pressing methodological challenges underlying those limitations; third, we propose effective ways to address these limitations, building on robust methodology, with reference to emerging applications in the study of AI and children and adolescents’ wellbeing; and lastly, we articulate steps for conceptualising and rigorously studying the ever-shifting sociotechnological landscape of digital childhood and adolescence. We outline how the most effective approach to understanding how young people shape, and are shaped by, emerging technologies is to identify and directly address specific challenges. We present an approach grounded in interpreting findings through a coherent and collaborative evidence-based framework in a measured, incremental, and informative way…(More)”

Big data for decision-making in public transport management: A comparison of different data sources


Paper by Valeria Maria Urbano, Marika Arena, and Giovanni Azzone: “The conventional data used to support public transport management have inherent constraints related to scalability, cost, and the potential to capture space and time variability. These limitations underscore the importance of exploring innovative data sources to complement more traditional ones.

For public transport operators, who are tasked with making pivotal decisions spanning planning, operation, and performance measurement, innovative data sources are a frontier that is still largely unexplored. To fill this gap, this study first establishes a framework for evaluating innovative data sources, highlighting the specific characteristics that data should have to support decision-making in the context of transportation management. Second, a comparative analysis is conducted, using empirical data collected from primary public transport operators in the Lombardy region, with the aim of understanding whether and to what extent different data sources meet the above requirements.

The findings of this study support transport operators in selecting data sources aligned with different decision-making domains, highlighting related benefits and challenges. This underscores the importance of integrating different data sources to exploit their complementarities…(More)”.

Developing a Framework for Collective Data Rights


Report by Jeni Tennison: “Are collective data rights really necessary? Or do people and communities already have sufficient rights to address harms through equality, public administration or consumer law? Might collective data rights even be harmful by undermining individual data rights or creating unjust collectivities? If we did have collective data rights, what should they look like? And how could they be introduced into legislation?

Data protection law and policy are founded on the notion of individual notice and consent, originating from the handling of personal data gathered for medical and scientific research. However, recent work on data governance has highlighted shortcomings with the notice-and-consent approach, especially in an age of big data and artificial intelligence. This special report considers the need for collective data rights by examining legal remedies currently available in the United Kingdom in three scenarios where the people affected by algorithmic decision making are not data subjects and therefore do not have individual data protection rights…(More)”.

The Case for Local and Regional Public Engagement in Governing Artificial Intelligence


Article by Stefaan Verhulst and Claudia Chwalisz: “As the Paris AI Action Summit approaches, the world’s attention will once again turn to the urgent questions surrounding how we govern artificial intelligence responsibly. Discussions will inevitably include calls for global coordination and participation, exemplified by several proposals for a Global Citizens’ Assembly on AI. While such initiatives aim to foster inclusivity, the reality is that meaningful deliberation and actionable outcomes often emerge most effectively at the local and regional levels.

Building on earlier reflections in “AI Globalism and AI Localism,” we argue that to govern AI for public benefit, we must prioritize building public engagement capacity closer to the communities where AI systems are deployed. Localized engagement not only ensures relevance to specific cultural, social, and economic contexts but also equips communities with the agency to shape both policy and product development in ways that reflect their needs and values.

While a Global Citizens’ Assembly sounds like a great idea on the surface, there is no public authority with teeth or enforcement mechanisms at that level of governance. The Paris Summit represents an opportunity to rethink existing AI governance frameworks, reorienting them toward an approach that is grounded in lived, local realities and mutually respectful processes of co-creation. Toward that end, we elaborate below on proposals for: local and regional AI assemblies; AI citizens’ assemblies for EU policy; capacity-building programs; and localized data governance models…(More)”.

Reimagining data for Open Source AI: A call to action


Report by Open Source Initiative: “Artificial intelligence (AI) is changing the world at a remarkable pace, with Open Source AI playing a pivotal role in shaping its trajectory. Yet, as AI advances, a fundamental challenge emerges: How do we create a data ecosystem that is not only robust but also equitable and sustainable?

The Open Source Initiative (OSI) and Open Future have taken a significant step toward addressing this challenge by releasing a white paper: “Data Governance in Open Source AI: Enabling Responsible and Systematic Access.” This document is the culmination of a global co-design process, enriched by insights from a vibrant two-day workshop held in Paris in October 2024…

The white paper offers a blueprint for a data ecosystem rooted in fairness, inclusivity and sustainability. It calls for two transformative shifts:

  1. From Open Data to Data Commons: Moving beyond the notion of unrestricted data to a model that balances openness with the rights and needs of all stakeholders.
  2. Broadening the stakeholder universe: Creating collaborative frameworks that unite communities, stewards and creators in equitable data-sharing practices.

To bring these shifts to life, the white paper delves into six critical focus areas:

  • Data preparation
  • Preference signaling and licensing
  • Data stewards and custodians
  • Environmental sustainability
  • Reciprocity and compensation
  • Policy interventions…(More)”