Article by Gwen Ottinger: “…Data Collaboratives would move public participation and community engagement upstream in the policy process by creating opportunities for community members to contribute their lived experience to the assessment of data and the framing of policy problems. This would in turn foster two-way communication and trusting relationships between government and the public. Data Collaboratives would also help ensure that data and their uses in federal government are equitable, by inviting a broader range of perspectives on how data analysis can promote equity and where relevant data are missing. Finally, Data Collaboratives would be one vehicle for enabling individuals to participate in science, technology, engineering, math, and medicine activities throughout their lives, increasing the quality of American science and the competitiveness of American industry…(More)”.
Local Government: Artificial intelligence use cases
Repository by the (UK) Local Government Association: “Building on the findings of our recent AI survey, which highlighted the need for practical examples, this bank showcases the diverse ways local authorities are leveraging AI.
Within this collection, you’ll discover a spectrum of AI adoption, ranging from utilising AI assistants to streamline back-office tasks to pioneering the implementation of bespoke Large Language Models (LLMs). These real-world use cases exemplify the innovative spirit driving advancements in local government service delivery.
Whether your council is at the outset of its AI exploration or seeking to expand its existing capabilities, this bank offers a wealth of valuable insights and best practices to support your organisation’s AI journey…(More)”.
Developing a public-interest training commons of books
Article by Authors Alliance: “…is pleased to announce a new project, supported by the Mellon Foundation, to develop an actionable plan for a public-interest book training commons for artificial intelligence. Northeastern University Library will be supporting this project and helping to coordinate its progress.
Access to books will play an essential role in how artificial intelligence develops. AI’s Large Language Models (LLMs) have a voracious appetite for text, and there are good reasons to think that these data sets should include books and lots of them. Over the last 500 years, human authors have written over 129 million books. These volumes, preserved for future generations in some of our most treasured research libraries, are perhaps the best and most sophisticated reflection of all human thinking. Their high editorial quality, breadth, and diversity of content, as well as the unique way they employ long-form narratives to communicate sophisticated and nuanced arguments and ideas make them ideal training data sources for AI.
These collections and the text embedded in them should be made available under ethical and fair rules as the raw material that will enable the computationally intense analysis needed to inform new AI models, algorithms, and applications imagined by a wide range of organizations and individuals for the benefit of humanity…(More)”
Experts warn about the ‘crumbling infrastructure’ of federal government data
Article by Hansi Lo Wang: “The stability of the federal government’s system for producing statistics, which the U.S. relies on to understand its population and economy, is under threat because of budget concerns, officials and data users warn.
And that’s before any follow-through on the new Trump administration and Republican lawmakers‘ pledges to slash government spending, which could further affect data production.
In recent months, budget shortfalls and the restrictions of short-term funding have led to the end of some datasets by the Bureau of Economic Analysis, known for its tracking of the gross domestic product, and to proposals by the Bureau of Labor Statistics to reduce the number of participants surveyed to produce the monthly jobs report. A “lack of multiyear funding” has also hurt efforts to modernize the software and other technology the BLS needs to put out its data properly, concluded a report by an expert panel tasked with examining multiple botched data releases last year.
Long-term funding questions are also dogging the Census Bureau, which carries out many of the federal government’s surveys and is preparing for the 2030 head count that’s set to be used to redistribute political representation and trillions in public funding across the country. Some census watchers are concerned budget issues may force the bureau to cancel some of its field tests for the upcoming tally, as it did with 2020 census tests for improving the counts in Spanish-speaking communities, rural areas and on Indigenous reservations.
While the statistical agencies have not been named specifically, some advocates are worried that calls to reduce the federal government’s workforce by President Trump and the new Republican-controlled Congress could put the integrity of the country’s data at greater risk…(More)”
Data Stewardship as Environmental Stewardship
Article by Stefaan Verhulst and Sara Marcucci: “Why responsible data stewardship could help address today’s pressing environmental challenges resulting from artificial intelligence and other data-related technologies…
Even as the world grows increasingly reliant on data and artificial intelligence, concern over the environmental impact of data-related activities is increasing. Solutions remain elusive. The rise of generative AI, which rests on a foundation of massive data sets and computational power, risks exacerbating the problem.
In the below, we propose that responsible data stewardship offers a potential pathway to reducing the environmental footprint of data activities. By promoting practices such as data reuse, minimizing digital waste, and optimizing storage efficiency, data stewardship can help mitigate environmental harm. Additionally, data stewardship supports broader environmental objectives by facilitating better decision-making through transparent, accessible, and shared data. In the below, we suggest that advancing data stewardship as a cornerstone of environmental responsibility could provide a compelling approach to addressing the dual challenges of advancing digital technologies while safeguarding the environment…(More)”

Data Governance Meets the EU AI Act
Article by Axel Schwanke: “..The EU AI Act emphasizes sustainable AI through robust data governance, promoting principles like data minimization, purpose limitation, and data quality to ensure responsible data collection and processing. It mandates measures such as data protection impact assessments and retention policies. Article 10 underscores the importance of effective data management in fostering ethical and sustainable AI development…This article states that high-risk AI systems must be developed using high-quality data sets for training, validation, and testing. These data sets should be managed properly, considering factors like data collection processes, data preparation, potential biases, and data gaps. The data sets should be relevant, representative, error-free, and complete as much as possible. They should also consider the specific context in which the AI system will be used. In some cases, providers may process special categories of personal data to detect and correct biases, but they must follow strict conditions to protect individuals’ rights and freedoms…
However, achieving compliance presents several significant challenges:
- Ensuring Dataset Quality and Relevance: Organizations must establish robust data and AI platforms to prepare and manage datasets that are error-free, representative, and contextually relevant for their intended use cases. This requires rigorous data preparation and validation processes.
- Bias and Contextual Sensitivity: Continuous monitoring for biases in data is critical. Organizations must implement corrective actions to address gaps while ensuring compliance with privacy regulations, especially when processing personal data to detect and reduce bias.
- End-to-End Traceability: A comprehensive data governance framework is essential to track and document data flow from its origin to its final use in AI models. This ensures transparency, accountability, and compliance with regulatory requirements.
- Evolving Data Requirements: Dynamic applications and changing schemas, particularly in industries like real estate, necessitate ongoing updates to data preparation processes to maintain relevance and accuracy.
- Secure Data Processing: Compliance demands strict adherence to secure processing practices for personal data, ensuring privacy and security while enabling bias detection and mitigation.
Example: Real Estate Data
Immowelt’s real estate price map, awarded as the top performer in a 2022 test of real estate price maps, exemplifies the challenges of achieving high-quality datasets. The prepared data powers numerous services and applications, including data analysis, price predictions, personalization, recommendations, and market research…(More)”
Building Safer and Interoperable AI Systems
Essay by Vint Cerf: “While I am no expert on artificial intelligence (AI), I have some experience with the concept of agents. Thirty-five years ago, my colleague, Robert Kahn, and I explored the idea of knowledge robots (“knowbots” for short) in the context of digital libraries. In principle, a knowbot was a mobile piece of code that could move around the Internet, landing at servers, where they could execute tasks on behalf of users. The concept is mostly related to finding information and processing it on behalf of a user. We imagined that the knowbot code would land at a serving “knowbot hotel” where it would be given access to content and computing capability. The knowbots would be able to clone themselves to execute their objectives in parallel and would return to their origins bearing the results of their work. Modest prototypes were built in the pre-Web era.
In today’s world, artificially intelligent agents are now contemplated that can interact with each other and with information sources found on the Internet. For this to work, it’s my conjecture that a syntax and semantics will need to be developed and perhaps standardized to facilitate inter-agent interaction, agreements, and commitments for work to be performed, as well as a means for conveying results in reliable and unambiguous ways. A primary question for all such concepts starts with “What could possibly go wrong?”
In the context of AI applications and agents, work is underway to answer that question. I recently found one answer to that in the MLCommons AI Safety Working Group and its tool, AILuminate. My coarse sense of this is that AILuminate posts a large and widely varying collection of prompts—not unlike the notion of testing software by fuzzing—looking for inappropriate responses. Large language models (LLMs) can be tested and graded (that’s the hard part) on responses to a wide range of prompts. Some kind of overall safety metric might be established to connect one LLM to another. One might imagine query collections oriented toward exposing particular contextual weaknesses in LLMs. If these ideas prove useful, one could even imagine using them in testing services such as those at Underwriters Laboratory, now called UL Solutions. UL Solutions already offers software testing among its many other services.
LLMs as agents seem naturally attractive…(More)”.
Why Digital Public Goods, including AI, Should Depend on Open Data
Article by Cable Green: “Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.
Open Data and Digital Public Goods (DPGs)
CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.
Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policy making and public services delivery by helping to channel scarce resources to those most in need; providing the means to hold governments accountable and foster social innovation. In short, data has the potential to improve people’s lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure only open datasets and content collections with open licenses are eligible for recognition as DPGs. This new requirement means open data sets and content collections must meet the following criteria to be recognised as a digital public good.
- Comprehensive Open Licensing:
- The entire data set/content collection must be under an acceptable open licence. Mixed-licensed collections will no longer be accepted.
- Accessible and Discoverable:
- All data sets and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
- Permitted Access Restrictions:
- Certain access restrictions – such as logins, registrations, API keys, and throttling – are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors…(More)”.
Mindmasters: The Data-Driven Science of Predicting and Changing Human Behavior
Book by Sandra Matz: “There are more pieces of digital data than there are stars in the universe. This data helps us monitor our planet, decipher our genetic code, and take a deep dive into our psychology.
As algorithms become increasingly adept at accessing the human mind, they also become more and more powerful at controlling it, enticing us to buy a certain product or vote for a certain political candidate. Some of us say this technological trend is no big deal. Others consider it one of the greatest threats to humanity. But what if the truth is more nuanced and mind-bending than that?
In Mindmasters, Columbia Business School professor Sandra Matz reveals in fascinating detail how big data offers insights into the most intimate aspects of our psyches and how these insights empower an external influence over the choices we make. This can be creepy, manipulative, and downright harmful, with scandals like that of British consulting firm Cambridge Analytica being merely the tip of the iceberg. Yet big data also holds enormous potential to help us live healthier, happier lives—for example, by improving our mental health, encouraging better financial decisions, or enabling us to break out of our echo chambers..(More)”.
Silencing Science Tracker
About: “The Silencing Science Tracker is a joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund. It is intended to record reports of federal, state, and local government attempts to “silence science” since the November 2016 election.
We define “silencing science” to include any action that has the effect of restricting or prohibiting scientific research, education, or discussion, or the publication or use of scientific information. We divide such actions into 7 categories as follows…(More)”
Category | Examples | |
---|---|---|
Government Censorship | Changing the content of websites and documents to suppress or distort scientific information.Making scientific data more difficult to find or access.Restricting public communication by scientists. | |
Self-Censorship | Scientists voluntarily changing the content of websites and documents to suppress or distort scientific information, potentially in response to political pressure. We note that it is often difficult to determine whether self-censorship is occurring and/or its cause. We do not take any position on the accuracy of any individual report on self-censorship. | |
Budget Cuts | Reducing funding for existing agency programs involving scientific research or scientific education.Cancelling existing grants for scientific research or scientific education. We do not include, in the “budget cuts” category, government decisions to refuse new grant applications or funding for new agency programs. | |
Personnel Changes | Removing scientists from agency positions or creating a hostile work environment.Appointing unqualified individuals to, or failing to fill, scientific positions.Changing the composition of scientific advisory board or other bodies to remove qualified scientists or add only industry-favored members.Eliminating government bodies involved in scientific research or education or the dissemination of scientific information. | |
Research Hindrance | Destroying data needed to undertake scientific research.Preventing or restricting the publication of scientific research.Pressuring scientists to change research findings. | |
Bias and Misrepresentation | Engaging in “cherry picking” or only disclosing certain scientific studies (e.g., that support a particular conclusion).Misrepresenting or mischaracterizing scientific studies.Disregarding scientific studies or advice in policy-making. | |
Interference with Education | Changing science education standards to prevent or limit the teaching of proven scientific theories.Requiring or encouraging the teaching of discredited or unproven scientific theories.Preventing the use of factually accurate textbooks and other instructional materials (e.g., on religious grounds). |