Stefaan Verhulst
Article by Matthew Gault: “Researchers working with data from the Internet Archive have discovered that a third of websites created since 2022 are AI-generated. The team of researchers—which includes people from Stanford, Imperial College London, and the Internet Archive—published their findings online in a paper titled “The Impact of AI-Generated Text on the Internet.” The research also found that all this AI-generated text is making the web more cheery and less verbose. Inspired by the Dead Internet Theory—the idea that much of the internet is now just bots talking back and forth—the team set out to find out how ChatGPT and its competitors had reshaped the internet since 2022. “The proliferation of AI-generated and AI-assisted text on the internet is feared to contribute to a degradation in semantic and stylistic diversity, factual accuracy, and other negative developments,” the researchers write in the paper. “We find that by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted, up from zero before ChatGPT’s launch in late 2022…(More)”.
Article by Northwestern Innovation Institute: “Federal agencies such as the NIH and the National Science Foundation help determine which scientific ideas receive public support, which researchers are able to pursue ambitious work and which fields gain momentum. Because those decisions shape the future direction of discovery, even subtle shifts in how proposals are written, evaluated and selected can have lasting effects across the research ecosystem.
Yet while large language models such as ChatGPT have rapidly entered classrooms, offices and laboratories, far less attention has been paid to how they may be influencing the grant process itself. Proposal writing is often one of the most time-consuming parts of academic life, and AI tools can reduce that burden by helping draft language, summarize prior work and improve organization.
To examine how those tools may already be affecting funding outcomes, researchers at Northwestern Innovation Institute analyzed confidential proposal submissions from two major U.S. research universities together with the full population of publicly released NIH and NSF awards from 2021 through 2025. The combined dataset — made possible in part through Bridge, a collaborative initiative at the Innovation Institute that integrates research, funding and innovation data across partner institutions — offered a rare window into both funded and unfunded proposals at the earliest stage of the research pipeline.

Signs of AI-assisted writing rose sharply beginning in 2023, shortly after generative AI tools became widely available. At NIH, proposals with higher levels of AI involvement were more likely to receive funding and went on to produce more publications. But that productivity gain came with an important qualifier: the additional output was concentrated in ordinary papers rather than the most highly cited work. AI-assisted grants produced more research, but not necessarily more breakthroughs.
Across both agencies, proposals with stronger AI signals also tended to be less distinct from recently funded work. Crucially, the study found this reflects genuine shifts in what researchers are proposing, not merely how they are writing — when the researchers held scientific content constant and applied AI rewriting to existing abstracts, the semantic position of those proposals barely changed. The convergence is happening at the level of ideas.
These findings directly address both open questions. The productivity gains — more publications, but not more breakthroughs, and only at NIH — suggest that AI is primarily lowering the cost of communication rather than accelerating scientific execution. And by observing confidential, unfunded proposals alongside funded awards, the study shows that AI’s influence is already operating upstream, reshaping how ideas are articulated and positioned before they ever reach publication…(More)”.
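To make the “semantic position” check above concrete, here is a minimal sketch (not the study’s actual pipeline): embed an abstract before and after an AI rewrite and compare the two vectors. The encoder model and the example texts are illustrative assumptions.

```python
# Minimal sketch (not the study's pipeline): measure how far an abstract's
# semantic position moves after an AI rewrite, using sentence embeddings.
# The model name and the example texts are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works here

original_abstract = "We study how mitochondrial dynamics regulate neuronal aging."
ai_rewritten_abstract = (
    "This project investigates the role of mitochondrial dynamics in the aging of neurons."
)

embeddings = model.encode([original_abstract, ai_rewritten_abstract])
similarity = cosine_similarity([embeddings[0]], [embeddings[1]])[0][0]

# A similarity close to 1.0 means the rewrite barely shifted the proposal's
# semantic position, consistent with the finding that convergence reflects
# what is proposed rather than how it is phrased.
print(f"cosine similarity before vs. after rewrite: {similarity:.3f}")
```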
Paper by Kimitaka Asatani et al: “Intergovernmental organizations (IGOs) attempt to shape global policy through scientific guidelines and assessments. While they rely on external scientists to bridge research and IGO advisory processes, the structural pathways connecting science to IGO documents remain unexamined. By linking 230,737 scientific papers referenced in IGO documents (2015–2023) to their authors and coauthorship networks across 23 research fields, we identified a small cohort of “Highly IGO-Cited Scientists” (HIC-Sci)—typically comprising 0.7% to 4.4% of authors whose work accounts for 30% of IGO-cited papers. This structural concentration is associated with relational and cognitive patterns: dense transnational collaboration networks, overlapping memberships on advisory bodies such as the Intergovernmental Panel on Climate Change, rapid uptake in IGO documents, and standardized policy-oriented vocabularies. Geographically, HIC-Sci networks follow a core–periphery structure centered on Western Europe. Established fields, such as climate modeling, show stronger concentration, whereas emerging domains such as data science & AI show more distributed citation patterns. Major IGOs frequently cocite the same HIC-Sci papers, compounding this concentration through synchronized diffusion across IGOs. This concentration persists despite IGOs’ efforts to broaden participation and diversify their evidence base. While IGOs have developed criteria for selecting knowledge in advance, our framework provides a basis for subsequent assessment of how IGOs’ efforts to influence policy rely on a concentrated set of HIC-Sci…(More)”.
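The concentration the paper reports can be illustrated with a short sketch: given per-author counts of IGO-cited papers in one field, find the smallest share of authors whose papers account for 30% of the total. The values below are hypothetical toy data, not the paper’s dataset.

```python
# Illustrative sketch, not the paper's code: compute what share of authors
# accounts for 30% of IGO-cited papers in a field, using hypothetical toy data.
from collections import Counter

# Hypothetical toy data: author -> number of IGO-cited papers
igo_cited_papers = Counter({
    "author_A": 40, "author_B": 25, "author_C": 10, "author_D": 5, "author_E": 5,
    "author_F": 3, "author_G": 3, "author_H": 3, "author_I": 3, "author_J": 3,
})

total_papers = sum(igo_cited_papers.values())
target = 0.30 * total_papers

covered = 0
top_authors = 0
for author, n_papers in igo_cited_papers.most_common():
    covered += n_papers
    top_authors += 1
    if covered >= target:
        break

share_of_authors = top_authors / len(igo_cited_papers)
print(f"{share_of_authors:.1%} of authors account for at least 30% of IGO-cited papers")
```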
Article by James W Kelly: “Medical information of 500,000 participants of one of the UK’s landmark scientific programmes, UK Biobank, was offered for sale online in China, the government has confirmed.
Technology minister Ian Murray said information of all members of the database was found listed for sale on the website Alibaba.
Murray told MPs the charity which runs UK Biobank had told the government about the breach on Monday. He said the information did not include names, addresses, contact details or telephone numbers.
However, he said it could include gender, age, month and year of birth, socioeconomic status, lifestyle habits, and measures from biological samples.
The Biobank is a collection of health data offered by volunteers which has been used to improve the detection and treatment of dementia, some cancers and Parkinson’s.
It has collected intimate details – including whole body scans, DNA sequences and their medical records – from hundreds of thousands of volunteers for over two decades. The project has led to more than 18,000 scientific publications.
Participants were aged from 40 to 69 when they were recruited between 2006 and 2010.
UK Biobank said it was investigating the incident and thanked the UK and Chinese governments, as well as Alibaba, for support and cooperation…(More)”.
Paper by Juan Ortiz-Freuler and Manuel Castells: “Control over digital interfaces has become a significant aspect of geopolitical struggles. This article advances an analytical framework illuminating how global communication power manifests across three key interfaces: search engines, social media, and AI agents. We articulate the evolution of these interfaces from corporate innovation to an aspect of contested transnational control, and conceptualize how corporate multinationals like Google, Facebook, TikTok, and DeepSeek leverage interface design to consolidate authority while state interventions challenge their market control. Governments seek to instrumentalize or challenge corporate interfaces to advance national goals, while firms strategically align with or resist state agendas to secure market access. The framework articulates how these forces reconfigure relations between information, people, and machines, with implications for the internet’s next phase…(More)”.
Article by Manon Revel & Théophile Pénigaud: “…unpacks the design choices behind longstanding and newly proposed computational frameworks aimed at finding common ground across collective preferences and examines their potential future impacts, both technically and normatively. It begins by situating AI-assisted preference elicitation within the historical role of opinion polls, emphasizing that preferences are shaped by the decision-making context and are seldom objectively captured. With that caveat in mind, we explore the extent to which AI-based democratic innovations might serve as discovery tools which support reasonable representations of a collective will, sense-making, and agreement-seeking. At the same time, we caution against dangerously misguided uses, such as enabling binding decisions, fostering gradual disempowerment or post-rationalizing political outcomes…(More)”.
Paper by Jakob Ohme and LK Seiling: “The EU’s Digital Services Act (DSA) establishes, for the first time, a legal right for independent researchers to access platform data in the public interest. Once designated as Very Large Online Platforms or Search Engines (VLOPSEs), services reaching 45 million EU users must provide data access to support research on “systemic risks.” Article 40 creates two pathways: Article 40(12) enables access to publicly available data beyond voluntary platform tools, while Article 40(4) allows vetted researchers to request non-public data – such as exposure logs, moderation records, and recommendation metrics – through national Digital Services Coordinators rather than platforms. Both routes are purpose-limited to studying systemic risks and, for Article 40(4), mitigation measures. Yet the DSA’s broad, non-exhaustive definition of systemic risk – covering illegal content, fundamental rights, civic discourse, public health, and user well-being – opens a wide research space spanning misinformation flows, political networks, algorithmic amplification, and platform governance, among others. Early implementation reveals challenges, including uneven compliance, uncertain technical standards, funding constraints, and limits to data sharing for replication. Nonetheless, the DSA marks a turning point: platform research is no longer dependent on corporate discretion but grounded in public-interest regulation. Researchers now play a central role in shaping evidence-based oversight of digital platforms in Europe…(More)”.
Article by Alex Daniels: “Could the soul-sucking process of applying for philanthropic grants be on the way out? That is one of the goals of a new $8 million effort supported by the MacArthur Foundation.
The project, dubbed the Philanthropy Data Commons, is an attempt to bring a huge reservoir of foundation and charity information into a single database. Grant seekers and grant makers can drill into the data to find partners that share the same goals, among the vast universe of tax-exempt organizations.
“It should take nonprofits less time to apply for grants and allow them more time to spend on their missions,” said Elizabeth Kane, co-director of the Commons. “By the same token, many funders struggle to find and support organizations that are aligned with their goals. It could make the grant application process more efficient for both sides.”
Currently, the publicly available data from Internal Revenue Service filings that nonprofits can scour for grant information is limited. It only provides basic personnel and financial information and lacks detail about what work funders want to support and how well nonprofits have performed.
If enough organizations provide more granular information to the Data Commons — things like due diligence reports on potential grantees, project timelines, and impact data — the database and the applications created to use it could play matchmaker. Grantees and grant makers could be connected through a largely automated process. Grantees would be able to search grant makers and vice versa. Applications for many grants could be completed with a minimum of keystrokes. For instance, if a grantee located several foundations that matched some basic criteria, it could auto-populate fields in an application using its stored data and send it off to all of the grant makers at the same time…(More)”.
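As a rough illustration of that matchmaking and auto-fill idea, a sketch might look like the following. The field names, identifiers, and data are hypothetical and are not the Philanthropy Data Commons schema or API.

```python
# Illustrative sketch only -- not the Philanthropy Data Commons schema or API.
# Match a nonprofit's stored profile to funders with overlapping focus areas,
# then pre-fill the same application fields for every matched funder.

nonprofit_profile = {
    "name": "River Health Collective",          # hypothetical organization
    "focus_areas": {"public health", "clean water"},
    "annual_budget": 1_200_000,
    "mission": "Expand access to safe drinking water.",
}

funders = [
    {"name": "Blue Delta Fund", "focus_areas": {"clean water", "climate"}},
    {"name": "Bright Futures Trust", "focus_areas": {"education"}},
]

# Matchmaking: keep funders whose stated focus areas overlap the nonprofit's.
matches = [f for f in funders if f["focus_areas"] & nonprofit_profile["focus_areas"]]

# Auto-fill: populate a generic application from the stored profile.
applications = [
    {
        "funder": funder["name"],
        "applicant": nonprofit_profile["name"],
        "mission": nonprofit_profile["mission"],
        "annual_budget": nonprofit_profile["annual_budget"],
    }
    for funder in matches
]

for application in applications:
    print(application)
```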
Paper by Alek Tarkowski: “Standard open licenses treat all users as formally equal. But when a researcher in Nairobi and a multinational technology company are offered the same terms of use for a language dataset, the result is not democratization but value extraction. This is the equity gap at the heart of the Paradox of Open. The Nwulite Obodo Open Data License (NOODL) directly responds to this challenge.
This report analyses the NOODL license, a tiered licensing framework developed for African language datasets, as an experiment in open data licensing and a contribution to emerging approaches to data commons governance. It is our second study that looks in detail at how components of a public AI stack can be created and governed (the first study concerned the development of AI models in Poland).
Developed in consultation with African language communities, NOODL builds on Creative Commons licensing but introduces a tiered framework of obligations based on users’ geographic and economic position. For users in the Global South, it applies permissive open terms. For users in high-income countries, it requires benefit or value sharing with the data community. Rather than treating all users identically, NOODL assumes that meaningful openness requires differentiation based on capacity and power.
This report examines NOODL as an experiment in open licensing with relevance beyond its immediate context: it points to the need to go beyond the binary of open vs closed. The analysis situates the license within the broader debate on democratizing AI, the growing ecosystem of commons-based data governance experiments, and Open Future’s own framework for commons-based data set governance. It also assesses the enforcement and adoption challenges NOODL faces, and considers what a healthier licensing ecosystem might look like: one that supports context-sensitive experimentation.
NOODL is currently applied to a single dataset. Its significance does not lie in scale, but in what it opens up: space to think beyond the “one size fits all” model that has defined open licensing for over two decades…(More)”.
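To see how tiered terms of this kind could be operationalized in practice, here is a minimal sketch that selects obligations from a reuser’s country income classification. The country list, tier names, and obligations are illustrative assumptions, not NOODL’s actual text or tooling.

```python
# Illustrative sketch only -- not NOODL's actual terms or any official tooling.
# Select license obligations based on a reuser's country income classification.

# Hypothetical subset of high-income country codes (e.g., from a World Bank-style list).
HIGH_INCOME_COUNTRIES = {"US", "DE", "GB", "JP", "FR"}

def license_terms(country_code: str) -> dict:
    """Return the obligations that would apply to a reuser in the given country."""
    if country_code.upper() in HIGH_INCOME_COUNTRIES:
        return {
            "tier": "high-income",
            "obligations": ["attribution", "benefit or value sharing with the data community"],
        }
    return {
        "tier": "permissive (Global South)",
        "obligations": ["attribution"],
    }

print(license_terms("KE"))  # permissive open terms
print(license_terms("US"))  # benefit-sharing obligations apply
```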
Article by Joel Gurin: “…A growing coalition of organizations, researchers, technologists, and civic leaders is working to save and preserve national data on many levels. Now it’s time to bring those lines of work together. We need a coordinated, national program to protect essential data and build alternatives where federal sources fail.
Such a program can begin by acknowledging that we cannot save everything. Data.gov, the federal portal for all the government’s public data, provides access to more than 400,000 datasets. Not all are equally important, equally used, or equally at risk. The challenge is to identify the most essential datasets—such as the ones that underpin public health, climate science, economic stability, education, and democratic accountability—and determine which are vulnerable.
A practical, scalable strategy can include several steps:
1. Track what we’ve lost. We need a thorough, AI-enabled scan of the federal data ecosystem to see what’s already been lost or changed, and set up automated monitoring to detect even subtle changes going forward (a minimal monitoring sketch follows this list).
2. Build coalitions in key domains. Public health experts know which datasets matter most to disease surveillance. Climate scientists know which environmental indicators are irreplaceable. Education researchers know which federal surveys track opportunity. These experts must work alongside data scientists, AI specialists, and philanthropic partners to map what truly counts.
3. Prioritize core datasets. Through interviews, surveys, and quantitative analysis—such as tracking citations in research or journalism—coalitions can identify a “core canon” of essential datasets in each field.
4. Assess the risks. Tools like the Data Checkup, developed by dataindex.us, can assess threats to federal datasets. This work can be automated and scaled with AI.
5. Determine the federal role. Some federal data—like satellite observations, national health surveillance, or economic indicators—cannot be replicated by states or private actors. Other data can be supplemented or replaced by state and local sources, private‑sector datasets, crowdsourcing, or nontraditional data sources.
6. Take action to save essential data. When federal data is essential, coalitions can pursue advocacy, public comments, direct engagement with agencies, or litigation. When alternatives exist, they can be developed, benchmarked, and scaled.
7. Put the data to work. The best way to defend data is to use it. Publishing use cases, visualizations, tools, and plain‑language insights helps the public see why this information matters. Generative AI can make federal and open data accessible to millions of non‑technical users.
8. Think globally. The threats to data go beyond the U.S. We need to track the international impacts of U.S. data loss, study how international sources might replace U.S. data, and share lessons learned with other countries.
9. Strengthen institutional protections. In addition to managing today’s immediate problems, we need to develop policies, laws, governance strategies, and guardrails for more stable, reliable data in the future.
10. Sustain the cycle. The threats will evolve. So must the response…(More)”.
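As a companion to step 1 above, here is a minimal monitoring sketch: snapshot each watched dataset, hash it, and flag changes or removals against the previous run. The dataset names and URLs are hypothetical placeholders, not a real watchlist.

```python
# Minimal monitoring sketch; the dataset names and URLs are hypothetical placeholders.
# Snapshot each watched dataset, hash it, and flag changes or removals.
import hashlib
import json
import pathlib
import urllib.request

WATCHLIST = {
    "example_health_indicators": "https://example.gov/data/health_indicators.csv",
}
STATE_FILE = pathlib.Path("dataset_hashes.json")

def sha256_of(url: str) -> str:
    """Download a dataset and return the SHA-256 hash of its bytes."""
    with urllib.request.urlopen(url) as response:
        return hashlib.sha256(response.read()).hexdigest()

def check_for_changes() -> None:
    previous = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    current = dict(previous)
    for name, url in WATCHLIST.items():
        try:
            current[name] = sha256_of(url)
        except OSError:
            print(f"ALERT: {name} is unreachable (possible removal)")
            continue
        if name in previous and previous[name] != current[name]:
            print(f"ALERT: {name} has changed since the last snapshot")
    STATE_FILE.write_text(json.dumps(current, indent=2))

if __name__ == "__main__":
    check_for_changes()
```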
