These Startups Are Building Advanced AI Models Without Data Centers


Article by Will Knight: “Researchers have trained a new kind of large language model (LLM) using GPUs dotted across the world and fed private as well as public data—a move that suggests that the dominant way of building artificial intelligence could be disrupted.

Flower AI and Vana, two startups pursuing unconventional approaches to building AI, worked together to create the new model, called Collective-1.

Flower created techniques that allow training to be spread across hundreds of computers connected over the internet. The company’s technology is already used by some firms to train AI models without needing to pool compute resources or data. Vana provided sources of data including private messages from X, Reddit, and Telegram.
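The article doesn't show Flower's internals, but the family of techniques it builds on is federated learning. Below is a rough, minimal sketch of the core idea (federated averaging, in plain NumPy); all names here are ours for illustration, not Flower's actual API:

```python
import numpy as np

def local_update(weights, data, lr=0.1, steps=10):
    """Hypothetical local training step, run independently on each
    participating machine; here, toy least-squares gradient descent."""
    X, y = data
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(global_weights, client_datasets):
    """One round of federated averaging: each client trains locally on
    its own shard, then only the weights (never the raw data) travel
    back and are averaged, weighted by each client's dataset size."""
    updates, sizes = [], []
    for data in client_datasets:
        updates.append(local_update(global_weights, data))
        sizes.append(len(data[1]))
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

# Toy run: three "clients", each holding a private shard; true weights [2, -1].
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):  # 20 communication rounds
    w = federated_average(w, clients)
print(w)  # converges toward [2, -1] without any client sharing raw data
```

In a real deployment the averaging step runs on a coordinating server and clients communicate over the internet, which is what lets training span machines that never pool their data or compute.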

Collective-1 is small by modern standards, with 7 billion parameters—values that combine to give the model its abilities—compared to hundreds of billions for today’s most advanced models, such as those that power programs like ChatGPT, Claude, and Gemini.

Nic Lane, a computer scientist at the University of Cambridge and cofounder of Flower AI, says that the distributed approach promises to scale far beyond the size of Collective-1. Lane adds that Flower AI is partway through training a model with 30 billion parameters using conventional data, and plans to train another model with 100 billion parameters—close to the size offered by industry leaders—later this year. “It could really change the way everyone thinks about AI, so we’re chasing this pretty hard,” Lane says. He says the startup is also incorporating images and audio into training to create multimodal models.

Distributed model-building could also unsettle the power dynamics that have shaped the AI industry…(More)”

AI action plan database


A project by the Institute for Progress: “In January 2025, President Trump tasked the Office of Science and Technology Policy with creating an AI Action Plan to promote American AI Leadership. The government requested input from the public, and received 10,068 submissions. The database below summarizes specific recommendations from these submissions. … We used AI to extract recommendations from each submission, and to tag them with relevant information. Click on a recommendation to learn more about it. See our analysis of common themes and ideas across these recommendations…(More)”.
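The project page doesn't publish the extraction pipeline; the sketch below is one plausible shape for it, with the prompt wording, model name, and output schema all being our assumptions:

```python
import json
from openai import OpenAI  # assumes the openai Python package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """Extract every concrete policy recommendation from the public
submission below. Return a JSON array of objects with the fields
"recommendation" (one sentence), "topic", and "addressed_to"
(e.g. Congress, a federal agency).

Submission:
{submission}"""

def extract_recommendations(submission_text: str) -> list[dict]:
    """Hypothetical one-shot extraction and tagging of recommendations."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable model; an assumption here
        messages=[{"role": "user",
                   "content": PROMPT.format(submission=submission_text)}],
    )
    # A production pipeline would validate and repair the returned JSON.
    return json.loads(response.choices[0].message.content)
```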

Updating purpose limitation for AI: a normative approach from law and philosophy 


Paper by Rainer Mühlhoff and Hannah Ruschemeier: “The purpose limitation principle goes beyond the protection of the individual data subjects: it aims to ensure transparency and fairness, admitting exceptions only for privileged purposes. However, in the current reality of powerful AI models, purpose limitation is often impossible to enforce and is thus structurally undermined. This paper addresses a critical regulatory gap in EU digital legislation: the risk of secondary use of trained models and anonymised training datasets. Anonymised training data, as well as AI models trained from this data, pose the threat of being freely reused in potentially harmful contexts such as insurance risk scoring and automated job applicant screening. We propose shifting the focus of purpose limitation from data processing to AI model regulation. This approach mandates that those training AI models define the intended purpose and restrict the use of the model solely to this stated purpose…(More)”.
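The paper's proposal is legal rather than technical, but it helps to imagine what purpose binding could look like in software. The wrapper below is our illustration of the idea, not the authors' mechanism, and every name in it is hypothetical:

```python
class PurposeBoundModel:
    """Illustrative sketch of model-level purpose limitation: the
    trainer declares a purpose, and every call must state a matching
    purpose or be refused."""

    def __init__(self, model, declared_purpose: str):
        self._model = model
        self.declared_purpose = declared_purpose  # fixed at training time

    def predict(self, inputs, stated_purpose: str):
        if stated_purpose != self.declared_purpose:
            raise PermissionError(
                f"Model licensed only for '{self.declared_purpose}', "
                f"refusing use for '{stated_purpose}'.")
        return self._model(inputs)

# e.g. a diagnostic-support model may not be reused for insurance scoring:
# model = PurposeBoundModel(net, declared_purpose="clinical decision support")
# model.predict(x, stated_purpose="insurance risk scoring")  # raises
```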

Rebooting the global consensus: Norm entrepreneurship, data governance and the inalienability of digital bodies


Paper by Siddharth Peter de Souza and Linnet Taylor: “The establishment of norms among states is a common way of governing international actions. This article analyses the potential of norm-building for governing data and artificial intelligence technologies’ collective effects. Rather than focusing on state actors’ ability to establish and enforce norms, however, we identify a contrasting process taking place among civil society organisations in response to the international neoliberal consensus on the commodification of data. The norm we identify – ‘nothing about us without us’ – asserts civil society’s agency, and specifically the right of those represented in datasets to give or refuse permission through structures of democratic representation. We argue that this represents a form of norm-building that should be taken as seriously as that of states, and analyse how it is constructing the political power, relations, and resources to engage in governing technology at scale. We first outline how this counter-norming is anchored in data’s connections to bodies, land, community, and labour. We explore the history of formal international norm-making and the current norm-making work being done by civil society organisations internationally, and argue that these, although very different in their configurations and strategies, are comparable in scale and scope. Based on this, we make two assertions: first, that a norm-making lens is a useful way for both civil society and research to frame challenges to the primacy of market logics in law and governance, and second, that the conceptual exclusion of civil society actors as norm-makers is an obstacle to the recognition of counter-power in those spheres…(More)”.

Technical Tiers: A New Classification Framework for Global AI Workforce Analysis


Report by Siddhi Pal, Catherine Schneider and Ruggero Marino Lazzaroni: “… introduces a novel three-tiered classification system for global AI talent that addresses significant methodological limitations in existing workforce analyses by distinguishing between skill categories within the AI talent pool. By separating non-technical roles (Category 0), technical software development (Category 1), and advanced deep learning specialization (Category 2), our framework enables precise examination of AI workforce dynamics at a pivotal moment in global AI policy.
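The report doesn't publish its classification code; as a toy illustration of how such tiering might be applied to profile data (the keyword lists below are our assumptions, not the report's methodology):

```python
# Hypothetical keyword-based tiering, loosely mirroring the report's
# Category 0/1/2 distinction; the skill lists are illustrative only.
DEEP_LEARNING_SKILLS = {"pytorch", "tensorflow", "transformers",
                        "reinforcement learning", "cuda"}
SOFTWARE_SKILLS = {"python", "java", "sql", "git", "docker"}

def classify(skills: set[str]) -> int:
    """Return 2 (deep learning specialist), 1 (technical software
    development), or 0 (non-technical role)."""
    skills = {s.lower() for s in skills}
    if skills & DEEP_LEARNING_SKILLS:
        return 2
    if skills & SOFTWARE_SKILLS:
        return 1
    return 0

print(classify({"PyTorch", "Python"}))       # 2
print(classify({"SQL", "Docker"}))           # 1
print(classify({"AI policy", "marketing"}))  # 0
```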

Through our analysis of a sample of 1.6 million individuals in the AI talent pool across 31 countries, we’ve uncovered clear patterns in technical talent distribution that significantly impact Europe’s AI ambitions. Asian nations hold an advantage in specialized AI expertise, with South Korea (27%), Israel (23%), and Japan (20%) maintaining the highest proportions of Category 2 talent. Within Europe, Poland and Germany stand out as leaders in specialized AI talent. This may be connected to their initiatives to attract tech companies and investments in elite research institutions, though further research is needed to confirm these relationships.

Our data also reveals a shifting landscape of global talent flows. Research shows that countries employing points-based immigration systems attract 1.5 times more high-skilled migrants than those using demand-led approaches. This finding takes on new significance in light of recent geopolitical developments affecting scientific research globally. As restrictive policies and funding cuts create uncertainty for researchers in the United States, one of the main destinations for European AI talent, the way nations position their regulatory environments, scientific freedoms, and research infrastructure will increasingly determine their ability to attract and retain specialized AI talent.

The gender analysis in our study illuminates another dimension of competitive advantage. In contrast to the overall AI talent pool, EU countries lead in female representation in highly technical roles (Category 2), occupying seven of the top ten global rankings. Finland, Czechia, and Italy have the highest proportions of female representation in Category 2 roles globally (39%, 31%, and 28%, respectively). This gender diversity represents not merely a social achievement but a potential strategic asset in AI innovation, particularly as global coalitions increasingly emphasize the importance of diverse perspectives in AI development…(More)”

Integrating Data Governance and Mental Health Equity: Insights from ‘Towards a Set of Universal Data Principles’


Article by Cindy Hansen: “This recent scholarly work, “Towards a Set of Universal Data Principles” by Steve MacFeely et al (2025), delves comprehensively into the expansive landscape of data management and governance. It is worth acknowledging the intricate processes through which humans collect, manage, and disseminate vast quantities of data. …To truly democratize digital mental healthcare, it’s crucial to empower individuals in their data journey. By focusing on Digital Self-Determination, people can participate in a transformative shift where control over personal data becomes a fundamental right, aligning with the proposed universal data principles. One can envision a world where mental health data, collected and used responsibly, contributes not only to personal well-being but also to the greater public good, echoing the need for data governance to serve society at large.

This concept of digital self-determination empowers individuals by ensuring they have the autonomy to decide who accesses their mental health data and how it’s utilized. Such empowerment is especially significant in the context of mental health, where data sensitivity is high and privacy is paramount. Giving people the confidence to manage their data encourages them to engage more openly with digital health services and promotes the culture of trust that is a core element of the proposed data governance frameworks.

Holistic Research Canada’s Outcome Monitoring System honors this ethos, allowing individuals to control how their data is accessed, shared, and used while maintaining engagement with healthcare providers. With this system, people can actively participate in their mental health decisions, supported by data that offers transparency about their progress and prognoses, which is crucial in realizing the potential of data to serve both individual and broader societal interests.

Furthermore, this tool provides actionable insights into mental health journeys, promoting evidence-based practices, enhancing transparency, and ensuring that individuals’ rights are safeguarded throughout. These principles are vital to transforming individuals from passive subjects into active stewards of their data, consistent with the proposed principles of safeguarding data quality, integrity, and security…(More)”.

Make privacy policies longer and appoint LLM readers


Paper by Przemysław Pałka et al: “In a world of human-only readers, a trade-off persists between comprehensiveness and comprehensibility: only privacy policies too long to be humanly readable can precisely describe the intended data processing. We argue that this trade-off no longer exists where LLMs are able to extract tailored information from clearly drafted, fully comprehensive privacy policies. To substantiate this claim, we provide a methodology for drafting comprehensive, non-ambiguous privacy policies and for querying them using LLM prompts. Our methodology is tested with an experiment aimed at determining to what extent GPT-4 and Llama2 are able to answer questions regarding the content of privacy policies designed in the format we propose. We further support this claim by analyzing real privacy policies in the chosen market sectors through two experiments (one with legal experts, and another by using LLMs). Based on the success of our experiments, we submit that data protection law should change: it must require controllers to provide clearly drafted, fully comprehensive privacy policies from which data subjects and other actors can extract the needed information, with the help of LLMs…(More)”.
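The paper's exact prompts aren't reproduced in the abstract; a minimal sketch of the kind of query flow it describes might look like the following, with the prompt wording and model choice as our assumptions:

```python
from openai import OpenAI  # assumes the openai Python package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_policy(policy_text: str, question: str) -> str:
    """Have an LLM extract a tailored answer from a long, fully
    comprehensive privacy policy, in the spirit of the paper."""
    response = client.chat.completions.create(
        model="gpt-4",  # the paper tested GPT-4 and Llama 2
        messages=[
            {"role": "system",
             "content": "Answer strictly from the privacy policy below; "
                        "say 'not stated' if the policy does not address "
                        "the question.\n\n" + policy_text},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# e.g. ask_policy(policy, "Is my location data shared with advertisers?")
```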

Artificial Intelligence: Generative AI’s Environmental and Human Effects


GAO Report: “Generative artificial intelligence (AI) could revolutionize entire industries. In the nearer term, it may dramatically increase productivity and transform daily tasks in many sectors. However, both its benefits and risks, including its environmental and human effects, are unknown or unclear.

Generative AI uses significant energy and water resources, but companies are generally not reporting details of these uses. Most estimates of the environmental effects of generative AI have focused on quantifying the energy required to train generative AI models, along with the carbon emissions associated with generating that energy. Estimates of water consumption by generative AI are limited. Generative AI is expected to be a driving force for data center demand, but what portion of data center electricity consumption is related to generative AI is unclear. According to the International Energy Agency, U.S. data center electricity consumption was approximately 4 percent of U.S. electricity demand in 2022 and could be 6 percent of demand in 2026.
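For a rough sense of scale, a back-of-the-envelope calculation, assuming total US electricity demand of roughly 4,000 TWh per year (our working assumption for illustration, not a figure from the report):

```python
US_DEMAND_TWH = 4000  # rough annual US electricity demand (assumption)
share_2022, share_2026 = 0.04, 0.06  # IEA data center shares cited by GAO

# Holding total demand constant for simplicity:
dc_2022 = US_DEMAND_TWH * share_2022  # ~160 TWh
dc_2026 = US_DEMAND_TWH * share_2026  # ~240 TWh
print(f"Data centers: {dc_2022:.0f} TWh (2022) -> {dc_2026:.0f} TWh (2026), "
      f"a {dc_2026 / dc_2022 - 1:.0%} relative increase")  # 50% increase
```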

While generative AI may bring beneficial effects for people, GAO highlights five risks and challenges of generative AI that could negatively affect society, culture, and people (see figure). For example, unsafe systems may produce outputs that compromise safety, such as inaccurate information, undesirable content, or the enabling of malicious behavior. However, definitive statements about these risks and challenges are difficult to make because generative AI is rapidly evolving, and private developers do not disclose some key technical information.

[Figure: Selected generative artificial intelligence risks and challenges that could result in human effects]

GAO identified policy options to consider that could enhance the benefits or address the challenges of environmental and human effects of generative AI. These policy options identify possible actions by policymakers, which include Congress, federal agencies, state and local governments, academic and research institutions, and industry. In addition, policymakers could choose to maintain the status quo, whereby they would not take additional action beyond current efforts. See below for details on the policy options…(More)”.

Inquiry as Infrastructure: Defining Good Questions in the Age of Data and AI


Paper by Stefaan Verhulst: “The most consequential failures in data-driven policymaking and AI deployment often stem not from poor models or inadequate datasets but from poorly framed questions. This paper centers question literacy as a critical yet underdeveloped competency in the data and policy landscape. Arguing for a “new science of questions,” it explores what constitutes a good question: one that is not only technically feasible but also ethically grounded, socially legitimate, and aligned with real-world needs. Drawing on insights from The GovLab’s 100 Questions Initiative, the paper develops a taxonomy of question types (descriptive, diagnostic, predictive, and prescriptive) and identifies five essential criteria for question quality: questions must be general yet concrete, co-designed with affected communities and domain experts, purpose-driven and ethically sound, grounded in data and technical realities, and capable of evolving through iterative refinement. The paper also outlines common pathologies of bad questions, such as vague formulation, biased framing, and solution-first thinking. Rather than treating questions as incidental to analysis, it argues for institutionalizing deliberate question design through tools like Q-Labs, question maturity models, and new professional roles for data stewards. Ultimately, the paper contends that questions are infrastructures of meaning. What we ask shapes not only what data we collect or what models we build but also what values we uphold and what futures we make possible…(More)”.
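To make the taxonomy concrete, here is a small sketch encoding the paper's question types and five quality criteria as a checklist; the class and field names are ours, not the paper's:

```python
from dataclasses import dataclass, field
from enum import Enum

class QuestionType(Enum):
    """The paper's four-part taxonomy of question types."""
    DESCRIPTIVE = "what is happening?"
    DIAGNOSTIC = "why is it happening?"
    PREDICTIVE = "what will happen?"
    PRESCRIPTIVE = "what should we do?"

# The paper's five criteria for question quality, as a checklist.
CRITERIA = [
    "general yet concrete",
    "co-designed with affected communities and domain experts",
    "purpose-driven and ethically sound",
    "grounded in data and technical realities",
    "capable of evolving through iterative refinement",
]

@dataclass
class PolicyQuestion:
    text: str
    qtype: QuestionType
    criteria_met: dict[str, bool] = field(
        default_factory=lambda: {c: False for c in CRITERIA})

    def is_well_formed(self) -> bool:
        """A good question, per the paper, satisfies all five criteria."""
        return all(self.criteria_met.values())
```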

The Overlooked Importance of Data Reuse in AI Infrastructure


Essay by Oxford Insights and The Data Tank: “Employing data stewards and embedding responsible data reuse principles in the programme or ecosystem and within participating organisations is one of the pathways forward. Data stewards are proactive agents responsible for catalysing collaboration, tackling these challenges and embedding data reuse practices in their organisations. 

The role of Chief Data Officer for government agencies has become more common in recent years, and we suggest the same needs to happen with the role of the Chief Data Steward. Chief Data Officers are mostly focused on internal data management and have a technical focus. With the changes in the data governance landscape, this profession needs to be reimagined and iterated upon. Embedded in both the demand and the supply sides of data, data stewards are proactive agents empowered to create public value by reusing data and data expertise. They are tasked to identify opportunities for productive cross-sectoral collaboration, and to proactively request or enable functional access to data, insights, and expertise.

One exception comes from New Zealand, which has an appointed Government Chief Data Steward in charge of setting the strategic direction for the government’s data management, with an explicit focus on data reuse. The UN has released a report on the role of data stewards and National Statistical Offices (NSOs) in the new data ecosystem; it provides many use cases that governments seeking to establish such a role can adopt.

Data stewards can play an important role in organisations leading data reuse programmes, and would be responsible for responding to the participation challenges introduced above.

A Data Steward’s role includes attracting participation for data reuse programmes by:

  • Demonstrating and communicating the value proposition of data reuse and collaborations, by engaging in partnerships and steering data reuse and sharing among data commons, cooperatives, or collaborative infrastructures; and
  • Developing responsible data lifecycle governance, and communicating insights to raise awareness and build trust among stakeholders.

A Data Steward’s role includes maintaining and scaling participation for data reuse programmes by:

  • Maintaining trust by engaging with wider stakeholders and establishing clear engagement methodologies. For example, by embedding a social license, data stewards ensure that the digital self-determination principle is respected in data reuse processes.
  • Fostering sustainable partnerships and collaborations around data, via developing business cases for data sharing and reuse, and measuring impact to build the societal case for data collaboration; and
  • Innovating in the sector by turning data into decision intelligence, to ensure that insights derived from data are more effectively integrated into decision-making processes…(More)”.