Policy paper by the German Council for Scientific Information Infrastructures: “…provides an overview and a comparative in-depth analysis of the emerging research (and research-related) data infrastructures NFDI, EOSC, Gaia-X and the European Data Spaces. In addition, the Council makes recommendations for their future development and coordination. The RfII notes that access to genuine high-quality research data and related core services is a matter of basic public supply and strongly advises achieving coherence between the various initiatives and approaches…(More)”.
Effective Data Stewardship in Higher Education: Skills, Competences, and the Emerging Role of Open Data Stewards
Paper by Panos Fitsilis et al: “The significance of open data in higher education stems from the changing tendencies towards open science and open research in higher education, which encourage new ways of making scientific inquiry more transparent, collaborative and accessible. This study focuses on the critical role of open data stewards in this transition, essential for managing and disseminating research data effectively in universities, while it also highlights the increasing demand for structured training and professional policies for data stewards in academic settings. Building upon this context, the paper investigates the essential skills and competences required for effective data stewardship in higher education institutions through a critical literature review, coupled with practical engagement in open data stewardship at universities, which provided insights into the roles and responsibilities of data stewards. In response to these identified needs, the paper proposes a structured training framework and comprehensive curriculum for data stewardship, a direct response to the gaps identified in the literature. It addresses five key competence categories for open data stewards, aligning them with current trends and essential skills and knowledge in the field. By advocating for a structured approach to data stewardship education, this work sets the foundation for improved data management in universities and serves as a critical step towards professionalizing the role of data stewards in higher education. The emphasis on the role of open data stewards is expected to advance data accessibility and sharing practices, fostering increased transparency, collaboration, and innovation in academic research. This approach contributes to the evolution of universities into open ecosystems, where there is free flow of data for global education and research advancement…(More)”.
Commission launches public consultation on the rules for researchers to access online platform data under the Digital Services Act
Press Release: “Today, the Commission launched a public consultation on the draft delegated act on access to online platform data for vetted researchers under the Digital Services Act (DSA).

With the Digital Services Act, researchers will for the first time have access to data to study systemic risks and to assess online platforms’ risk mitigation measures in the EU. It will allow the research community to play a vital role in scrutinising and safeguarding the online environment.
The draft delegated act clarifies the procedures on how researchers can access Very Large Online Platforms’ and Very Large Online Search Engines’ data. It also sets out rules on data formats and data documentation requirements. Lastly, it establishes the DSA data access portal, a one-stop-shop for researchers, data providers, and Digital Services Coordinators (DSCs) to exchange information on data access requests. The consultation follows a first call for evidence.
The consultation will run until 26 November 2024. After gathering public feedback, the Commission plans to adopt the rules in the first quarter of 2025…(More)”.
Science and technology’s contribution to the UK economy
UK House of Lords Primer: “It is difficult to accurately pinpoint the economic contribution of science and technology to the UK economy. This is because of the way sectors are divided up and reported in financial statistics.
For example, in September 2024 the Office for National Statistics (ONS) reported the following gross value added (GVA) figures by industry/sector for 2023:
- £71bn for IT and other information service activities
- £20.6bn for scientific research and development
This would amount to £91.6bn, forming approximately 3.9% of the total UK GVA of £2,368.7bn for 2023. However, a number of other sectors could also be included in these figures, for example:
- the manufacture of computer, certain machinery and electrical components (valued at £38bn in 2023)
- telecommunications (valued at £34.5bn)
If these two sectors were included too, GVA across all four sectors would total £164.1bn, approximately 6.9% of the UK’s 2023 GVA. However, this would likely still exclude relevant contributions that happen to fall within the definitions of different industries. For example, the manufacture of spacecraft and related machinery falls within the same sector as the manufacture of aircraft in the ONS’s data (this sector was valued at £10.8bn for 2023).
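The arithmetic behind the two headline shares can be checked with a short script; the figures below are the GVA values quoted in the primer (all £bn, for 2023):

```python
# GVA figures for 2023 as quoted in the primer (in £bn)
total_uk_gva = 2368.7
it_services = 71.0           # IT and other information service activities
sci_rnd = 20.6               # scientific research and development
computer_manufacture = 38.0  # computers, certain machinery and electrical components
telecoms = 34.5              # telecommunications

core = it_services + sci_rnd
broad = core + computer_manufacture + telecoms

print(f"Core science/tech GVA:  £{core:.1f}bn ({core / total_uk_gva:.1%} of total)")
print(f"Broader definition:     £{broad:.1f}bn ({broad / total_uk_gva:.1%} of total)")
```

Running this reproduces the £91.6bn (3.9%) and £164.1bn (6.9%) figures, and makes plain why the answer shifts so much depending on which sectors are counted in.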
Alternatively, others have made estimates of the economic contribution of more specific sectors connected to science and technology. For example:
- Oxford Economics, an economic advisory firm, has estimated that, in 2023, the life sciences sector contributed over £13bn to the UK economy and accounted for one in every 121 employed people
- the government has estimated the value of the digital sector (comprising information technology and digital content and media) at £158.3bn for 2022
- a 2023 government report estimated the value of the UK’s artificial intelligence (AI) sector at around £3.7bn (in terms of GVA) and that the sector employed around 50,040 people
- the Energy and Climate Intelligence Unit, a non-profit organisation, reported estimates that the GVA of the UK’s net zero economy (encompassing sectors such as renewables, carbon capture, green and certain manufacturing) was £74bn in 2022/23 and that it supported approximately 765,700 full-time equivalent (FTE) jobs…(More)”.
Veridical Data Science
Book by Bin Yu and Rebecca L. Barter: “Most textbooks present data science as a linear analytic process involving a set of statistical and computational techniques without accounting for the challenges intrinsic to real-world applications. Veridical Data Science, by contrast, embraces the reality that most projects begin with an ambiguous domain question and messy data; it acknowledges that datasets are mere approximations of reality while analyses are mental constructs.
Bin Yu and Rebecca Barter employ the innovative Predictability, Computability, and Stability (PCS) framework to assess the trustworthiness and relevance of data-driven results relative to three sources of uncertainty that arise throughout the data science life cycle: the human decisions and judgment calls made during data collection, cleaning, and modeling. By providing real-world data case studies, intuitive explanations of common statistical and machine learning techniques, and supplementary R and Python code, Veridical Data Science offers a clear and actionable guide for conducting responsible data science. Requiring little background knowledge, this lucid, self-contained textbook provides a solid foundation and principled framework for future study of advanced methods in machine learning, statistics, and data science…(More)”.
Statistical Significance—and Why It Matters for Parenting
Blog by Emily Oster: “…When we say an effect is “statistically significant at the 5% level,” what this means is that there is less than a 5% chance that we’d see an effect of this size (or larger) if the true effect were zero. (The “5% level” is a common cutoff, but things can be significant at the 1% or 10% level also.)
The natural follow-up question is: Why would any effect we see occur by chance? The answer lies in the fact that data is “noisy”: it comes with error. To see this a bit more, we can think about what would happen if we studied a setting where we know our true effect is zero.
My fake study
Imagine the following (fake) study. Participants are randomly assigned to eat a package of either blue or green M&Ms, and then each flips a (fair) coin, and you record whether it comes up heads. Your analysis will compare the number of heads that people flip after eating blue versus green M&Ms and report whether this is “statistically significant at the 5% level.”…(More)”.
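The logic of the fake study can be simulated directly. The sketch below (not from Oster’s post; group size, number of simulations, and the use of a two-proportion z-test are all assumptions for illustration) runs the M&M experiment many times with a truly fair coin, so the true effect is zero, and counts how often the result nonetheless comes out “significant at the 5% level”:

```python
import math
import random

def fake_study(n_per_group=200, rng=random):
    """Simulate one run of the M&M study where the coin is fair for everyone."""
    # Each participant flips one fair coin; M&M colour cannot matter.
    blue = sum(rng.random() < 0.5 for _ in range(n_per_group))
    green = sum(rng.random() < 0.5 for _ in range(n_per_group))
    # Two-proportion z-test for the difference in heads rates
    p1, p2 = blue / n_per_group, green / n_per_group
    pooled = (blue + green) / (2 * n_per_group)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the normal approximation: P(|Z| > z) = erfc(|z|/sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
n_sims = 10_000
false_positives = sum(fake_study() < 0.05 for _ in range(n_sims))
print(f"'Significant' results despite zero true effect: {false_positives / n_sims:.1%}")
```

By construction nothing real is going on, yet roughly 5% of runs report a “significant” difference, which is exactly what the 5% significance level means.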
External Researcher Access to Closed Foundation Models
Report by Esme Harrington and Dr. Mathias Vermeulen: “…addresses a pressing issue: independent researchers need better conditions for accessing and studying the AI models that big companies have developed. Foundation models — the core technology behind many AI applications — are controlled mainly by a few major players who decide who can study or use them.
What’s the problem with access?
- Limited access: Companies like OpenAI, Google and others are the gatekeepers. They often restrict access to researchers whose work aligns with their priorities, which means independent, public-interest research can be left out in the cold.
- High costs: Even when access is granted, it often comes with a hefty price tag that smaller or less-funded teams can’t afford.
- Lack of transparency: These companies don’t always share how their models are updated or moderated, making it nearly impossible for researchers to replicate studies or fully understand the technology.
- Legal risks: When researchers try to scrutinize these models, they sometimes face legal threats if their work uncovers flaws or vulnerabilities in the AI systems.
The research suggests that companies need to offer more affordable and transparent access to improve AI research. Additionally, governments should provide legal protections for researchers, especially when they are acting in the public interest by investigating potential risks…(More)”.
Key lesson of this year’s Nobel Prize: The importance of unlocking data responsibly to advance science and improve people’s lives
Article by Stefaan Verhulst, Anna Colom, and Marta Poblet: “This year’s Nobel Prize for Chemistry owes a lot to available, standardised, high quality data that can be reused to improve people’s lives. The winners, Prof David Baker from the University of Washington, and Demis Hassabis and John M. Jumper from Google DeepMind, were awarded, respectively, for the design of new proteins and for the prediction of protein structures, developments that can have important medical applications. These developments build on AI models that can predict protein structures in unprecedented ways. However, key to these models and their potential to unlock health discoveries is an open curated dataset with high quality and standardised data, something still rare despite the pace and scale of AI-driven development.
We live in a paradoxical time of both data abundance and data scarcity: a lot of data is being created and stored, but it tends to be inaccessible due to private interests and weak regulations. The challenge, then, is to prevent the misuse of data whilst avoiding its missed use.
The reuse of data remains limited in Europe, but a new set of regulations seeks to increase the possibilities of responsible data reuse. When the European Commission made the case for its European Data Strategy in 2020, it envisaged the European Union as “a role model for a society empowered by data to make better decisions — in business and the public sector,” and acknowledged the need to improve “governance structures for handling data and to increase its pools of quality data available for use and reuse”…(More)”.
WikiProject AI Cleanup
Article by Emanuel Maiberg: “A group of Wikipedia editors have formed WikiProject AI Cleanup, “a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia.”
The group’s goal is to protect one of the world’s largest repositories of information from the same kind of misleading AI-generated information that has plagued Google search results, books sold on Amazon, and academic journals.
“A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar ‘styles’ using ChatGPT,” Ilyas Lebleu, a founding member of WikiProject AI Cleanup, told me in an email. “Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques.”…(More)”.
Data’s Role in Unlocking Scientific Potential
Report by the Special Competitive Studies Project: “…we outline two actionable steps the U.S. government can take immediately to address the data sharing challenges hindering scientific research.
1. Create Comprehensive Data Inventories Across Scientific Domains
We recommend the Secretary of Commerce, acting through the Department of Commerce’s Chief Data Officer and the Director of the National Institute of Standards and Technology (NIST), and with the Federal Chief Data Officer Council (CDO Council) create a government-led inventory where organizations – universities, industries, and research institutes – can catalog their datasets with key details like purpose, description, and accreditation. Similar to platforms like data.gov, this centralized repository would make high-quality data more visible and accessible, promoting scientific collaboration. To boost participation, the government could offer incentives, such as grants or citation credits for researchers whose data is used. Contributing organizations would also be responsible for regularly updating their entries, ensuring the data stays relevant and searchable.
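To make the inventory idea concrete, one catalog record might look something like the sketch below. This is purely illustrative: the class name, field names, and example values are all hypothetical, not drawn from the report or from data.gov.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetEntry:
    """One hypothetical record in a government-led data inventory."""
    title: str
    organization: str   # contributing university, industry, or research institute
    purpose: str        # why the data was collected
    description: str    # what the dataset contains
    accreditation: str  # quality/vetting status claimed by the contributor
    last_updated: date  # contributors are responsible for keeping this current

entry = DatasetEntry(
    title="Coastal water quality survey",
    organization="Example State University",
    purpose="Environmental monitoring research",
    description="Monthly sensor readings, 2015-2023",
    accreditation="Institution-certified",
    last_updated=date(2024, 10, 1),
)
print(entry.title)
```

The point of such a schema is that the metadata, not the data itself, is centralized: the repository stays searchable while the contributing organization retains custody of the dataset and the obligation to keep its entry up to date.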
2. Create Scientific Data Sharing Public-Private Partnerships
A critical recommendation of the National Data Action Plan was for the United States to facilitate the creation of data sharing public-private partnerships for specific sectors. The U.S. Government should coordinate data sharing partnerships with its departments and agencies, industry, academia, and civil society. Data collected by one entity can be tremendously valuable to others. But incentivizing data sharing is challenging as privacy, security, legal (e.g., liability), and intellectual property (IP) concerns can limit willingness to share. However, narrowly-scoped PPPs can help overcome these barriers, allowing for greater data sharing and mutually beneficial data use…(More)”