Commission launches public consultation on the rules for researchers to access online platform data under the Digital Services Act


Press Release: “Today, the Commission launched a public consultation on the draft delegated act on access to online platform data for vetted researchers under the Digital Services Act (DSA).


With the Digital Services Act, researchers will for the first time have access to data to study systemic risks and to assess online platforms’ risk mitigation measures in the EU. It will allow the research community to play a vital role in scrutinising and safeguarding the online environment.

The draft delegated act clarifies the procedures on how researchers can access data from Very Large Online Platforms and Very Large Online Search Engines. It also sets out rules on data formats and data documentation requirements. Lastly, it establishes the DSA data access portal, a one-stop shop for researchers, data providers, and Digital Services Coordinators (DSCs) to exchange information on data access requests. The consultation follows a first call for evidence.

The consultation will run until 26 November 2024. After gathering public feedback, the Commission plans to adopt the rules in the first quarter of 2025…(More)”.

Science and technology’s contribution to the UK economy


UK House of Lords Primer: “It is difficult to accurately pinpoint the economic contribution of science and technology to the UK economy. This is because of the way sectors are divided up and reported in financial statistics. 

 For example, in September 2024 the Office for National Statistics (ONS) reported the following gross value added (GVA) figures by industry/sector for 2023:

  • £71bn for IT and other information service activities 
  • £20.6bn for scientific research and development 

This would amount to £91.6bn, forming approximately 3.9% of the total UK GVA of £2,368.7bn for 2023. However, a number of other sectors could also be included in these figures, for example: 

  • the manufacture of computers, certain machinery, and electrical components (valued at £38bn in 2023) 
  • telecommunications (valued at £34.5bn) 

If these two sectors were included too, GVA across all four sectors would total £164.1bn, approximately 6.9% of the UK’s 2023 GVA. However, this would likely still exclude relevant contributions that happen to fall within the definitions of different industries. For example, the manufacture of spacecraft and related machinery falls within the same sector as the manufacture of aircraft in the ONS’s data (this sector was valued at £10.8bn for 2023).  
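The sector arithmetic quoted above can be reproduced in a few lines (figures as stated in the primer; variable names are ours):

```python
# GVA figures for 2023 as quoted in the primer, in £bn
it_services = 71.0      # IT and other information service activities
sci_rnd = 20.6          # scientific research and development
computer_manuf = 38.0   # manufacture of computers, machinery, electrical components
telecoms = 34.5         # telecommunications
total_uk_gva = 2368.7   # total UK GVA for 2023

core = it_services + sci_rnd
print(f"Core two sectors: £{core:.1f}bn = {100 * core / total_uk_gva:.1f}% of UK GVA")

broad = core + computer_manuf + telecoms
print(f"All four sectors: £{broad:.1f}bn = {100 * broad / total_uk_gva:.1f}% of UK GVA")
```

This confirms the £91.6bn (≈3.9%) and £164.1bn (≈6.9%) figures in the text.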

Alternatively, others have made estimates of the economic contribution of more specific sectors connected to science and technology. For example: 

  • Oxford Economics, an economic advisory firm, has estimated that, in 2023, the life sciences sector contributed over £13bn to the UK economy and accounted for one in every 121 employed people 
  • the government has estimated the value of the digital sector (comprising information technology and digital content and media) at £158.3bn for 2022
  • a 2023 government report estimated the value of the UK’s artificial intelligence (AI) sector at around £3.7bn (in terms of GVA) and that the sector employed around 50,040 people
  • the Energy and Climate Intelligence Unit, a non-profit organisation, reported estimates that the GVA of the UK’s net zero economy (encompassing sectors such as renewables, carbon capture, and certain green manufacturing) was £74bn in 2022/23 and that it supported approximately 765,700 full-time equivalent (FTE) jobs…(More)”.

Veridical Data Science


Book by Bin Yu and Rebecca L. Barter: “Most textbooks present data science as a linear analytic process involving a set of statistical and computational techniques without accounting for the challenges intrinsic to real-world applications. Veridical Data Science, by contrast, embraces the reality that most projects begin with an ambiguous domain question and messy data; it acknowledges that datasets are mere approximations of reality while analyses are mental constructs.
Bin Yu and Rebecca Barter employ the innovative Predictability, Computability, and Stability (PCS) framework to assess the trustworthiness and relevance of data-driven results relative to three sources of uncertainty that arise throughout the data science life cycle: the human decisions and judgment calls made during data collection, cleaning, and modeling. By providing real-world data case studies, intuitive explanations of common statistical and machine learning techniques, and supplementary R and Python code, Veridical Data Science offers a clear and actionable guide for conducting responsible data science. Requiring little background knowledge, this lucid, self-contained textbook provides a solid foundation and principled framework for future study of advanced methods in machine learning, statistics, and data science…(More)”.
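The stability principle in PCS can be illustrated with a minimal sketch (not from the book; the data, model, and thresholds here are hypothetical): perturb the data, refit, and check how much the conclusion moves.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: y = 2x + noise, standing in for a real dataset
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]

def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den

# Stability check: refit the model under bootstrap perturbations of the data
slopes = []
for _ in range(500):
    idx = [random.randrange(n) for _ in range(n)]
    slopes.append(slope([x[i] for i in idx], [y[i] for i in idx]))

print(f"slope on full data: {slope(x, y):.3f}")
print(f"bootstrap spread:   {min(slopes):.3f} to {max(slopes):.3f}")
```

If the bootstrap spread is narrow, the estimated effect is stable to this data perturbation; a wide spread would flag a result that depends heavily on which observations happened to be collected.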

Statistical Significance—and Why It Matters for Parenting


Blog by Emily Oster: “…When we say an effect is “statistically significant at the 5% level,” what this means is that there is less than a 5% chance that we’d see an effect at least this large if the true effect were zero. (The “5% level” is a common cutoff, but things can be significant at the 1% or 10% level also.) 

The natural follow-up question is: Why would any effect we see occur by chance? The answer lies in the fact that data is “noisy”: it comes with error. To see this a bit more, we can think about what would happen if we studied a setting where we know our true effect is zero. 

My fake study 

Imagine the following (fake) study. Participants are randomly assigned to eat a package of either blue or green M&Ms; each participant then flips a (fair) coin, and you record whether it comes up heads. Your analysis will compare the number of heads that people flip after eating blue versus green M&Ms and report whether the difference is “statistically significant at the 5% level.”…(More)”.
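The logic of the fake study can be made concrete with a small simulation (not from the post; an illustrative sketch with our own sample sizes): run the study many times and count how often a "significant" difference appears even though the true effect is zero.

```python
import math
import random

random.seed(42)

def fake_study(n_per_group=200):
    """One fake trial: both groups flip fair coins, so the true effect is zero."""
    blue = sum(random.random() < 0.5 for _ in range(n_per_group))
    green = sum(random.random() < 0.5 for _ in range(n_per_group))
    p_blue, p_green = blue / n_per_group, green / n_per_group
    pooled = (blue + green) / (2 * n_per_group)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p_blue - p_green) / se
    # Two-sided p-value from the normal approximation to the two-proportion test
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

runs = 4000
false_positives = sum(fake_study() < 0.05 for _ in range(runs))
print(f"'Significant' results with zero true effect: {false_positives / runs:.1%}")
```

By construction there is nothing to find, yet roughly 5% of runs come out "significant at the 5% level" — which is exactly what the 5% threshold promises, and why a single significant result in noisy data should be read with care.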

External Researcher Access to Closed Foundation Models


Report by Esme Harrington and Dr. Mathias Vermeulen: “…addresses a pressing issue: independent researchers need better conditions for accessing and studying the AI models that big companies have developed. Foundation models — the core technology behind many AI applications — are controlled mainly by a few major players who decide who can study or use them.

What’s the problem with access?

  • Limited access: Companies like OpenAI, Google and others are the gatekeepers. They often restrict access to researchers whose work aligns with their priorities, which means independent, public-interest research can be left out in the cold.
  • High costs: Even when access is granted, it often comes with a hefty price tag that smaller or less-funded teams can’t afford.
  • Lack of transparency: These companies don’t always share how their models are updated or moderated, making it nearly impossible for researchers to replicate studies or fully understand the technology.
  • Legal risks: When researchers try to scrutinize these models, they sometimes face legal threats if their work uncovers flaws or vulnerabilities in the AI systems.

The research suggests that companies need to offer more affordable and transparent access to improve AI research. Additionally, governments should provide legal protections for researchers, especially when they are acting in the public interest by investigating potential risks…(More)”.

Key lesson of this year’s Nobel Prize: The importance of unlocking data responsibly to advance science and improve people’s lives


Article by Stefaan Verhulst, Anna Colom, and Marta Poblet: “This year’s Nobel Prize for Chemistry owes a lot to available, standardised, high quality data that can be reused to improve people’s lives. The winners, Prof David Baker from the University of Washington, and Demis Hassabis and John M. Jumper from Google DeepMind, were awarded, respectively, for the design of new proteins and for the prediction of protein structures, advances that can have important medical applications. These developments build on AI models that can predict protein structures in unprecedented ways. However, key to these models and their potential to unlock health discoveries is an open curated dataset with high quality and standardised data, something still rare despite the pace and scale of AI-driven development.

We live in a paradoxical time of both data abundance and data scarcity: a lot of data is being created and stored, but it tends to be inaccessible due to private interests and weak regulations. The challenge, then, is to prevent the misuse of data whilst avoiding its missed use.

The reuse of data remains limited in Europe, but a new set of regulations seeks to increase the possibilities of responsible data reuse. When the European Commission made the case for its European Data Strategy in 2020, it envisaged the European Union as “a role model for a society empowered by data to make better decisions — in business and the public sector,” and acknowledged the need to improve “governance structures for handling data and to increase its pools of quality data available for use and reuse”…(More)”.

WikiProject AI Cleanup


Article by Emanuel Maiberg: “A group of Wikipedia editors have formed WikiProject AI Cleanup, “a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia.”

The group’s goal is to protect one of the world’s largest repositories of information from the same kind of misleading AI-generated information that has plagued Google search results, books sold on Amazon, and academic journals.

“A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar ‘styles’ using ChatGPT,” Ilyas Lebleu, a founding member of WikiProject AI Cleanup, told me in an email. “Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques.”…(More)”.

Data’s Role in Unlocking Scientific Potential


Report by the Special Competitive Studies Project: “…we outline two actionable steps the U.S. government can take immediately to address the data sharing challenges hindering scientific research.

1. Create Comprehensive Data Inventories Across Scientific Domains

We recommend the Secretary of Commerce, acting through the Department of Commerce’s Chief Data Officer and the Director of the National Institute of Standards and Technology (NIST), and with the Federal Chief Data Officer Council (CDO Council) create a government-led inventory where organizations – universities, industries, and research institutes – can catalog their datasets with key details like purpose, description, and accreditation. Similar to platforms like data.gov, this centralized repository would make high-quality data more visible and accessible, promoting scientific collaboration. To boost participation, the government could offer incentives, such as grants or citation credits for researchers whose data is used. Contributing organizations would also be responsible for regularly updating their entries, ensuring the data stays relevant and searchable. 

2. Create Scientific Data Sharing Public-Private Partnerships

A critical recommendation of the National Data Action Plan was for the United States to facilitate the creation of data sharing public-private partnerships for specific sectors. The U.S. Government should coordinate data sharing partnerships with its departments and agencies, industry, academia, and civil society. Data collected by one entity can be tremendously valuable to others. But incentivizing data sharing is challenging as privacy, security, legal (e.g., liability), and intellectual property (IP) concerns can limit willingness to share. However, narrowly scoped public-private partnerships can help overcome these barriers, allowing for greater data sharing and mutually beneficial data use…(More)”

AI-accelerated Nazca survey nearly doubles the number of known figurative geoglyphs and sheds light on their purpose


Paper by Masato Sakai, Akihisa Sakurai, Siyuan Lu, and Marcus Freitag: “It took nearly a century to discover a total of 430 figurative Nazca geoglyphs, which offer significant insights into the ancient cultures at the Nazca Pampa. Here, we report the deployment of an AI system to the entire Nazca region, a UNESCO World Heritage site, leading to the discovery of 303 new figurative geoglyphs within only six months of field survey, nearly doubling the number of known figurative geoglyphs. Even with limited training examples, the developed AI approach is demonstrated to be effective in detecting the smaller relief-type geoglyphs, which, unlike the giant line-type geoglyphs, are very difficult to discern. The improved account of figurative geoglyphs enables us to analyze their motifs and distribution across the Nazca Pampa. We find that relief-type geoglyphs depict mainly human motifs or motifs of things modified by humans, such as domesticated animals and decapitated heads (81.6%). They are typically located within viewing distance (on average 43 m) of ancient trails that crisscross the Nazca Pampa and were most likely built and viewed at the individual or small-group level. On the other hand, the giant line-type figurative geoglyphs mainly depict wild animals (64%). They are found an average of 34 m from the elaborate linear/trapezoidal network of geoglyphs, which suggests that they were probably built and used on a community level for ritual activities…(More)”

Buried Academic Treasures


Barrett and Greene: “…one of the presenters who said: “We have lots of research that leads to no results.”

As some of you know, we’ve written a book with Don Kettl to help academically trained researchers write in a way that would be understandable by decision makers who could make use of their findings. But the keys to writing well are only a small part of the picture. Elected and appointed officials have the capacity to ignore nearly anything, no matter how well written it is.

This is more than just a frustration to researchers; it’s a gigantic loss to the world of public administration. We spend lots of time reading through reports and frequently come across nuggets of insight that we believe could help make improvements in nearly every public sector endeavor from human resources to budgeting to performance management to procurement and on and on. We, and others, can do our best to get attention for this kind of information, but that doesn’t mean that the decision makers have the time or the inclination to take steps toward taking advantage of great ideas.

We don’t want to place the blame for the disconnect between academia and practitioners on either party. To one degree or another, they’re both at fault, with taxpayers and the people who rely on government services – and that’s pretty much everybody except for people who have gone off the grid – as the losers.

Following, from our experience, are six reasons we believe it’s difficult to close the gap between the world of research and the realm of utility. The first three are aimed at government leaders; the last three have academics in mind…(More)”