science

Researchers’ access to information from regulated online services

Curated on July 8, 2025 (updated July 9, 2025) by Stefaan Verhulst

Report by Ofcom (UK): “…We outline three potential policy options and models for facilitating greater researcher access, which include:

Clarify existing legal rules: Relevant authorities could provide additional guidance on what is already legally permitted for researcher access on important issues, such as data donations and research-related scraping.
Create new duties, enforced by a backstop regulator: Services could be required to put in place systems and processes to operationalise data access. This could include new duties on regulated services to create standard procedures for researcher accreditation. Services would be responsible for providing researchers with data directly or providing the interface through which they can access it and offering appeal and redress mechanisms. A backstop regulator could enforce these duties – either an existing or new body.
Enable and manage access via independent intermediary: New legal powers could be granted to a trusted third party which would facilitate and manage researchers’ access to data. This intermediary – which could again be an existing or new body – would accredit researchers and provide secure access.

Our report describes three types of intermediary that could be considered – direct access intermediary, notice to service intermediary and repository intermediary models.

Direct access intermediary. Researchers could request data with an intermediary facilitating secure access. In this model, services could retain responsibility for hosting and providing data while the intermediary maintains the interface by which researchers request access.
Notice to service intermediary. Researchers could apply for accreditation and request access to specific datasets via the intermediary. This could include data that would not be accessible in direct access models. The intermediary would review and refuse or approve access. Services would then be required to provide access to the approved data.
Repository intermediary. The intermediary could itself provide direct access to data, by providing an interface for data access and/or hosting the data itself and taking responsibility for data governance. This could also include data that would not be accessible in direct access models…(More)”.

Sudden loss of key US satellite data could send hurricane forecasting back ‘decades’

Curated on July 8, 2025 by Stefaan Verhulst

Article by Eric Holthaus: “A critical US atmospheric data collection program will be halted by Monday, giving weather forecasters just days to prepare, according to a public notice sent this week. Scientists that the Guardian spoke with say the change could set hurricane forecasting back “decades”, just as this year’s season ramps up.

In a National Oceanic and Atmospheric Administration (Noaa) message sent on Wednesday to its scientists, the agency said that “due to recent service changes” the Defense Meteorological Satellite Program (DMSP) will “discontinue ingest, processing and distribution of all DMSP data no later than June 30, 2025”.

Due to their unique characteristics and ability to map the entire world twice a day with extremely high resolution, the three DMSP satellites are a primary source of information for scientists to monitor Arctic sea ice and hurricane development. The DMSP partners with Noaa to make weather data collected from the satellites publicly available.

The reasons for the changes, and which agency was driving them, were not immediately clear. Noaa said they would not affect the quality of forecasting.

However, the Guardian spoke with several scientists inside and outside of the US government whose work depends on the DMSP, and all said there are no other US programs that can form an adequate replacement for its data.

“We’re a bit blind now,” said Allison Wing, a hurricane researcher at Florida State University. Wing said the DMSP satellites are the only ones that let scientists see inside the clouds of developing hurricanes, giving them a critical edge in forecasting that now may be jeopardized.

“Before these types of satellites were present, there would often be situations where you’d wake up in the morning and have a big surprise about what the hurricane looked like,” said Wing. “Given increases in hurricane intensity and increasing prevalence towards rapid intensification in recent years, it’s not a good time to have less information.”…(More)”.

Will AI speed up literature reviews or derail them entirely?

Curated on July 8, 2025 by Stefaan Verhulst

Article by Sam A. Reynolds: “Over the past few decades, evidence synthesis has greatly increased the effectiveness of medicine and other fields. The process of systematically combining findings from multiple studies into comprehensive reviews helps researchers and policymakers to draw insights from the global literature¹. AI promises to speed up parts of the process, including searching and filtering. It could also help researchers to detect problematic papers². But in our view, other potential uses of AI mean that many of the approaches being developed won’t be sufficient to ensure that evidence syntheses remain reliable and responsive. In fact, we are concerned that the deployment of AI to generate fake papers presents an existential crisis for the field.

What’s needed is a radically different approach — one that can respond to the updating and retracting of papers over time.

We propose a network of continually updated evidence databases, hosted by diverse institutions as ‘living’ collections. AI could be used to help build the databases. And each database would hold findings relevant to a broad theme or subject, providing a resource for an unlimited number of ultra-rapid and robust individual reviews…

Currently, the gold standard for evidence synthesis is the systematic review. These are comprehensive, rigorous, transparent and objective, and aim to include as much relevant high-quality evidence as possible. They also use the best methods available for reducing bias. In part, this is achieved by getting multiple reviewers to screen the studies; declaring whatever criteria, databases, search terms and so on are used; and detailing any conflicts of interest or potential cognitive biases…(More)”.

This new cruise-ship activity is surprisingly popular

Curated on July 6, 2025 (updated July 8, 2025) by Stefaan Verhulst

Article by Brian Johnston: “Scientists are always short of research funds, but the boom in the popularity of expedition cruising has given them an unexpected opportunity to access remote places.

Instead of making single, expensive visits to Antarctica, for example, scientists hitch rides on cruise ships that make repeat visits and provide the opportunity for data collection over an entire season.

Meanwhile, cruise passengers’ willingness to get involved in a “citizen science” capacity is proving invaluable for crowdsourcing data on everything from whale migration and microplastics to seabird populations. And it isn’t only the scientists who benefit. Guests get a better insight into the environments in which they sail, and feel that they’re doing their bit to understand and preserve the wildlife and landscapes around them.

Citizen-science projects produce tangible results, among them that ships in Antarctica now sail under 10 knots after a study showed that, at that speed, whales have a far greater chance of avoiding or surviving ship strikes. In 2023 Viking Cruises encountered rare giant phantom jellyfish in Antarctica, and in 2024 discovered a new chinstrap penguin colony near Antarctica’s Astrolabe Island.

Viking’s expedition ships have a Science Lab and the company works with prestigious partners such as the Cornell Lab of Ornithology and Norwegian Polar Institute. Expedition lines with visiting scientist programs include Chimu Adventures, Lindblad Expeditions and Quark Expeditions, which works with Penguin Watch to study the impact of avian flu…(More)”.

Commission facilitates data access for researchers under the Digital Services Act

Curated on July 2, 2025 (updated July 3, 2025) by Stefaan Verhulst

Press Release: “On 2 July 2025, the Commission published a delegated act outlining rules granting access to data for qualified researchers under the Digital Services Act (DSA). This delegated act enables access to the internal data of very large online platforms (VLOPs) and very large online search engines (VLOSEs) for research on systemic risks and mitigation measures in the European Union.

The delegated act on data access clarifies the procedures for VLOPs and VLOSEs to share data with vetted researchers, including data formats and requirements for data documentation. Moreover, the delegated act sets out which information Digital Services Coordinators (DSCs), VLOPs and VLOSEs must make public to facilitate vetted researchers’ applications to access relevant datasets.

With the adoption of the delegated act, the Commission will launch the DSA data access portal where researchers interested in accessing data under the new mechanism can find information and exchange with VLOPs, VLOSEs and DSCs on their data access applications.

Before accessing internal data, researchers must be vetted by a DSC.

For this vetting process, researchers must submit a data access application demonstrating their affiliation to a research organisation, their independence from commercial interests, and their ability to manage the requested data in line with security, confidentiality and privacy rules. In addition, researchers need to disclose the funding of the research project for which the data is requested and commit to publishing the results of their research. Only data that is necessary to perform research on systemic risks in the EU can be requested.

To complement the rules in the delegated act, on 27 June 2025 the Board of Digital Services endorsed a proposal for further cooperation among DSCs in the vetting process of researchers…(More)”.

Digital Methods: A Short Introduction

Curated on June 29, 2025 by Stefaan Verhulst

Book by Tommaso Venturini and Richard Rogers: “In a direct and accessible way, the authors provide hands-on advice to equip readers with the knowledge they need to understand which digital methods are best suited to their research goals and how to use them. Cutting through theoretical and technical complications, they focus on the different practices associated with digital methods to skillfully provide a quick-start guide to the art of querying, prompting, API calling, scraping, mining, wrangling, visualizing, crawling, plotting networks, and scripting. While embracing the capacity of digital methods to rekindle sociological imagination, this book also delves into their limits and biases and reveals the hard labor of digital fieldwork. The book also touches upon the epistemic and political consequences of these methods, but with the purpose of providing practical advice for their usage…(More)”.

What Counts as Discovery?

Curated on June 29, 2025 by Stefaan Verhulst

Essay by Nisheeth Vishnoi: “Long before there were “scientists,” there was science. Across every continent, humans developed knowledge systems grounded in experience, abstraction, and prediction—driven not merely by curiosity, but by a desire to transform patterns into principles, and observation into discovery. Farmers tracked solstices, sailors read stars, artisans perfected metallurgy, and physicians documented plant remedies. They built calendars, mapped cycles, and tested interventions—turning empirical insight into reliable knowledge.

From the oral sciences of Africa, which encoded botanical, medical, and ecological knowledge across generations, to the astronomical observatories of Mesoamerica, where priests tracked solstices, eclipses, and planetary motion with remarkable accuracy, early human civilizations sought more than survival. In Babylon, scribes logged celestial movements and built predictive models; in India, the architects of Vedic altars designed ritual structures whose proportions mirrored cosmic rhythms, embedding arithmetic and geometry into sacred form. Across these diverse cultures, discovery was not a separate enterprise—it was entwined with ritual, survival, and meaning. Yet the tools were recognizably scientific: systematic observation, abstraction, and the search for hidden order.

This was science before the name. And it reminds us that discovery has never belonged to any one civilization or era. Discovery is not intelligence itself, but one of its sharpest expressions—an act that turns perception into principle through a conceptual leap. While intelligence is broader and encompasses adaptation, inference, and learning in various forms (biological, cultural, and even mechanical), discovery marks those moments when something new is framed, not just found.

Life forms learn, adapt, and even innovate. But it is humans who turned observation into explanation, explanation into abstraction, and abstraction into method. The rise of formal science brought mathematical structure and experiment, but it did not invent the impulse to understand—it gave it form, language, and reach.

And today, we stand at the edge of something unfamiliar: the possibility of lifeless discoveries. Artificial Intelligence machines, built without awareness or curiosity, are beginning to surface patterns and propose explanations, sometimes without our full understanding. If science has long been a dialogue between the world and living minds, we are now entering a strange new phase: abstraction without awareness, discovery without a discoverer.

AI systems now assist in everything from understanding black holes to predicting protein folds and even symbolic equation discovery. They parse vast datasets, detect regularities, and generate increasingly sophisticated outputs. Some claim they’re not just accelerating research, but beginning to reshape science itself—perhaps even to discover.

But what truly counts as a scientific discovery? This essay examines that question…(More)”

Library Catalogues as Data: Research, Practice and Usage

Curated on June 23, 2025 by Stefaan Verhulst

Book by Paul Gooding, Melissa Terras, and Sarah Ames: “Through the web of library catalogues, library management systems and myriad digital resources, libraries have become repositories not only for physical and digital information resources but also for enormous amounts of data about the interactions between these resources and their users. Bringing together leading practitioners and academic voices, this book considers library catalogue data as a vital research resource.

The book is divided into four sections, each approaching library catalogues, collections and records from a different angle: methods for examining such data; the politics of catalogues and library data; their interdisciplinary potential; and practical uses and applications of catalogues as data. Other topics the volume discusses include:

Practical routes to preparing library catalogue data for researchers
The ethics of library metadata privacy and reuse
Data-driven decision making
Data quality and collections bias
Preserving, resurrecting and restoring data
The uses and potential of historical library data
The intersection of catalogue data, AI and Large Language Models (LLMs)

This comprehensive book will be an essential read for practitioners in the GLAM sector, particularly those dealing with collections and catalogue data, and LIS academics and students…(More)”

Sharing trustworthy AI models with privacy-enhancing technologies

Curated on June 17, 2025 by Stefaan Verhulst

OECD Report: “Privacy-enhancing technologies (PETs) are critical tools for building trust in the collaborative development and sharing of artificial intelligence (AI) models while protecting privacy, intellectual property, and sensitive information. This report identifies two key types of PET use cases. The first is enhancing the performance of AI models through confidential and minimal use of input data, with technologies like trusted execution environments, federated learning, and secure multi-party computation. The second is enabling the confidential co-creation and sharing of AI models using tools such as differential privacy, trusted execution environments, and homomorphic encryption. PETs can reduce the need for additional data collection, facilitate data-sharing partnerships, and help address risks in AI governance. However, they are not silver bullets. While combining different PETs can help compensate for their individual limitations, balancing utility, efficiency, and usability remains challenging. Governments and regulators can encourage PET adoption through policies, including guidance, regulatory sandboxes, and R&D support, which would help build sustainable PET markets and promote trustworthy AI innovation…(More)”.

A New Paradigm for Fueling AI for the Public Good

Curated on June 16, 2025 (updated June 19, 2025) by Stefaan Verhulst

Article by Kevin T. Frazier: “Imagine receiving this email in the near future: “Thank you for sharing data with the American Data Collective on May 22, 2025. After first sharing your workout data with SprintAI, a local startup focused on designing shoes for differently abled athletes, your data donation was also sent to an artificial intelligence research cluster hosted by a regional university. Your donation is on its way to accelerate artificial intelligence innovation and support researchers and innovators addressing pressing public needs!”

That is exactly the sort of message you could expect to receive if we made donations of personal data akin to blood donations—a pro-social behavior that may not immediately serve a donor’s individual needs but may nevertheless benefit the whole of the community. This vision of a future where data flow toward the public good is not science fiction—it is a tangible possibility if we address a critical bottleneck faced by innovators today.

Creating the data equivalent of blood banks may not seem like a pressing need or something that people should voluntarily contribute to, given widespread concerns about a few large artificial intelligence (AI) companies using data for profit-driven and, arguably, socially harmful ends. This narrow conception of the AI ecosystem fails to consider the hundreds of AI research initiatives and startups that have a desperate need for high-quality data. I was fortunate enough to meet leaders of those nascent AI efforts at Meta’s Open Source AI Summit in Austin, Texas. For example, I met with Matt Schwartz, who leads a startup that leans on AI to glean more diagnostic information from colonoscopies. I also connected with Edward Chang, a professor of neurological surgery at the University of California, San Francisco Weill Institute for Neurosciences, who relies on AI tools to discover new information on how and why our brains work. I also got to know Corin Wagen, whose startup is helping companies “find better molecules faster.” This is a small sample of the people leveraging AI for objectively good outcomes. They need your help. More specifically, they need your data.

A tragic irony shapes our current data infrastructure. Most of us share mountains of data with massive and profitable private parties—smartwatch companies, diet apps, game developers, and social media companies. Yet, AI labs, academic researchers, and public interest organizations best positioned to leverage our data for the common good are often those facing the most formidable barriers to acquiring the necessary quantity, quality, and diversity of data. Unlike OpenAI, they are not going to use bots to scrape the internet for data. Unlike Google and Meta, they cannot rely on their own social media platforms and search engines to act as perpetual data generators. And, unlike Anthropic, they lack the funds to license data from media outlets. So, while commercial entities amass vast datasets, frequently as a byproduct of consumer services and proprietary data acquisition strategies, mission-driven AI initiatives dedicated to public problems find themselves in a state of chronic data scarcity. This is not merely a hurdle—it is a systemic bottleneck choking off innovation where society needs it most, delaying or even preventing the development of AI tools that could significantly improve lives.

Individuals are, quite rightly, increasingly hesitant to share their personal information, with concerns about privacy, security, and potential misuse being both rampant and frequently justified by past breaches and opaque practices. Yet, in a striking contradiction, troves of deeply personal data are continuously siphoned by app developers, by tech platforms, and, often opaquely, by an extensive network of data brokers. This practice often occurs with minimal transparency and without informed consent concerning the full lifecycle and downstream uses of that data. This lack of transparency extends to how algorithms trained on this data make decisions that can impact individuals’ lives—from loan applications to job prospects—often without clear avenues for recourse or understanding, potentially perpetuating existing societal biases embedded in historical data…(More)”.