Paper by Thomas H. Li and Francisco Barreras: “Human mobility datasets have seen increasing adoption in the past decade, enabling diverse applications that leverage the high precision of measured trajectories relative to other human mobility datasets. However, there are concerns about whether the high sparsity in some commercial datasets can introduce errors due to lack of robustness in processing algorithms, which could compromise the validity of downstream results. The scarcity of “ground-truth” data makes it particularly challenging to evaluate and calibrate these algorithms. To overcome these limitations and allow for an intermediate form of validation of common processing algorithms, we propose a synthetic trajectory simulator and sandbox environment meant to replicate the features of commercial datasets that could cause errors in such algorithms, and which can be used to compare algorithm outputs with “ground-truth” synthetic trajectories and mobility diaries. Our code is open-source and is publicly available alongside tutorial notebooks and sample datasets generated with it…(More)”.
National biodiversity data infrastructures: ten essential functions for science, policy, and practice
Paper by Anton Güntsch et al: “Today, at the international level, powerful data portals are available to biodiversity researchers and policymakers, offering increasingly robust computing and network capacities and capable data services for internationally agreed-on standards. These accelerate individual and complex workflows to map data-driven research processes or even to make them possible for the first time. At the national level, however, and alongside these international developments, national infrastructures are needed to take on tasks that cannot be easily funded or addressed internationally. To avoid gaps, as well as redundancies in the research landscape, national tasks and responsibilities must be clearly defined to align efforts with core priorities. In the present article, we outline 10 essential functions of national biodiversity data infrastructures. They serve as key providers, facilitators, mediators, and platforms for effective biodiversity data management, integration, and analysis that require national efforts to foster biodiversity science, policy, and practice…(More)”.
Access, Signal, Action: Data Stewardship Lessons from Valencia’s Floods
Article by Marta Poblet, Stefaan Verhulst, and Anna Colom: “Valencia has a rich history in water management, a legacy shaped by both triumphs and tragedies. This connection to water is embedded in the city’s identity, yet modern floods test its resilience in new ways.
During the recent floods, Valencians experienced a troubling paradox. In today’s connected world, digital information flows through traditional and social media, weather apps, and government alert systems designed to warn us of danger and guide rapid responses. Despite this abundance of data, a tragedy unfolded last month in Valencia. This raises a crucial question: how can we ensure access to the right data, filter it for critical signals, and transform those signals into timely, effective action?
Data stewardship becomes essential in this process.
In particular, the devastating floods in Valencia underscore the importance of:
- having access to data to strengthen the signal (first mile challenges)
- separating signal from noise
- translating signal into action (last mile challenges)…(More)”.
Beached Plastic Debris Index; a modern index for detecting plastics on beaches
Paper by Jenna Guffogg et al: “Plastic pollution on shorelines poses a significant threat to coastal ecosystems, underscoring the urgent need for scalable detection methods to facilitate debris removal. In this study, the Beached Plastic Debris Index (BPDI) was developed to detect plastic accumulation on beaches using shortwave infrared spectral features. To validate the BPDI, plastic targets with varying sub-pixel covers were placed on a sand spit and captured using WorldView-3 satellite imagery. The performance of the BPDI was analysed in comparison with the Normalized Difference Plastic Index (NDPI), the Plastic Index (PI), and two hydrocarbon indices (HI, HC). The BPDI successfully detected the plastic targets from sand, water, and vegetation, outperforming the other indices and identifying pixels with <30 % plastic cover. The robustness of the BPDI suggests its potential as an effective tool for mapping plastic debris accumulations along coastlines…(More)”.
Once Upon a Crime: Towards Crime Prediction from Demographics and Mobile Data
Paper by Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Nuria Oliver, Fabio Pianesi, and Alex Pentland: “In this paper, we present a novel approach to predict crime in a geographic space from multiple data sources, in particular mobile phone and demographic data. The main contribution of the proposed approach lies in using aggregated and anonymized human behavioral data derived from mobile network activity to tackle the crime prediction problem. While previous research efforts have used either background historical knowledge or offenders’ profiling, our findings support the hypothesis that aggregated human behavioral data captured from the mobile network infrastructure, in combination with basic demographic information, can be used to predict crime. In our experimental results with real crime data from London we obtain an accuracy of almost 70% when predicting whether a specific area in the city will be a crime hotspot or not. Moreover, we provide a discussion of the implications of our findings for data-driven crime analysis…(More)”.
Unlocking Green Deal Data: Innovative Approaches for Data Governance and Sharing in Europe
JRC Report: “Drawing upon the ambitious policy and legal framework outlined in the Europe Strategy for Data (2020) and the establishment of common European data spaces, this Science for Policy report explores innovative approaches for unlocking relevant data to achieve the objectives of the European Green Deal.
The report focuses on the governance and sharing of Green Deal data, analysing a variety of topics related to the implementation of new regulatory instruments, namely the Data Governance Act and the Data Act, as well as the roles of various actors in the data ecosystem. It provides an overview of the current incentives and disincentives for data sharing and explores the existing landscape of Data Intermediaries and Data Altruism Organizations. Additionally, it offers insights from a private sector perspective and outlines key data governance and sharing practices concerning Citizen-Generated Data (CGD).
The main conclusions build upon the concept of “Systemic Data Justice,” which emphasizes equity, accountability, and fair representation to foster stronger connections between the supply and demand of data for a more effective and sustainable data economy. Five policy recommendations outline a set of main implications and actionable points for the revision of the INSPIRE Directive (2007) within the context of the common European Green Deal data space, and toward a more sustainable and fair data ecosystem. However, the relevance of these recommendations extends beyond Green Deal data, as they outline key elements to ensure that any data ecosystem is both just and impact-oriented…(More)”.
Effective Data Stewardship in Higher Education: Skills, Competences, and the Emerging Role of Open Data Stewards
Paper by Panos Fitsilis et al: “The significance of open data in higher education stems from the changing tendencies towards open science, and open research in higher education encourages new ways of making scientific inquiry more transparent, collaborative and accessible. This study focuses on the critical role of open data stewards in this transition, essential for managing and disseminating research data effectively in universities, while it also highlights the increasing demand for structured training and professional policies for data stewards in academic settings. Building upon this context, the paper investigates the essential skills and competences required for effective data stewardship in higher education institutions. A critical literature review, coupled with practical engagement in open data stewardship at universities, provided insights into the roles and responsibilities of data stewards. In response to these identified needs, the paper proposes a structured training framework and comprehensive curriculum for data stewardship, a direct response to the gaps identified in the literature. It addresses five key competence categories for open data stewards, aligning them with current trends and essential skills and knowledge in the field. By advocating for a structured approach to data stewardship education, this work sets the foundation for improved data management in universities and serves as a critical step towards professionalizing the role of data stewards in higher education. The emphasis on the role of open data stewards is expected to advance data accessibility and sharing practices, fostering increased transparency, collaboration, and innovation in academic research. This approach contributes to the evolution of universities into open ecosystems, where there is free flow of data for global education and research advancement…(More)”.
Commission launches public consultation on the rules for researchers to access online platform data under the Digital Services Act
Press Release: “Today, the Commission launched a public consultation on the draft delegated act on access to online platform data for vetted researchers under the Digital Services Act (DSA).

With the Digital Services Act, researchers will for the first time have access to data to study systemic risks and to assess online platforms’ risk mitigation measures in the EU. It will allow the research community to play a vital role in scrutinising and safeguarding the online environment.
The draft delegated act clarifies the procedures on how researchers can access Very Large Online Platforms’ and Search Engines’ data. It also sets out rules on data formats and data documentation requirements. Lastly, it establishes the DSA data access portal, a one-stop-shop for researchers, data providers, and DSCs to exchange information on data access requests. The consultation follows a first call for evidence.
The consultation will run until 26 November 2024. After gathering public feedback, the Commission plans to adopt the rules in the first quarter of 2025…(More)”.
Long-term validation of inner-urban mobility metrics derived from Twitter/X
Paper by Steffen Knoblauch et al: “Urban mobility analysis using Twitter as a proxy has gained significant attention in various application fields; however, long-term validation studies are scarce. This paper addresses this gap by assessing the reliability of Twitter data for modeling inner-urban mobility dynamics over a 27-month period in the metropolitan area of Rio de Janeiro, Brazil. The evaluation involves the validation of Twitter-derived mobility estimates at both temporal and spatial scales, employing over 1.6 × 10¹¹ mobile phone records of around three million users during the non-stationary mobility period from April 2020 to June 2022, which coincided with the COVID-19 pandemic. The results highlight the need for caution when using Twitter for short-term modeling of urban mobility flows. Short-term inference can be influenced by Twitter policy changes and the availability of publicly accessible tweets. On the other hand, this long-term study demonstrates that employing multiple mobility metrics simultaneously, analyzing dynamic and static mobility changes concurrently, and employing robust preprocessing techniques such as rolling window downsampling can enhance the inference capabilities of Twitter data. These novel insights gained from a long-term perspective are vital, as Twitter – rebranded to X in 2023 – is extensively used by researchers worldwide to infer human movement patterns. Since conclusions drawn from studies using Twitter could be used to inform public policy, emergency response, and urban planning, evaluating the reliability of this data is of utmost importance…(More)”.
Key lesson of this year’s Nobel Prize: The importance of unlocking data responsibly to advance science and improve people’s lives
Article by Stefaan Verhulst, Anna Colom, and Marta Poblet: “This year’s Nobel Prize for Chemistry owes a lot to available, standardised, high quality data that can be reused to improve people’s lives. The winners, Prof David Baker from the University of Washington, and Demis Hassabis and John M. Jumper from Google DeepMind, were awarded respectively for the development and prediction of new proteins that can have important medical applications. These developments build on AI models that can predict protein structures in unprecedented ways. However, key to these models and their potential to unlock health discoveries is an open curated dataset with high quality and standardised data, something still rare despite the pace and scale of AI-driven development.
We live in a paradoxical time of both data abundance and data scarcity: a lot of data is being created and stored, but it tends to be inaccessible due to private interests and weak regulations. The challenge, then, is to prevent the misuse of data whilst avoiding its missed use.
The reuse of data remains limited in Europe, but a new set of regulations seeks to increase the possibilities of responsible data reuse. When the European Commission made the case for its European Data Strategy in 2020, it envisaged the European Union as “a role model for a society empowered by data to make better decisions — in business and the public sector,” and acknowledged the need to improve “governance structures for handling data and to increase its pools of quality data available for use and reuse”…(More)”.