Long-term validation of inner-urban mobility metrics derived from Twitter/X


Paper by Steffen Knoblauch et al.: “Urban mobility analysis using Twitter as a proxy has gained significant attention in various application fields; however, long-term validation studies are scarce. This paper addresses this gap by assessing the reliability of Twitter data for modeling inner-urban mobility dynamics over a 27-month period in the metropolitan area of Rio de Janeiro, Brazil. The evaluation involves the validation of Twitter-derived mobility estimates at both temporal and spatial scales, employing over 1.6 × 10¹¹ mobile phone records of around three million users during the non-stationary mobility period from April 2020 to June 2022, which coincided with the COVID-19 pandemic. The results highlight the need for caution when using Twitter for short-term modeling of urban mobility flows. Short-term inference can be influenced by Twitter policy changes and the availability of publicly accessible tweets. On the other hand, this long-term study demonstrates that employing multiple mobility metrics simultaneously, analyzing dynamic and static mobility changes concurrently, and employing robust preprocessing techniques such as rolling window downsampling can enhance the inference capabilities of Twitter data. These novel insights gained from a long-term perspective are vital, as Twitter – rebranded to X in 2023 – is extensively used by researchers worldwide to infer human movement patterns. Since conclusions drawn from studies using Twitter could be used to inform public policy, emergency response, and urban planning, evaluating the reliability of this data is of utmost importance…(More)”.
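
A minimal pandas sketch can illustrate the “rolling window downsampling” the authors credit with stabilising Twitter-derived estimates. Everything below (the synthetic counts, the 28-day window, the weekly resampling) is an illustrative assumption, not the paper’s actual pipeline:

```python
import numpy as np
import pandas as pd

# Synthetic daily counts of geolocated tweets over the 27-month study window
# (an illustrative stand-in; the paper's real data and exact parameters are
# not reproduced here).
rng = np.random.default_rng(42)
days = pd.date_range("2020-04-01", "2022-06-30", freq="D")
daily_tweets = pd.Series(rng.poisson(lam=500, size=len(days)),
                         index=days, name="tweet_count")

# Rolling-window downsampling: a centered 28-day rolling mean smooths
# short-term artefacts (policy changes, API outages) before the series
# is downsampled to one observation per week.
smoothed = daily_tweets.rolling(window=28, center=True, min_periods=14).mean()
weekly = smoothed.resample("W").mean()
print(weekly.head())
```

The appeal of the technique is that week-level mobility estimates become far less sensitive to any single day’s tweet availability, which is exactly the short-term volatility the study warns about.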

Key lesson of this year’s Nobel Prize: The importance of unlocking data responsibly to advance science and improve people’s lives


Article by Stefaan Verhulst, Anna Colom, and Marta Poblet: “This year’s Nobel Prize for Chemistry owes a lot to available, standardised, high quality data that can be reused to improve people’s lives. The winners, Prof David Baker from the University of Washington, and Demis Hassabis and John M. Jumper from Google DeepMind, were awarded respectively for the development and prediction of new proteins that can have important medical applications. These developments build on AI models that can predict protein structures in unprecedented ways. However, key to these models and their potential to unlock health discoveries is an open curated dataset with high quality and standardised data, something still rare despite the pace and scale of AI-driven development.

We live in a paradoxical time of both data abundance and data scarcity: a lot of data is being created and stored, but it tends to be inaccessible due to private interests and weak regulations. The challenge, then, is to prevent the misuse of data whilst avoiding its missed use.

The reuse of data remains limited in Europe, but a new set of regulations seeks to increase the possibilities of responsible data reuse. When the European Commission made the case for its European Data Strategy in 2020, it envisaged the European Union as “a role model for a society empowered by data to make better decisions — in business and the public sector,” and acknowledged the need to improve “governance structures for handling data and to increase its pools of quality data available for use and reuse”…(More)”.

Scientists around the world call to protect research on one of humanity’s greatest short-term threats – Disinformation


Forum on Democracy and Information: “At a critical time for understanding digital communications’ impact on societies, research on disinformation is endangered. 

In August, researchers around the world bid farewell to CrowdTangle – the Meta-owned social media monitoring tool. Meta’s decision to close the most widely used platform for tracking mis- and disinformation during a major election year, only to present its alternative, the Meta Content Library and API, has been met with a barrage of criticism.

If, as suggested by the World Economic Forum’s 2024 global risk report, disinformation is one of the biggest short-term threats to humanity, our collective ability to understand how it spreads and impacts our society is crucial. Just as we would not impede scientific research into the spread of viruses and disease, nor into natural ecosystems or other historical and social sciences, disinformation research must be permitted to be carried out unimpeded and with access to information needed to understand its complexity. Understanding the political economy of disinformation as well as its technological dimensions is also a matter of public health, democratic resilience, and national security.

By directly affecting the research community’s ability to open social media black boxes, this radical decision will also, in turn, hamper public understanding of how technology affects democracy. Public interest scrutiny is also essential for the next era of technology, notably for the world’s largest AI systems, which are similarly proprietary and opaque. The research community is already calling on AI companies to learn from the mistakes of social media and guarantee protections for good faith research. The solution falls on multiple shoulders and the global scientific community, civil society, public institutions and philanthropies must come together to meaningfully foster and protect public interest research on information and democracy…(More)”.

Harnessing digital footprint data for population health: a discussion on collaboration, challenges and opportunities in the UK


Paper by Romana Burgess et al.: “Digital footprint data are inspiring a new era in population health and well-being research. Linking these novel data with other datasets is critical for future research wishing to use these data for the public good. To succeed, collaboration among industry, academics and policy-makers is vital. Therefore, we discuss the benefits and obstacles for these stakeholder groups in using digital footprint data for research in the UK. We advocate for policy-makers’ inclusion in research efforts, stress the exceptional potential of digital footprint research to impact policy-making and explore the role of industry as data providers, with a focus on shared value, commercial sensitivity, resource requirements and streamlined processes. We underscore the importance of multidisciplinary approaches, consumer trust and ethical considerations in navigating methodological challenges and further call for increased public engagement to enhance societal acceptability. Finally, we discuss how to overcome methodological challenges, such as reproducibility and sharing of learnings, in future collaborations. By adopting a multiperspective approach to outlining the challenges of working with digital footprint data, our contribution helps to ensure that future research can navigate these challenges effectively while remaining reproducible, ethical and impactful…(More)”

G20 Compendium on Data Access and Sharing Across the Public Sector and with the Private Sector for Public Interest


OECD Report: “…presents practical examples from G20 members on data access and sharing, both across the public sector and between the public and private sectors in the public interest. The report supports G20 discussions on common opportunities, enablers and challenges to strengthen data access and sharing in the public sector, as well as countries’ efforts and priorities in this policy area. It has been prepared by the OECD for the Brazilian G20 Presidency in co-ordination with the Ministry of Management and Innovation in Public Services, to inform the G20 Digital Economy Working Group at its September 2024 meeting…(More)”.

A Diamond in the Rough: How Energy Consumption Data Can Boost Artificial Intelligence Startups and Accelerate the Green Transition


Policy brief by David Osimo and Anna Pizzamiglio: “…explores how the reuse of energy consumption data can foster a dynamic cleantech ecosystem and contribute to achieving the goals of the European Green Deal. Drawing on insights from EDDIE, a decentralised platform that standardises data formats and enhances data management across Europe, the brief outlines five key recommendations for shifting from a focus on data regulation to fostering innovation. These recommendations include: Enhancing User Experience, Nurturing the Cleantech Ecosystem, Strengthening Data Stewardship, Clarifying GDPR Guidelines, Eliminating Barriers to the Single Market…(More)”.

The future of agricultural data-sharing policy in Europe: stakeholder insights on the EU Code of Conduct


Paper by Mark Ryan, Can Atik, Kelly Rijswijk, Marc-Jeroen Bogaardt, Eva Maes & Ella Deroo: “In 2018, the EU Code of Conduct on Agricultural Data Sharing by Contractual Agreement (EUCC) was published. This voluntary initiative is considered a basis for rights and responsibilities for data sharing in the agri-food sector, with a specific farmer orientation. While the involved industry associations agreed on its content, there are limited insights into how and to what extent the EUCC has been received and implemented within the sector. In 2024, the Data Act was introduced, a horizontal legal framework that aims to enforce specific legal requirements for data sharing across sectors. Yet, it remains to be seen if it will be the ultimate solution for the agricultural sector, as some significant agricultural data access issues remain. It is thus essential to determine if the EUCC may still play a significant role in addressing sector-specific issues in line with the horizontal rules of the Data Act. During six workshops across Europe with 89 stakeholders, we identified (1) how the EUCC has been received by stakeholders, (2) how it has been implemented, and (3) what its future use could be (particularly in response to the Data Act). Based on the workshop results and continued engagements with researchers and stakeholders, we conclude that the EUCC is still an important document for the agricultural sector but should be updated in response to the content of the Data Act. Hence, we propose the following improvements to the EUCC: 1. Provide clear, practical examples for applying the EUCC combined with the Data Act; 2. Generate model contractual terms based on the EUCC provisions; 3. Clarify GDPR-centric concepts like anonymisation and pseudonymisation in the agricultural data-sharing setting; 4. Develop a functional enforcement and implementation framework; and 5. Play a role in increasing interoperability and trust among stakeholders…(More)”

The Complexities of Differential Privacy for Survey Data


Paper by Jörg Drechsler & James Bailie: “The concept of differential privacy (DP) has gained substantial attention in recent years, most notably since the U.S. Census Bureau announced the adoption of the concept for its 2020 Decennial Census. However, despite its attractive theoretical properties, implementing DP in practice remains challenging, especially when it comes to survey data. In this paper we present some results from an ongoing project funded by the U.S. Census Bureau that is exploring the possibilities and limitations of DP for survey data. Specifically, we identify five aspects that need to be considered when adopting DP in the survey context: the multi-staged nature of data production; the limited privacy amplification from complex sampling designs; the implications of survey-weighted estimates; the weighting adjustments for nonresponse and other data deficiencies; and the imputation of missing values. We summarize the project’s key findings with respect to each of these aspects and also discuss some of the challenges that still need to be addressed before DP could become the new data protection standard at statistical agencies…(More)”.
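
The point about survey-weighted estimates can be made concrete with a textbook Laplace-mechanism sketch (a minimal illustration under assumed weights and parameters, not the methodology of the Census Bureau project):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with Laplace noise scaled to sensitivity / epsilon."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)

# Unweighted count: adding or removing one respondent shifts it by at most 1.
noisy_count = laplace_mechanism(1_000, sensitivity=1.0, epsilon=1.0, rng=rng)

# Survey-weighted total: one respondent can shift the estimate by up to the
# largest weight, so the sensitivity (and hence the injected noise) grows.
weights = rng.uniform(1.0, 50.0, size=1_000)  # hypothetical survey weights
noisy_total = laplace_mechanism(weights.sum(), sensitivity=weights.max(),
                                epsilon=1.0, rng=rng)

print(f"noisy count: {noisy_count:.1f}; noisy weighted total: {noisy_total:.1f}")
```

Under the same privacy budget, the weighted estimate requires far more noise than the simple count, which is one reason the authors flag survey weights as a distinct challenge for DP at statistical agencies.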

Align or fail: How economics shape successful data sharing


Blog by Federico Bartolomucci: “…The conceptual distinctions between different data sharing models are mostly based on one fundamental element: the economic nature of data and its value. 

Open data projects operate under the assumption that data is a non-rival asset (i.e. it can be used by multiple people at the same time) and a non-excludable one (i.e. anyone can use it, similar to a public good like roads or the air we breathe). This means that data can be shared with everyone, for any use, without losing its market and competitive value. The Humanitarian Data Exchange platform is a great example, allowing organizations to share over 19,000 open data sets on all aspects of humanitarian response with others.

Data collaboratives treat data as an excludable asset from which some people may be barred (i.e. a ‘club good’, like a movie theater) and therefore share it only among a restricted pool of actors. At the same time, they overcome the rival nature of the data by linking its use to a specific purpose. These arrangements work best by giving the actors a voice in choosing the purpose for which the data will be used, and through specific agreements and governance bodies that ensure that those contributing data will not have their competitive position harmed, thereby incentivizing them to engage. A good example of this is the California Data Collaborative, which uses data from different actors in the water sector to develop high-level analysis of water distribution to guide policy, planning, and operations for water districts in the state of California.

Data ecosystems work by activating market mechanisms around data exchange to overcome reluctance to share data, rather than relying solely on the purpose of use. This means that actors can choose to share their data in exchange for compensation, be it monetary or in alternative forms such as other data. In this way, the compensation balances the potential loss of competitive advantage created by the sharing of a rival asset, as well as the costs and risks of sharing. The Enershare initiative aims to establish a marketplace utilizing blockchain and smart contracts to facilitate data exchange in the energy sector. The platform is based on a compensation system, which can be non-monetary, for exchanging assets and resources related to data (such as datasets, algorithms, and models) with energy assets and services (like heating system maintenance or the transfer of surplus locally self-produced energy).

These different models of data sharing have different operational implications…(More)”.

Private sector trust in data sharing: enablers in the European Union


Paper by Jaime Bernal: “Enabling private sector trust stands as a critical policy challenge for the success of the EU Data Governance Act and Data Act in promoting data sharing to address societal challenges. This paper attributes the widespread trust deficit to the unmanageable uncertainty that arises when businesses have only limited control over how their data is used and therefore cannot protect their interests against risks they perceive as unacceptable. For example, a firm may hesitate to share its data with others in case it is leaked and falls into the hands of business competitors. To illustrate this impasse, competition, privacy, and reputational risks are introduced, respectively, in the context of three suboptimal approaches to data sharing: data marketplaces, data collaboratives, and data philanthropy. The paper proceeds by analyzing seven trust-enabling mechanisms comprising technological, legal, and organizational elements to balance trust, risk, and control, and by assessing their capacity to operate in a fair, equitable, and transparent manner. Finally, the paper examines the regulatory context in the EU and the advantages and limitations of voluntary and mandatory data sharing, concluding that an approach that effectively balances the two should be pursued…(More)”.