Using Artificial Intelligence to Map the Earth’s Forests


Article from Meta and World Resources Institute: “Forests harbor most of Earth’s terrestrial biodiversity and play a critical role in the uptake of carbon dioxide from the atmosphere. Ecosystem services provided by forests underpin an essential defense against the climate and biodiversity crises. However, critical gaps remain in the scientific understanding of the structure and extent of global forests. Because the vast majority of existing data on global forests is derived from low- to medium-resolution satellite imagery (10 or 30 meters), there is a gap in the scientific understanding of dynamic and more dispersed forest systems such as agroforestry, dryland forests, and alpine forests, which together constitute more than a third of the world’s forests.

Today, Meta and World Resources Institute are launching a global map of tree canopy height at a 1-meter resolution, allowing the detection of single trees at a global scale. In an effort to advance open source forest monitoring, all canopy height data and artificial intelligence models are free and publicly available…(More)”.
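
For readers who want to explore the release, here is a minimal sketch of computing per-tile canopy statistics in Python. It assumes the data is distributed as single-band GeoTIFF tiles with canopy height in meters; the file name and the 5 m “tree” threshold are illustrative assumptions, not details taken from the Meta/WRI release itself.

```python
# Minimal sketch: summarize one canopy-height tile.
# Assumptions (not from the release): single-band GeoTIFF,
# height in meters, pixels >= 5 m counted as tree canopy.
import numpy as np
import rasterio

with rasterio.open("canopy_height_tile.tif") as src:  # hypothetical tile name
    heights = src.read(1)  # band 1: canopy height in meters
    nodata = src.nodata

valid = (heights != nodata) if nodata is not None else np.ones(heights.shape, dtype=bool)
trees = valid & (heights >= 5.0)  # illustrative "tree" threshold

print(f"Canopy cover: {trees.mean():.1%}")
if trees.any():
    print(f"Mean height of tree pixels: {heights[trees].mean():.1f} m")
```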

Citizen scientists—practices, observations, and experience


Paper by Michael O’Grady & Eleni Mangina: “Citizen science has been studied intensively in recent years. Nonetheless, the voice of citizen scientists is often lost despite their altruistic and indispensable role. To remedy this deficiency, a survey on the overall experiences of citizen scientists was undertaken. Dimensions investigated include activities, open science concepts, and data practices. In particular, the study prioritizes knowledge and practices of data and data management. When a broad understanding of data is lacking, the ability to make informed decisions about consent and data sharing, for example, is compromised. Furthermore, the potential and impact of individual endeavors and collaborative projects are reduced. Findings indicate that understanding of data management principles is limited. In addition, a lack of awareness of common data and open science concepts was observed. It is concluded that appropriate training and a raised awareness of Responsible Research and Innovation concepts would benefit individual citizen scientists, their projects, and society…(More)”.

Mechanisms for Researcher Access to Online Platform Data


Status Report by the EU/USA: “Academic and civil society research on prominent online platforms has become a crucial way to understand the information environment and its impact on our societies. Scholars across the globe have leveraged application programming interfaces (APIs) and web crawlers to collect public user-generated content and advertising content on online platforms to study societal issues ranging from technology-facilitated gender-based violence to the impact of media on mental health for children and youth. Yet, a changing landscape of platforms’ data access mechanisms and policies has created uncertainty and difficulty for critical research projects.


The United States and the European Union have a shared commitment to advance data access for researchers, in line with the high-level principles on access to data from online platforms for researchers announced at the EU-U.S. Trade and Technology Council (TTC) Ministerial Meeting in May 2023. Since the launch of the TTC, the EU Digital Services Act (DSA) has gone into effect, requiring providers of Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) to provide increased transparency into their services. The DSA includes provisions on transparency reports, terms and conditions, and explanations for content moderation decisions. Among those, two provisions provide important access to publicly available content on platforms:


• DSA Article 40.12 requires providers of VLOPs/VLOSEs to provide academic and civil society researchers with data that is “publicly accessible in their online interface.”
• DSA Article 39 requires providers of VLOPs/VLOSEs to maintain a public repository of advertisements.

The announcements related to new researcher access mechanisms mark an important development and opportunity to better understand the information environment. This status report summarizes a subset of mechanisms that are available to European and/or United States researchers today, following, in part, VLOPs’ and VLOSEs’ measures to comply with the DSA. The report aims to showcase the existing access modalities and to encourage the use of these mechanisms to study the impact of online platforms’ design and decisions on society. The list of mechanisms reviewed is included in the Appendix…(More)”
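
To make the API-based access route concrete, the sketch below shows what a researcher-facing query against a public ad repository (the kind of repository DSA Article 39 requires) might look like. The endpoint, query parameters, and response fields are placeholders, not any real platform’s API; actual mechanisms each have their own application process, endpoints, and rate limits.

```python
# Hypothetical sketch of querying a platform's public ad repository.
# Endpoint, parameters, token, and response fields are all placeholders.
import requests

BASE_URL = "https://platform.example/ads/v1/search"  # placeholder endpoint
params = {"q": "election", "country": "DE", "limit": 50}
headers = {"Authorization": "Bearer RESEARCHER_ACCESS_TOKEN"}  # placeholder

resp = requests.get(BASE_URL, params=params, headers=headers, timeout=30)
resp.raise_for_status()
for ad in resp.json().get("ads", []):
    print(ad.get("id"), ad.get("first_shown"))  # field names assumed
```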

The Need for Climate Data Stewardship: 10 Tensions and Reflections regarding Climate Data Governance


Paper by Stefaan Verhulst: “Datafication — the increase in data generation and advancements in data analysis — offers new possibilities for governing and tackling worldwide challenges such as climate change. However, employing new data sources in policymaking carries various risks, such as exacerbating inequalities, introducing biases, and creating gaps in access. This paper articulates ten core tensions related to climate data and its implications for climate data governance, ranging from the diversity of data sources and stakeholders to issues of quality, access, and the balancing act between local needs and global imperatives. Through examining these tensions, the article advocates for a paradigm shift towards multi-stakeholder governance, data stewardship, and equitable data practices to harness the potential of climate data for public good. It underscores the critical role of data stewards in navigating these challenges, fostering a responsible data ecology, and ultimately contributing to a more sustainable and just approach to climate action and broader social issues…(More)”.

Meta Kills a Crucial Transparency Tool At the Worst Possible Time


Interview by Vittoria Elliott: “Earlier this month, Meta announced that it would be shutting down CrowdTangle, the social media monitoring and transparency tool that has allowed journalists and researchers to track the spread of mis- and disinformation. It will cease to function on August 14, 2024—just months before the US presidential election.

Meta’s move is just the latest example of a tech company rolling back transparency and security measures as the world enters the biggest global election year in history. The company says it is replacing CrowdTangle with a new Content Library API, which will require researchers and nonprofits to apply for access to the company’s data. But the Mozilla Foundation and 140 other civil society organizations protested last week that the new offering lacks much of CrowdTangle’s functionality, asking the company to keep the original tool operating until January 2025.

Meta spokesperson Andy Stone countered in posts on X that the groups’ claims “are just wrong,” saying the new Content Library will contain “more comprehensive data than CrowdTangle” and be made available to nonprofits, academics, and election integrity experts. When asked why commercial newsrooms, like WIRED, are to be excluded from the Content Library, Meta spokesperson Eric Porterfield said that it was “built for research purposes.” While journalists might not have direct access, he suggested they could use commercial social network analysis tools or “partner with an academic institution to help answer a research question related to our platforms.”

Brandon Silverman, cofounder and former CEO of CrowdTangle, who continued to work on the tool after Facebook acquired it in 2016, says it’s time to force platforms to open up their data to outsiders. The conversation has been edited for length and clarity…(More)”.

Commons-based Data Set Governance for AI


Report by Open Future: “In this white paper, we propose an approach to sharing data sets for AI training as a public good governed as a commons. By adhering to the six principles of commons-based governance, data sets can be managed in a way that generates public value while making shared resources resilient to extraction or capture by commercial interests.

The purpose of defining these principles is two-fold:

First, we propose these principles as input into policy debates on data and AI governance. A commons-based approach can be introduced through regulatory means, funding and procurement rules, statements of principles, or data-sharing frameworks. Second, these principles can serve as a blueprint for the design of data sets that are governed and shared as a commons. To this end, we also provide practical examples of how these principles are being brought to life. Projects like Big Science or Common Voice have demonstrated that commons-based data sets can be successfully built.

These principles, tailored for the governance of AI data sets, build on our previous work on the Data Commons Primer. They are also the outcome of our research into the governance of AI data sets, including the AI_Commons case study. Finally, they are based on ongoing efforts, in which we have been participating, to define how AI systems can be shared and made open – including the OSI-led process to define open-source AI systems, and the DPGA Community of Practice exploring AI systems as Digital Public Goods…(More)”.

Unconventional data, unprecedented insights: leveraging non-traditional data during a pandemic


Paper by Kaylin Bolt et al: “The COVID-19 pandemic prompted new interest in non-traditional data sources to inform response efforts and mitigate knowledge gaps. While non-traditional data offers some advantages over traditional data, it also raises concerns related to biases, representativity, informed consent, and security vulnerabilities. This study focuses on three specific types of non-traditional data: mobility, social media, and participatory surveillance platform data. Qualitative results are presented on the successes, challenges, and recommendations of key informants who used these non-traditional data sources during the COVID-19 pandemic in Spain and Italy….

Non-traditional data proved valuable in providing rapid results and filling data gaps, especially when traditional data faced delays. Increased data access and innovative collaborative efforts across sectors facilitated its use. Challenges included unreliable access and data quality concerns, particularly the lack of comprehensive demographic and geographic information. To further leverage non-traditional data, participants recommended prioritizing data governance, establishing data brokers, and sustaining multi-institutional collaborations. Non-traditional data were perceived as underutilized in public health surveillance, program evaluation, and policymaking. Participants saw opportunities to integrate these data into public health systems with the necessary investments in data pipelines, infrastructure, and technical capacity…(More)”.

Governing the use of big data and digital twin technology for sustainable tourism


Report by Eko Rahmadian: “The tourism industry is increasingly utilizing big data to gain valuable insights and enhance decision-making processes. The advantages of big data, such as real-time information, robust data processing capabilities, and improved stakeholder decision-making, make it a promising tool for analyzing various aspects of tourism, including sustainability. Moreover, integrating big data with prominent technologies like machine learning, artificial intelligence (AI), and the Internet of Things (IoT) has the potential to revolutionize smart and sustainable tourism.

Despite the potential benefits, the use of big data for sustainable tourism remains limited, and its implementation poses challenges related to governance, data privacy, ethics, stakeholder communication, and regulatory compliance. Addressing these challenges is crucial to ensure the responsible and sustainable use of these technologies. Therefore, strategies must be developed to navigate these issues through a proper governing system.

To bridge the existing gap, this dissertation focuses on the current research on big data for sustainable tourism and strategies for governing its use and implementation in conjunction with emerging technologies. Specifically, this PhD dissertation centers on mobile positioning data (MPD) as a case due to its unique benefits, challenges, and complexity. The project also introduces three frameworks: 1) a conceptual framework for digital twins (DT) for smart and sustainable tourism, 2) a documentation framework for architectural decisions (DFAD) to ensure the successful implementation of the DT technology as a governance mechanism, and 3) a big data governance framework for official statistics (BDGF). This dissertation not only presents these frameworks and their benefits but also investigates the issues and challenges related to big data governance while empirically validating the applicability of the proposed frameworks…(More)”.

Community views on the secondary use of general practice data: Findings from a mixed-methods study


Paper by Annette J. Braunack-Mayer et al: “General practice data, particularly when combined with hospital and other health service data through data linkage, are increasingly being used for quality assurance, evaluation, health service planning, and research. Using general practice data is particularly important in countries where general practitioners (GPs) are the first and principal source of health care for most people.

Although there is broad public support for the secondary use of health data, there are good reasons to question whether this support extends to general practice settings. GP–patient relationships may be very personal and longstanding and the general practice health record can capture a large amount of information about patients. There is also the potential for multiple angles on patients’ lives: GPs often care for, or at least record information about, more than one generation of a family. These factors combine to amplify patients’ and GPs’ concerns about sharing patient data….

Adams et al. have developed a model of social licence, specifically in the context of sharing administrative data for health research, based on an analysis of the social licence literature and founded on two principal elements: trust and legitimacy. In this model, trust is founded on research enterprises being perceived as reliable and responsive, including in relation to privacy and security of information, and having regard to the community’s interests and well-being.

Transparency and accountability measures may be used to demonstrate trustworthiness and, as a consequence, to generate trust. Transparency involves a level of openness about the way data are handled and used as well as about the nature and outcomes of the research. Adams et al. note that lack of transparency can undermine trust. They also note that the quality of public engagement is important and that simply providing information is not sufficient. While this is one element of transparency, other elements such as accountability and collaboration are also part of the trusting, reflexive relationship necessary to establish and support social licence.

The second principal element, legitimacy, is founded on research enterprises conforming to the legal, cultural and social norms of society and, again, acting in the best interests of the community. In diverse communities with a range of views and interests, it is necessary to develop a broad consensus on what amounts to the common good through deliberative and collaborative processes.

Social licence cannot be assumed. It must be built through public discussion and engagement to avoid undermining the relationship of trust with health care providers and confidence in the confidentiality of health information…(More)”

Outpacing Pandemics: Solving the First and Last Mile Challenges of Data-Driven Policy Making


Article by Stefaan Verhulst, Daniela Paolotti, Ciro Cattuto, and Alessandro Vespignani: “As society continues to emerge from the legacy of COVID-19, a dangerous complacency seems to be setting in. Amidst recurrent surges of cases, each serving as a reminder of the virus’s persistence, there is a noticeable decline in collective urgency to prepare for future pandemics. This situation represents not just a lapse in memory but a significant shortfall in our approach to pandemic preparedness. It dramatically underscores the urgent need to develop novel and sustainable approaches and responses and to reinvent how we approach public health emergencies.

Among the many lessons learned from previous infectious disease outbreaks, the potential and utility of data, particularly non-traditional forms of data, are surely among the most important. Among other benefits, data has proven useful in providing intelligence and situational awareness in the early stages of outbreaks, empowering citizens to protect their health and the health of vulnerable community members, advancing compliance with non-pharmaceutical interventions to mitigate societal impacts, tracking vaccination rates and the availability of treatment, and more. A variety of research now highlights the particular role played by open-source data (and other non-traditional forms of data) in these initiatives.

Although multiple data sources are useful at various stages of outbreaks, we focus on two critical stages proven to be especially challenging: what we call the first mile and the last mile.

We argue that focusing on these two stages (or chokepoints) can help pandemic responses and rationalize resources. In particular, we highlight the role of Data Stewards at both stages and in overall pandemic response effectiveness…(More)”.