Kickstarting Collaborative, AI-Ready Datasets in the Life Sciences with Government-funded Projects


Article by Erika DeBenedictis, Ben Andrew & Pete Kelly: “In the age of Artificial Intelligence (AI), large, high-quality datasets are needed to move the field of life science forward. However, the research community lacks strategies to incentivize collaboration on high-quality data acquisition and sharing. The government should fund collaborative roadmapping, certification, collection, and sharing of large, high-quality datasets in life science. In such a system, nonprofit research organizations engage scientific communities to identify key types of data that would be valuable for building predictive models, and define quality control (QC) and open science standards for collection of that data. Projects are designed to develop automated methods for data collection, certify data providers, and facilitate data collection in consultation with researchers throughout various scientific communities. Hosting of the resulting open data is subsidized and protected by security measures. This system would provide crucial incentives for the life science community to identify and amass large, high-quality open datasets that will immensely benefit researchers…(More)”.

Trust but Verify: A Guide to Conducting Due Diligence When Leveraging Non-Traditional Data in the Public Interest


New Report by Sara Marcucci, Andrew J. Zahuranec, and Stefaan Verhulst: “In an increasingly data-driven world, organizations across sectors are recognizing the potential of non-traditional data—data generated from sources outside conventional databases, such as social media, satellite imagery, and mobile usage—to provide insights into societal trends and challenges. When harnessed thoughtfully, this data can improve decision-making and bolster public interest projects in areas as varied as disaster response, healthcare, and environmental protection. However, with these new data streams come heightened ethical, legal, and operational risks that organizations need to manage responsibly. That’s where due diligence comes in, helping to ensure that data initiatives are beneficial and ethical.

The report, Trust but Verify: A Guide to Conducting Due Diligence When Leveraging Non-Traditional Data in the Public Interest, co-authored by Sara Marcucci, Andrew J. Zahuranec, and Stefaan Verhulst, offers a comprehensive framework to guide organizations in responsible data partnerships. Whether you’re a public agency or a private enterprise, this report provides a six-step process to ensure due diligence and maintain accountability, integrity, and trust in data initiatives…(More) (Blog)”.

Innovating with Non-Traditional Data: Recent Use Cases for Unlocking Public Value


Article by Stefaan Verhulst and Adam Zable: “Non-Traditional Data (NTD): “data that is digitally captured (e.g. mobile phone records), mediated (e.g. social media), or observed (e.g. satellite imagery), using new instrumentation mechanisms, often privately held.”

Digitalization and the resulting datafication have introduced a new category of data that, when re-used responsibly, can complement traditional data in addressing public interest questions—from public health to environmental conservation. Unlocking these often privately held datasets through data collaboratives is a key focus of what we have called The Third Wave of Open Data.

To help bridge this gap, we have curated below recent examples of the use of NTD for research and decision-making that were published in the past few months. They are organized into five categories:

  • Health and Well-being;
  • Humanitarian Aid;
  • Environment and Climate;
  • Urban Systems and Mobility; and
  • Economic and Labor Dynamics…(More)”.

The Emergence of National Data Initiatives: Comparing proposals and initiatives in the United Kingdom, Germany, and the United States


Article by Stefaan Verhulst and Roshni Singh: “Governments are increasingly recognizing data as a pivotal asset for driving economic growth, enhancing public service delivery, and fostering research and innovation. This recognition has intensified as policymakers acknowledge that data serves as the foundational element of artificial intelligence (AI) and that advancing AI sovereignty necessitates a robust data ecosystem. However, substantial portions of generated data remain inaccessible or underutilized. In response, several nations are initiating or exploring the launch of comprehensive national data strategies designed to consolidate, manage, and utilize data more effectively and at scale. As these initiatives evolve, discernible patterns in their objectives, governance structures, data-sharing mechanisms, and stakeholder engagement frameworks reveal both shared principles and country-specific approaches.

This blog starts some initial research on the emergence of national data initiatives by examining three of them, in the United Kingdom, Germany, and the United States, and exploring their strategic orientations and broader implications…(More)”.

Garden city: A synthetic dataset and sandbox environment for analysis of pre-processing algorithms for GPS human mobility data


Paper by Thomas H. Li and Francisco Barreras: “Human mobility datasets have seen increasing adoption in the past decade, enabling diverse applications that leverage the high precision of measured trajectories relative to other human mobility datasets. However, there are concerns about whether the high sparsity in some commercial datasets can introduce errors due to lack of robustness in processing algorithms, which could compromise the validity of downstream results. The scarcity of “ground-truth” data makes it particularly challenging to evaluate and calibrate these algorithms. To overcome these limitations and allow for an intermediate form of validation of common processing algorithms, we propose a synthetic trajectory simulator and sandbox environment meant to replicate the features of commercial datasets that could cause errors in such algorithms, and which can be used to compare algorithm outputs with “ground-truth” synthetic trajectories and mobility diaries. Our code is open-source and is publicly available alongside tutorial notebooks and sample datasets generated with it…(More)”.
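The core idea of the paper — generate a fully known “ground-truth” trajectory, then degrade it to mimic a sparse commercial feed, so pre-processing algorithms can be scored against the truth — can be illustrated with a minimal sketch. This is not the Garden city simulator or its API; the random-walk model, drop rate, and noise level below are illustrative assumptions only.

```python
import random

def simulate_trajectory(n_steps=200, step_std=10.0, seed=0):
    """Ground-truth trajectory: a simple 2-D random walk (metres)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    traj = [(0, x, y)]
    for t in range(1, n_steps):
        x += rng.gauss(0, step_std)
        y += rng.gauss(0, step_std)
        traj.append((t, x, y))
    return traj

def sparsify(traj, keep_prob=0.1, noise_std=15.0, seed=1):
    """Mimic a sparse commercial feed: drop most pings, jitter the rest."""
    rng = random.Random(seed)
    observed = []
    for t, x, y in traj:
        if rng.random() < keep_prob:
            observed.append((t, x + rng.gauss(0, noise_std),
                                y + rng.gauss(0, noise_std)))
    return observed

truth = simulate_trajectory()
sparse = sparsify(truth)
print(len(truth), len(sparse))  # the observed feed keeps only ~10% of pings
```

Any candidate pre-processing algorithm (stop detection, trip segmentation, home-location inference) can then be run on `sparse` and compared point-by-point against `truth`, which is the form of validation the sandbox enables.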

National biodiversity data infrastructures: ten essential functions for science, policy, and practice 


Paper by Anton Güntsch et al: “Today, at the international level, powerful data portals are available to biodiversity researchers and policymakers, offering increasingly robust computing and network capacities and capable data services for internationally agreed-on standards. These accelerate individual and complex workflows to map data-driven research processes or even to make them possible for the first time. At the national level, however, and alongside these international developments, national infrastructures are needed to take on tasks that cannot be easily funded or addressed internationally. To avoid gaps, as well as redundancies in the research landscape, national tasks and responsibilities must be clearly defined to align efforts with core priorities. In the present article, we outline 10 essential functions of national biodiversity data infrastructures. They serve as key providers, facilitators, mediators, and platforms for effective biodiversity data management, integration, and analysis that require national efforts to foster biodiversity science, policy, and practice…(More)”.

Access, Signal, Action: Data Stewardship Lessons from Valencia’s Floods


Article by Marta Poblet, Stefaan Verhulst, and Anna Colom: “Valencia has a rich history in water management, a legacy shaped by both triumphs and tragedies. This connection to water is embedded in the city’s identity, yet modern floods test its resilience in new ways.

During the recent floods, Valencians experienced a troubling paradox. In today’s connected world, digital information flows through traditional and social media, weather apps, and government alert systems designed to warn us of danger and guide rapid responses. Despite this abundance of data, a tragedy unfolded last month in Valencia. This raises a crucial question: how can we ensure access to the right data, filter it for critical signals, and transform those signals into timely, effective action?

Data stewardship becomes essential in this process.

In particular, the devastating floods in Valencia underscore the importance of:

  • having access to data to strengthen the signal (first mile challenges)
  • separating signal from noise
  • translating signal into action (last mile challenges)…(More)”.

Beached Plastic Debris Index; a modern index for detecting plastics on beaches


Paper by Jenna Guffogg et al: “Plastic pollution on shorelines poses a significant threat to coastal ecosystems, underscoring the urgent need for scalable detection methods to facilitate debris removal. In this study, the Beached Plastic Debris Index (BPDI) was developed to detect plastic accumulation on beaches using shortwave infrared spectral features. To validate the BPDI, plastic targets with varying sub-pixel covers were placed on a sand spit and captured using WorldView-3 satellite imagery. The performance of the BPDI was analysed in comparison with the Normalized Difference Plastic Index (NDPI), the Plastic Index (PI), and two hydrocarbon indices (HI, HC). The BPDI successfully distinguished the plastic targets from sand, water, and vegetation, outperforming the other indices and identifying pixels with <30% plastic cover. The robustness of the BPDI suggests its potential as an effective tool for mapping plastic debris accumulations along coastlines…(More)”.
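The exact BPDI formulation is given in the paper; as a rough illustration of how a shortwave-infrared index responds to sub-pixel plastic cover, here is a generic normalized-difference sketch over a linear two-endmember mixture. The band choice, the reflectance values for `PLASTIC` and `SAND`, and the index form are hypothetical assumptions for illustration, not the authors' values.

```python
def mixed_reflectance(f_plastic, r_plastic, r_background):
    """Linear sub-pixel mixing of plastic and background reflectance."""
    return tuple(f_plastic * p + (1 - f_plastic) * b
                 for p, b in zip(r_plastic, r_background))

def nd_index(b1, b2):
    """Generic normalized-difference index over two SWIR bands."""
    return (b1 - b2) / (b1 + b2)

# Hypothetical SWIR reflectances (band_a, band_b); illustrative only.
PLASTIC = (0.55, 0.25)   # strong contrast across the two bands
SAND    = (0.40, 0.38)   # nearly flat background

for f in (0.0, 0.1, 0.3, 0.5):
    b1, b2 = mixed_reflectance(f, PLASTIC, SAND)
    print(f"cover={f:.1f}  index={nd_index(b1, b2):.3f}")
```

Because the index rises monotonically with plastic fraction under this mixing model, a threshold on it can flag pixels well below full plastic cover — the same reasoning that lets the BPDI pick out pixels with under 30% cover.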

Once Upon a Crime: Towards Crime Prediction from Demographics and Mobile Data


Paper by Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Nuria Oliver, Fabio Pianesi, and Alex Pentland: “In this paper, we present a novel approach to predict crime in a geographic space from multiple data sources, in particular mobile phone and demographic data. The main contribution of the proposed approach lies in using aggregated and anonymized human behavioral data derived from mobile network activity to tackle the crime prediction problem. While previous research efforts have used either background historical knowledge or offenders’ profiling, our findings support the hypothesis that aggregated human behavioral data captured from the mobile network infrastructure, in combination with basic demographic information, can be used to predict crime. In our experimental results with real crime data from London we obtain an accuracy of almost 70% when predicting whether a specific area in the city will be a crime hotspot or not. Moreover, we provide a discussion of the implications of our findings for data-driven crime analysis…(More)”.
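The setup the paper describes — aggregate behavioral and demographic features per geographic cell, then train a binary classifier to label each cell hotspot or not and report accuracy — can be sketched minimally as follows. The synthetic features, the hand-rolled logistic regression, and the ~400-cell toy dataset are assumptions for illustration; they are not the paper's features, model, or data.

```python
import math
import random

rng = random.Random(42)

# Synthetic grid cells: (mobile-activity volume, demographic score) -> hotspot?
# Purely illustrative; the paper's real features come from mobile-network
# and demographic data and are not reproduced here.
def make_cell():
    activity = rng.gauss(0, 1)
    demo = rng.gauss(0, 1)
    logit = 1.5 * activity + 0.8 * demo + rng.gauss(0, 0.5)
    return (activity, demo), 1 if logit > 0 else 0

data = [make_cell() for _ in range(400)]
train_set, test_set = data[:300], data[300:]

# Logistic regression fitted by plain batch gradient descent.
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(500):
    gw, gb = [0.0, 0.0], 0.0
    for (x1, x2), y in train_set:
        p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        gw[0] += (p - y) * x1
        gw[1] += (p - y) * x2
        gb += p - y
    w[0] -= lr * gw[0] / len(train_set)
    w[1] -= lr * gw[1] / len(train_set)
    b -= lr * gb / len(train_set)

correct = sum(
    (1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b))) > 0.5) == (y == 1)
    for (x1, x2), y in test_set
)
accuracy = correct / len(test_set)
print("held-out hotspot accuracy:", accuracy)
```

The point of the sketch is the evaluation shape — train on some cells, report held-out classification accuracy on the rest — which is how a headline figure like the paper's "almost 70%" is produced.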

Unlocking Green Deal Data: Innovative Approaches for Data Governance and Sharing in Europe


JRC Report: “Drawing upon the ambitious policy and legal framework outlined in the European Strategy for Data (2020) and the establishment of common European data spaces, this Science for Policy report explores innovative approaches for unlocking relevant data to achieve the objectives of the European Green Deal.

The report focuses on the governance and sharing of Green Deal data, analysing a variety of topics related to the implementation of new regulatory instruments, namely the Data Governance Act and the Data Act, as well as the roles of various actors in the data ecosystem. It provides an overview of the current incentives and disincentives for data sharing and explores the existing landscape of Data Intermediaries and Data Altruism Organizations. Additionally, it offers insights from a private sector perspective and outlines key data governance and sharing practices concerning Citizen-Generated Data (CGD).

The main conclusions build upon the concept of “Systemic Data Justice,” which emphasizes equity, accountability, and fair representation to foster stronger connections between the supply and demand of data for a more effective and sustainable data economy. Five policy recommendations outline a set of main implications and actionable points for the revision of the INSPIRE Directive (2007) within the context of the common European Green Deal data space, and toward a more sustainable and fair data ecosystem. However, the relevance of these recommendations extends beyond Green Deal data alone, as they outline key elements to ensure that any data ecosystem is both just and impact-oriented…(More)”.