Vetted Researcher Data Access


Coimisiún na Meán: “Article 40 of the Digital Services Act (DSA) makes provision for researchers to access data from Very Large Online Platforms (VLOPs) or Very Large Online Search Engines (VLOSEs) for the purposes of studying systemic risk in the EU and assessing mitigation measures. There are two ways that researchers who are studying systemic risk in the EU can get access to data under Article 40 of the DSA.

Non-public data, known as “vetted researcher data access”, under Article 40(4)-(11). This is a process where a researcher, who has been vetted or assessed by a Digital Services Coordinator to have met the criteria as set out in DSA Article 40(8), can request access to non-public data held by a VLOP/VLOSE. The data must be limited in scope and deemed necessary and proportionate to the purpose of the research.

Public data under Article 40(12). This is a process where a researcher who meets the relevant criteria can apply for data access directly from a VLOP/VLOSE, for example, access to a content library or API of public posts…(More)”.

A US-run system alerts the world to famines. It’s gone dark after Trump slashed foreign aid


Article by Lauren Kent: “A vital, US-run monitoring system focused on spotting food crises before they turn into famines has gone dark after the Trump administration slashed foreign aid.

The Famine Early Warning Systems Network (FEWS NET) monitors drought, crop production, food prices and other indicators in order to forecast food insecurity in more than 30 countries…Now, its work to prevent hunger in Sudan, South Sudan, Somalia, Yemen, Ethiopia, Afghanistan and many other nations has been stopped amid the Trump administration’s effort to dismantle the US Agency for International Development (USAID).

“These are the most acutely food insecure countries around the globe,” said Tanya Boudreau, the former manager of the project.

Amid the aid freeze, FEWS NET has no funding to pay staff in Washington or those working on the ground. The website is down. And its treasure trove of data that underpinned global analysis on food security – used by researchers around the world – has been pulled offline.

FEWS NET is considered the gold standard in the sector, and it publishes more frequent updates than other global monitoring efforts. Those frequent reports and projections are key, experts say, because food crises evolve over time, meaning early interventions save lives and save money…The team at the University of Colorado Boulder has built a model to forecast water demand in Kenya, which feeds some data into the FEWS NET project but also relies on FEWS NET data provided by other research teams.

The data is layered and complex. And scientists say pulling the data hosted by the US disrupts other research and famine-prevention work conducted by universities and governments across the globe.

“It compromises our models, and our ability to be able to provide accurate forecasts of ground water use,” Denis Muthike, a Kenyan scientist and assistant research professor at UC Boulder, told CNN, adding: “You cannot talk about food security without water security as well.”

“Imagine that that data is available to regions like Africa and has been utilized for years and years – decades – to help inform divisions that mitigate catastrophic impacts from weather and climate events, and you’re taking that away from the region,” Muthike said. He cautioned that it would take many years to build another monitoring service that could reach the same level…(More)”.

Funding the Future: Grantmakers Strategies in AI Investment


Report by Project Evident: “…looks at how philanthropic funders are approaching requests to fund the use of AI… there was common recognition of AI’s importance and the tension between the need to learn more and to act quickly to meet the pace of innovation, adoption, and use of AI tools.

This research builds on the work of a February 2024 Project Evident and Stanford Institute for Human-Centered Artificial Intelligence working paper, Inspiring Action: Identifying the Social Sector AI Opportunity Gap. That paper reported that more practitioners than funders (by over a third) claimed their organization utilized AI. 

“From our earlier research, as well as in conversations with funders and nonprofits, it’s clear there’s a mismatch in the understanding and desire for AI tools and the funding of AI tools,” said Sarah Di Troia, Managing Director of Project Evident’s OutcomesAI practice and author of the report. “Grantmakers have an opportunity to quickly upskill their understanding – to help nonprofits improve their efficiency and impact, of course, but especially to shape the role of AI in civil society.”

The report offers a number of recommendations to the philanthropic sector. For example, funders and practitioners should ensure that community voice is included in the implementation of new AI initiatives to build trust and help reduce bias. Grantmakers should consider funding that allows for flexibility and innovation so that the social and education sectors can experiment with approaches. Most importantly, funders should increase their capacity and confidence in assessing AI implementation requests along both technical and ethical criteria…(More)”.

Human Development and the Data Revolution


Book edited by Sanna Ojanperä, Eduardo López, and Mark Graham: “…explores the uses of large-scale data in the contexts of development, in particular, what techniques, data sources, and possibilities exist for harnessing large datasets and new online data to address persistent concerns regarding human development, inequality, exclusion, and participation.

Employing a global perspective to explore the latest advances at the intersection of big data analysis and human development, this volume brings together pioneering voices from academia, development practice, civil society organizations, government, and the private sector. With a two-pronged focus on theoretical and practical research on big data and computational approaches in human development, the volume covers such themes as data acquisition, data management, data mining and statistical analysis, network science, visual analytics, and geographic information systems, and discusses them in terms of practical applications in development projects and initiatives. Ethical considerations surrounding these topics are visited throughout, highlighting the tradeoffs between benefitting and harming those who are the subjects of these new approaches…(More)”.

Standards


Book by Jeffrey Pomerantz and Jason Griffey: “Standards are the DNA of the built environment, encoded in nearly all objects that surround us in the modern world. In Standards, Jeffrey Pomerantz and Jason Griffey provide an essential introduction to this invisible but critical form of infrastructure—the rules and specifications that govern so many elements of the physical and digital environments, from the color of school buses to the shape of shipping containers.

In an approachable, often outright funny fashion, Pomerantz and Griffey explore the nature, function, and effect of standards in everyday life. Using examples of specific standards and contexts in which they are applied—in the realms of technology, economics, sociology, and information science—they illustrate how standards influence the development and scope, and indeed the very range of possibilities of our built and social worlds. Deeply informed and informally written, their work makes a subject generally deemed boring, complex, and fundamentally important comprehensible, clear, and downright engaging…(More)”.

Artificial intelligence for digital citizen participation: Design principles for a collective intelligence architecture


Paper by Nicolas Bono Rossello, Anthony Simonofski, and Annick Castiaux: “The challenges posed by digital citizen participation and the amount of data generated by Digital Participation Platforms (DPPs) create an ideal context for the implementation of Artificial Intelligence (AI) solutions. However, current AI solutions in DPPs focus mainly on technical challenges, often neglecting their social impact and not fully exploiting AI’s potential to empower citizens. The goal of this paper is thus to investigate how to design digital participation platforms that integrate technical AI solutions while considering the social context in which they are implemented. Using Collective Intelligence as kernel theory, and through a literature review and a focus group, we generate design principles for the development of a socio-technically aware AI architecture. These principles are then validated by experts from the field of AI and citizen participation. The principles suggest optimizing the alignment of AI solutions with project goals, ensuring their structured integration across multiple levels, enhancing transparency, monitoring AI-driven impacts, dynamically allocating AI actions, empowering users, and balancing cognitive disparities. These principles provide a theoretical basis for future AI-driven artifacts, and theories in digital citizen participation…(More)”.

Extending the CARE Principles: managing data for vulnerable communities in wartime and humanitarian crises


Essay by Yana Suchikova & Serhii Nazarovets: “The CARE Principles (Collective Benefit, Authority to Control, Responsibility, Ethics) were developed to ensure ethical stewardship of Indigenous data. However, their adaptability makes them an ideal framework for managing data related to vulnerable populations affected by armed conflicts. This essay explores the application of CARE principles to wartime contexts, with a particular focus on internally displaced persons (IDPs) and civilians living under occupation. These groups face significant risks of data misuse, ranging from privacy violations to targeted repression. By adapting CARE, data governance can prioritize safety, dignity, and empowerment while ensuring that data serves the collective welfare of affected communities. Drawing on examples from Indigenous data governance, open science initiatives, and wartime humanitarian challenges, this essay argues for extending CARE principles beyond their original scope. Such an adaptation highlights CARE’s potential as a universal standard for addressing the ethical complexities of data management in humanitarian crises and conflict-affected environments…(More)”.

Research Handbook on Open Government


Handbook edited by Mila Gascó-Hernandez, Aryamala Prasad, J. Ramon Gil-Garcia, and Theresa A. Pardo: “In the past decade, open government has received renewed attention. It has increasingly been acknowledged globally as necessary to enhance democratic governance by building on the pillars of transparency, participation, and collaboration (Gil-Garcia et al., 2020). Transnational multistakeholder initiatives, such as the Open Government Partnership, have fostered the development of open government by raising awareness about the concept and encouraging reforms in member countries. In this respect, many countries at the local, state, and federal levels have implemented open government initiatives in different policy domains and government functions, such as procurement, policing, education, and public budgeting. More recently, the emergence of digital technologies to facilitate innovative and collaborative approaches to open government is setting these new efforts apart from previous ones, designed to strengthen information access and transparency. Building a new shared understanding of open government, how various contexts shape the perceptions of open government by different stakeholders, and the ways in which digital technologies can advance open government is important for both research and practice…

the Handbook is structured into five sections, each dedicated to highlighting important facets of open government. Part I delves into the historical evolution of open government, setting the stage for the rest of the Handbook. In Part II, the Handbook presents research on the core components of open government, offering invaluable insights on transparency, participation, and collaboration. Part III focuses on the application of open government across diverse policy domains. Shifting focus, Part IV discusses open government implementation within different geographical and national contexts. Finally, Part V introduces emerging trends in open government research. As a whole, the Handbook offers a comprehensive view of open government, from its origins to its contemporary progress and future trends…(More)”.

Data, waves and wind to be counted in the economy


Article by Robert Cuffe: “Wind and waves are set to be included in calculations of the size of countries’ economies for the first time, as part of changes approved at the United Nations.

Assets like oilfields were already factored in under the rules – last updated in 2008.

This update aims to capture areas that have grown since then, such as the cost of using up natural resources and the value of data.

The changes come into force in 2030, and could mean an increase in estimates of the size of the UK economy, making promises to spend a fixed share of the economy on defence or aid more expensive.

The economic value of wind and waves can be estimated from the price of all the energy that can be generated from the turbines in a country.

The update also treats data as an asset in its own right, on top of the assets that house it, such as servers and cables.

Governments use a common rule book for measuring the size of their economies and how they grow over time.

These changes to the rule book are “tweaks, rather than a rewrite”, according to Prof Diane Coyle of the University of Cambridge.

Ben Zaranko of the Institute for Fiscal Studies (IFS) calls it an “accounting” change, rather than a real change. He explains: “We’d be no better off in a material sense, and tax revenues would be no higher.”

But it could make economies look bigger, creating a possible future spending headache for the UK government…(More)”.
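
The valuation approach the article mentions (pricing all the energy a country's turbines could generate) can be illustrated with a rough resource-rent calculation. The sketch below is a hypothetical Python illustration, not the UN's agreed methodology; the generation figure, prices, discount rate, and asset lifetime are all assumptions invented for the example.

```python
# Illustrative sketch only: valuing a country's wind resource from the energy
# its turbines could generate. All figures and the discounting choice are
# hypothetical assumptions, not the methodology approved at the UN.

def wind_asset_value(annual_generation_mwh: float,
                     price_per_mwh: float,
                     operating_cost_per_mwh: float,
                     lifetime_years: int,
                     discount_rate: float) -> float:
    """Net present value of the resource rent from projected generation."""
    annual_rent = annual_generation_mwh * (price_per_mwh - operating_cost_per_mwh)
    return sum(annual_rent / (1 + discount_rate) ** t
               for t in range(1, lifetime_years + 1))

# Hypothetical numbers: 80 TWh/year of wind generation, £60/MWh price,
# £40/MWh operating cost, a 25-year horizon, and a 3.5% discount rate.
value = wind_asset_value(
    annual_generation_mwh=80_000_000,
    price_per_mwh=60.0,
    operating_cost_per_mwh=40.0,
    lifetime_years=25,
    discount_rate=0.035,
)
print(f"Estimated asset value: £{value / 1e9:.1f}bn")
```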

Bridging the Data Provenance Gap Across Text, Speech and Video


Paper by Shayne Longpre et al: “Progress in AI is driven largely by the scale and quality of training data. Despite this, there is a deficit of empirical analysis examining the attributes of well-established datasets beyond text. In this work we conduct the largest and first-of-its-kind longitudinal audit across modalities – popular text, speech, and video datasets – from their detailed sourcing trends and use restrictions to their geographical and linguistic representation. Our manual analysis covers nearly 4000 public datasets between 1990 and 2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries. We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets, eclipsing all other sources since 2019. Secondly, tracing the chain of dataset derivations, we find that while less than 33% of datasets are restrictively licensed, over 80% of the source content in widely used text, speech, and video datasets carries non-commercial restrictions. Finally, counter to the rising number of languages and geographies represented in public AI training datasets, our audit demonstrates measures of relative geographical and multilingual representation have failed to significantly improve their coverage since 2013. We believe the breadth of our audit enables us to empirically examine trends in data sourcing, restrictions, and Western-centricity at an ecosystem level, and that visibility into these questions is essential to progress in responsible AI. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire multimodal audit, allowing practitioners to trace data provenance across text, speech, and video…(More)”.
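
The derivation-chain analysis described in the abstract (checking whether restrictions attached to source datasets carry through to the corpora built from them) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' released audit tooling; the data structure, dataset names, and licenses below are hypothetical.

```python
# Illustrative sketch of tracing use restrictions through a dataset
# derivation chain, in the spirit of the audit described above. Not the
# authors' released tooling; dataset names and licenses are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    license: str                      # license attached to the dataset itself
    non_commercial: bool              # restriction stated by that license
    sources: list["Dataset"] = field(default_factory=list)

def inherits_non_commercial(ds: Dataset) -> bool:
    """A dataset carries a non-commercial restriction if its own license
    imposes one, or if any dataset in its derivation chain does."""
    if ds.non_commercial:
        return True
    return any(inherits_non_commercial(src) for src in ds.sources)

# Hypothetical chain: a permissively licensed corpus derived from an
# NC-licensed source collection still inherits the source restriction.
source = Dataset("web_video_transcripts", "CC BY-NC 4.0", non_commercial=True)
derived = Dataset("speech_corpus_v2", "CC BY 4.0", non_commercial=False,
                  sources=[source])

print(inherits_non_commercial(derived))  # True: restriction flows from the source
```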