OECD Report: “Governments worldwide are transforming public services through innovative approaches that place people at the center of design and delivery. This report analyses nearly 800 case studies from 83 countries and identifies five critical trends in government innovation that are reshaping public services. First, governments are working with users and stakeholders to co-design solutions and anticipate future needs to create flexible, responsive, resilient and sustainable public services. Second, governments are investing in scalable digital infrastructure, experimenting with emergent technologies (such as automation, AI and modular code), and expanding innovative and digital skills to make public services more efficient. Third, governments are making public services more personalised and proactive to better meet people’s needs and expectations and reduce psychological costs and administrative frictions, ensuring they are more accessible, inclusive and empowering, especially for persons and groups in vulnerable and disadvantaged circumstances. Fourth, governments are drawing on traditional and non-traditional data sources to guide public service design and execution. They are also increasingly using experimentation to navigate highly complex and unpredictable environments. Finally, governments are reframing public services as opportunities and channels for citizens to exercise their civic engagement and hold governments accountable for upholding democratic values such as openness and inclusion…(More)”.
Direct democracy in the digital age: opportunities, challenges, and new approaches
Article by Pattharapong Rattanasevee, Yared Akarapattananukul & Yodsapon Chirawut: “This article delves into the evolving landscape of direct democracy, particularly in the context of the digital era, where ICT and digital platforms play a pivotal role in shaping democratic engagement. Through a comprehensive analysis of empirical data and theoretical frameworks, it evaluates the advantages and inherent challenges of direct democracy, such as majority tyranny, short-term focus, polarization, and the spread of misinformation. It proposes the concept of liquid democracy as a promising hybrid model that combines direct and representative elements, allowing voters to delegate their voting rights to trusted entities, thereby potentially mitigating some of the traditional drawbacks of direct democracy. Furthermore, the article underscores the necessity for legal regulations and constitutional safeguards to protect fundamental rights and ensure long-term sustainability within a direct democracy framework. This research contributes to the ongoing discourse on democratic innovation and highlights the need for a balanced approach to integrating digital tools with democratic processes…(More)”.
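To make the delegation mechanism concrete, the sketch below (an illustration added here, not code from the article) shows one way liquid-democracy voting could be resolved: direct ballots are counted as cast, delegated votes are followed transitively to a direct ballot, and delegation cycles fall back to abstention. The function and example names are assumptions.

```python
# A minimal sketch -- my illustration, not from the article -- of how vote
# delegation in a liquid-democracy model might be resolved: each voter either
# casts a direct ballot or delegates to a trusted entity, delegations are
# followed transitively, and delegation cycles are counted as abstentions.
from collections import Counter

def resolve_votes(direct_votes: dict, delegations: dict) -> Counter:
    """Tally effective votes given direct ballots and a voter -> delegate map."""
    tally = Counter()
    for voter in set(direct_votes) | set(delegations):
        seen, current = set(), voter
        # Follow the delegation chain until a direct ballot (or a cycle) is found.
        while current in delegations and current not in direct_votes:
            if current in seen:          # delegation cycle -> abstention
                current = None
                break
            seen.add(current)
            current = delegations[current]
        if current in direct_votes:
            tally[direct_votes[current]] += 1
    return tally

# Alice and Dana vote directly; Bob delegates to Alice; Carol delegates to Bob.
print(resolve_votes({"alice": "yes", "dana": "no"}, {"bob": "alice", "carol": "bob"}))
# Counter({'yes': 3, 'no': 1})
```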
Announcing SPARROW: A Breakthrough AI Tool to Measure and Protect Earth’s Biodiversity in the Most Remote Places
Blog by Juan Lavista Ferres: “The biodiversity of our planet is rapidly declining. We’ve likely reached a tipping point where it is crucial to use every tool at our disposal to help preserve what remains. That’s why I am pleased to announce SPARROW—Solar-Powered Acoustic and Remote Recording Observation Watch, developed by Microsoft’s AI for Good Lab. SPARROW is an AI-powered edge computing solution designed to operate autonomously in the most remote corners of the planet. Solar-powered and equipped with advanced sensors, it collects biodiversity data—from camera traps, acoustic monitors, and other environmental detectors—that are processed using our most advanced PyTorch-based wildlife AI models on low-energy edge GPUs. The resulting critical information is then transmitted via low-Earth orbit satellites directly to the cloud, allowing researchers to access fresh, actionable insights in real time, no matter where they are.
Think of SPARROW as a network of Earth-bound satellites, quietly observing and reporting on the health of our ecosystems without disrupting them. By leveraging solar energy, these devices can run for a long time, minimizing their footprint and any potential harm to the environment…(More)”.
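As a rough idea of what such an edge pipeline might look like (an assumption-laden sketch, not Microsoft's SPARROW code), a small PyTorch audio classifier could run on-device and queue only compact, high-confidence detection records for satellite uplink; the model, species labels, and uplink function below are placeholders.

```python
# A simplified, assumption-laden sketch of the kind of edge pipeline the post
# describes -- not Microsoft's actual SPARROW code. A PyTorch audio classifier
# runs on-device and only compact, high-confidence detections are queued for
# satellite uplink. The model, labels, and uplink function are placeholders.
import json
import torch
import torch.nn as nn

SPECIES = ["howler_monkey", "scarlet_macaw", "chainsaw", "background"]  # illustrative labels

class TinyAudioClassifier(nn.Module):
    """Stand-in for a trained wildlife audio model exported to a low-energy edge GPU."""
    def __init__(self, n_classes: int = len(SPECIES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=64, stride=16), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyAudioClassifier().eval()

def classify_clip(waveform: torch.Tensor) -> dict:
    """Classify a mono clip (1 x samples) and return a small detection record."""
    with torch.no_grad():
        probs = torch.softmax(model(waveform.unsqueeze(0)), dim=-1).squeeze(0)
    top = int(torch.argmax(probs))
    return {"label": SPECIES[top], "confidence": round(float(probs[top]), 3)}

def uplink(record: dict) -> None:
    """Placeholder for the low-Earth-orbit satellite transmission step."""
    print("queued for uplink:", json.dumps(record))  # keep the payload small

# In the field this would be fed by the acoustic sensor; random noise stands in here.
clip = torch.randn(1, 16000)            # roughly one second of audio at 16 kHz
detection = classify_clip(clip)
if detection["confidence"] > 0.8:       # transmit only confident detections
    uplink(detection)
```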
A linkless internet
Essay by Collin Jennings: “…But now Google and other websites are moving away from relying on links in favour of artificial intelligence chatbots. Considered as preserved trails of connected ideas, links make sense as early victims of the AI revolution, since large language models (LLMs) such as ChatGPT, Google’s Gemini and others abstract the information represented online and present it in source-less summaries. We are at a moment in the history of the web in which the link itself – the countless connections made by website creators, the endless tapestry of ideas woven together throughout the web – is in danger of going extinct. So it’s pertinent to ask: how did links come to represent information in the first place? And what’s at stake in the movement away from links toward AI chat interfaces?
To answer these questions, we need to go back to the 17th century, when writers and philosophers developed the theory of mind that ultimately inspired early hypertext plans. In this era, prominent philosophers, including Thomas Hobbes and John Locke, debated the extent to which a person controls the succession of ideas that appears in her mind. They posited that the succession of ideas reflects the interaction between the data received from the senses and one’s mental faculties – reason and imagination. Subsequently, David Hume argued that all successive ideas are linked by association. He enumerated three kinds of associative connections among ideas: resemblance, contiguity, and cause and effect. In An Enquiry Concerning Human Understanding (1748), Hume offers examples of each relationship:
A picture naturally leads our thoughts to the original: the mention of one apartment in a building naturally introduces an enquiry or discourse concerning the others: and if we think of a wound, we can scarcely forbear reflecting on the pain which follows it.
The mind follows connections found in the world. Locke and Hume believed that all human knowledge comes from experience, and so they had to explain how the mind receives, processes and stores external data. They often reached for media metaphors to describe the relationship between the mind and the world. Locke compared the mind to a blank tablet, a cabinet and a camera obscura. Hume relied on the language of printing to distinguish between the vivacity of impressions imprinted upon one’s senses and the ideas recalled in the mind…(More)”.
Harnessing AI: How to develop and integrate automated prediction systems for humanitarian anticipatory action
CEPR Report: “Despite unprecedented access to data, resources, and wealth, the world faces an escalating wave of humanitarian crises. Armed conflict, climate-induced disasters, and political instability are displacing millions and devastating communities. Nearly one in every five children is living in or fleeing conflict zones (OCHA, 2024). Often the impacts of conflict and climatic hazards – such as droughts and floods – exacerbate each other, leading to even greater suffering. As crises unfold and escalate, the need for timely and effective humanitarian action becomes paramount.
Sophisticated systems for forecasting and monitoring natural and man-made hazards have emerged as critical tools to help inform and prompt action. The potential for such automated forecasting systems to inform anticipatory action (AA) is immense but has yet to be realised. By providing early warnings and predictive insights, these systems could help organisations allocate resources more efficiently, plan interventions more effectively, and ultimately save lives and prevent or reduce humanitarian impact.
This Policy Insight provides an account of the significant technical, ethical, and organisational difficulties involved in such systems, and the current solutions in place…(More)”.
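To illustrate the basic mechanism of forecast-informed anticipatory action (a toy sketch, not drawn from the Policy Insight itself), agencies typically pre-agree a trigger: when a forecast probability for a hazard crosses a threshold within a given lead time, pre-planned actions are released. The thresholds and names below are invented for illustration.

```python
# A toy sketch of a forecast-based trigger for anticipatory action -- my
# illustration, not taken from the CEPR Policy Insight. Action is activated
# when a forecast probability crosses a pre-agreed threshold within the agreed
# lead time. All names and numbers below are invented.
from dataclasses import dataclass

@dataclass
class Trigger:
    hazard: str             # e.g. "flood" or "drought"
    min_probability: float  # forecast probability at which action is pre-agreed
    max_lead_days: int      # how far ahead of impact the forecast may be

def should_activate(forecast_prob: float, lead_days: int, trigger: Trigger) -> bool:
    """Return True if anticipatory action (pre-positioning aid, cash transfers) should start."""
    return forecast_prob >= trigger.min_probability and lead_days <= trigger.max_lead_days

flood_trigger = Trigger(hazard="flood", min_probability=0.6, max_lead_days=7)

# A made-up ensemble forecast: 65% chance of severe flooding five days out.
if should_activate(forecast_prob=0.65, lead_days=5, trigger=flood_trigger):
    print("Activate anticipatory action protocol for", flood_trigger.hazard)
```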
Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft
Article by Kate Knibbs: “Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard’s newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.
Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta’s Llama, the Institutional Data Initiative’s database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to “level the playing field” by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly refined and curated content repositories that normally only established tech giants have the resources to assemble. “It’s gone through rigorous review,” he says…(More)”.
The Recommendation on Information Integrity
OECD Recommendation: “…The digital transformation of societies has reshaped how people interact and engage with information. Advancements in digital technologies and novel forms of communication have changed the way information is produced, shared, and consumed, locally and globally and across all media. Technological changes and the critical importance of online information platforms offer unprecedented access to information, foster citizen engagement and connection, and allow for innovative news reporting. However, they can also provide a fertile ground for the rapid spread of false, altered, or misleading content. In addition, new generative AI tools have greatly reduced the barriers to creating and spreading content.
Promoting the availability and free flow of high-quality, evidence-based information is key to upholding individuals’ ability to seek and receive information and ideas of all kinds and to safeguarding freedom of opinion and expression.
The volume of content to which citizens are exposed can obscure and saturate public debates and help widen societal divisions. In this context, the quality of civic discourse declines as evidence-based information, which helps people make sense of their social environment, becomes harder to find. This reality has acted as a catalyst for governments to explore more closely the roles they can play, keeping as a priority in our democracies that governments should not exercise control over the information ecosystem and, on the contrary, should support an environment where a plurality of information sources, views, and opinions can thrive…Building on the detailed policy framework outlined in the OECD report Facts not Fakes: Tackling Disinformation, Strengthening Information Integrity, the Recommendation provides an ambitious and actionable international standard that will help governments develop a systemic approach to foster information integrity, relying on a multi-stakeholder approach…(More)”.
The politics of data justice: exit, voice, or rehumanisation?
Paper by Azadeh Akbari: “Although many data justice projects envision just datafied societies, their focus on participatory ‘solutions’ to remedy injustice leaves important discussions out. For example, there has been little discussion of the meaning of data justice and its participatory underpinnings in authoritarian contexts. Additionally, the subjects of data justice are treated as universal decision-making individuals unaffected by the procedures of datafication itself. To tackle such questions, this paper starts with studying the trajectory of data justice as a concept and reflects on both its data and justice elements. It conceptualises data as embedded within a network of associations opening up a multi-level, multi-actor, intersectional understanding of data justice. Furthermore, it discusses five major conceptualisations of data justice based on social justice, capabilities, structural, sphere transgression, and abnormality of justice approaches. Discussing the limits and potentials of each of these categories, the paper argues that many of the existing participatory approaches are formulated within the neoliberal binary of choice: exit or voice (Hirschman, 1970). Transcending this binary and using postcolonial theories, the paper discusses the dehumanisation of individuals and groups as an integral part of datafication and underlines the inadequacy of digital harms, data protection, and privacy discourses in that regard. Finally, the paper reflects on the politics of data justice as an emancipatory concept capable of transforming standardised concepts such as digital literacy to liberating pedagogies for reclaiming the lost humanity of the oppressed (Freire, 1970) or evoking the possibility for multiple trajectories beyond the emerging hegemony of data capitalism…(More)”.
This is where the data to build AI comes from
Article by Melissa Heikkilä and Stephanie Arnett: “AI is all about data. Reams and reams of data are needed to train algorithms to do what we want, and what goes into the AI models determines what comes out. But here’s the problem: AI developers and researchers don’t really know much about the sources of the data they are using. AI’s data collection practices are immature compared with the sophistication of AI model development. Massive data sets often lack clear information about what is in them and where it came from.
The Data Provenance Initiative, a group of over 50 researchers from both academia and industry, wanted to fix that. They wanted to know, very simply: Where does the data to build AI come from? They audited nearly 4,000 public data sets spanning over 600 languages, 67 countries, and three decades. The data came from 800 unique sources and nearly 700 organizations.
Their findings, shared exclusively with MIT Technology Review, show a worrying trend: AI’s data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies.
In the early 2010s, data sets came from a variety of sources, says Shayne Longpre, a researcher at MIT who is part of the project.
The data came not just from encyclopedias and the web but also from sources such as parliamentary transcripts, earnings calls, and weather reports. Back then, AI data sets were specifically curated and collected from different sources to suit individual tasks, Longpre says.
Then transformers, the architecture underpinning language models, were invented in 2017, and the AI sector started seeing performance get better the bigger the models and data sets were. Today, most AI data sets are built by indiscriminately hoovering material from the internet. Since 2018, the web has been the dominant source for data sets used in all media, such as audio, images, and video, and a gap between scraped data and more curated data sets has emerged and widened.
“In foundation model development, nothing seems to matter more for the capabilities than the scale and heterogeneity of the data and the web,” says Longpre. The need for scale has also boosted the use of synthetic data massively.
The past few years have also seen the rise of multimodal generative AI models, which can generate videos and images. Like large language models, they need as much data as possible, and the best source for that has become YouTube.
For video models, over 70% of the data in both speech and image data sets comes from a single source.
This could be a boon for Alphabet, Google’s parent company, which owns YouTube. Whereas text is distributed across the web and controlled by many different websites and platforms, video data is extremely concentrated in one platform.
“It gives a huge concentration of power over a lot of the most important data on the web to one company,” says Longpre…(More)”.
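As a rough sketch of the kind of audit the Data Provenance Initiative's work involves (an illustration added here, not the group's code), one can record each dataset's source and compute how concentrated a given modality is in a single platform; the records below are made up for illustration.

```python
# A minimal sketch of the kind of provenance audit the article describes -- my
# illustration, not the Data Provenance Initiative's code. Tally where each
# dataset's material comes from and measure how concentrated one modality is
# in a single source. The records below are made up.
from collections import Counter

# Hypothetical provenance records: (dataset_name, modality, source_domain)
records = [
    ("clips-a", "video", "youtube.com"),
    ("clips-b", "video", "youtube.com"),
    ("clips-c", "video", "archive.org"),
    ("books-a", "text",  "wikipedia.org"),
    ("books-b", "text",  "gutenberg.org"),
]

def source_share(records: list, modality: str) -> dict:
    """Return each source's share of the datasets for one modality."""
    counts = Counter(src for _, mod, src in records if mod == modality)
    total = sum(counts.values())
    return {src: n / total for src, n in counts.items()}

shares = source_share(records, "video")
top_source, top_share = max(shares.items(), key=lambda kv: kv[1])
print(f"{top_source} accounts for {top_share:.0%} of the video datasets")  # youtube.com, 67%
```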
Citizen science as an instrument for women’s health research
Paper by Sarah Ahannach et al.: “Women’s health research is receiving increasing attention globally, but considerable knowledge gaps remain. Across many fields of research, active involvement of citizens in science has emerged as a promising strategy to help align scientific research with societal needs. Citizen science offers researchers the opportunity for large-scale sampling and data acquisition while engaging the public in a co-creative approach that solicits their input on study aims, research design, data gathering and analysis. Here, we argue that citizen science has the potential to generate new data and insights that advance women’s health. Based on our experience with the international Isala project, which used a citizen-science approach to study the female microbiome and its influence on health, we address key challenges and lessons for generating a holistic, community-centered approach to women’s health research. We advocate for interdisciplinary collaborations to fully leverage citizen science in women’s health toward a more inclusive research landscape that amplifies underrepresented voices, challenges taboos around intimate health topics and prioritizes women’s involvement in shaping health research agendas…(More)”.