
Stefaan Verhulst

Article by Jay Peters: “Reddit says that it has caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to start blocking the Internet Archive from indexing the vast majority of Reddit. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the Reddit.com homepage, which effectively means Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day.

“Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,” spokesperson Tim Rathschmidt tells The Verge.

The Internet Archive’s mission is to keep a digital archive of websites on the internet and “other cultural artifacts,” and the Wayback Machine is a tool you can use to look at pages as they appeared on certain dates, but Reddit believes not all of its content should be archived that way. “Until they’re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we’re limiting some of their access to Reddit data to protect redditors,” Rathschmidt says.

The limits will start “ramping up” today, and Reddit says it reached out to the Internet Archive “in advance” to “inform them of the limits before they go into effect,” according to Rathschmidt. He says Reddit has also “raised concerns” about the ability of people to scrape content from the Internet Archive in the past…(More)”.

Reddit will block the Internet Archive

Paper by Juan Zambrano et al: “Participatory Budgeting (PB) empowers citizens to propose and vote on public investment projects. Yet, despite its democratic potential, PB initiatives often suffer from low participation rates, limiting their visibility and perceived legitimacy. In this work, we aim to strengthen PB elections in two key ways: by supporting project proposers in crafting better proposals, and by helping PB organizers manage large volumes of submissions in a transparent manner. We propose a privacy-preserving approach to predict which PB proposals are likely to be funded, using only their textual descriptions and anonymous historical voting records — without relying on voter demographics or personally identifiable information. We evaluate the performance of GPT-4 Turbo in forecasting proposal outcomes across varying contextual scenarios, observing that the LLM’s prior knowledge needs to be complemented by past voting data to obtain predictions reflecting real-world PB voting behavior. Our findings highlight the potential of AI-driven tools to support PB processes by improving transparency, planning efficiency, and civic engagement…(More)”.
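A minimal sketch of what a privacy-preserving prompt in this spirit could look like, using only proposal text and anonymous aggregate outcomes. All names and sample data below are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch: assembling a privacy-aware prompt for proposal-outcome
# prediction. Only proposal text and anonymous aggregate voting history are
# included -- no voter demographics or identifiers, mirroring the paper's setup.

def build_forecast_prompt(proposal_text, past_results):
    """Combine a new proposal with anonymised historical outcomes."""
    history_lines = [
        f"- {r['title']} (cost {r['cost']}): {'FUNDED' if r['funded'] else 'not funded'}"
        for r in past_results
    ]
    return (
        "Past participatory budgeting outcomes (anonymous aggregates):\n"
        + "\n".join(history_lines)
        + "\n\nNew proposal:\n"
        + proposal_text
        + "\n\nBased only on the text and the history above, "
          "predict FUNDED or NOT FUNDED."
    )

past = [
    {"title": "Playground renovation", "cost": 40000, "funded": True},
    {"title": "Parking expansion", "cost": 90000, "funded": False},
]
prompt = build_forecast_prompt("Shade trees for the central plaza.", past)
print(prompt.splitlines()[0])
```

The point of the sketch is the constraint, not the model call: nothing identifying individual voters ever enters the context window.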

Leveraging LLMs for Privacy-Aware Predictions in Participatory Budgeting

Blog by Mark Headd: “…It’s not surprising that the civic tech world has largely metabolized the rise of Artificial Intelligence (AI) as a set of tools we can use to make these interfaces even better. Chatbots! Accessible PDFs! These are good and righteous efforts that make things easier for government employees and better for the people they serve. But they’re sitting on a fault line that AI is shifting beneath our feet: What if the primacy and focus we give *interfaces*, and the constraints we’ve accepted as immutable, are changing?..

Modern generative AI tools can assemble complex, high-fidelity interfaces quickly and cheaply. If you’re a civic designer used to hand-crafting bespoke interfaces with care, the idea of just-in-time interfaces in production makes your hair stand on end. Us, too. The reality is, this is still an idea that lies in the future. But the future is getting here very quickly.

Shopify, with its 5M DAUs and $292B processed annually, is doing its internal prototyping with generative AI. Delivering production UIs this way is gaining steam both in theory and in proof-of-concept (e.g., adaptive UIs, Fred Hohman’s Project Biscuit, Sean Grove’s ConjureUI demo). The idea is serious enough that Google, not a slouch in the setting-web-standards game, is getting into the mix with Stitch and Opal. AWS is throwing its hat in the ring too. Smaller players like BuildAI, Replit, Figma, and Camunda are exploring LLM-driven UI generation and workflow design. All of these at first may generate wacky interfaces and internet horror stories, and right now they’re mostly focused on dynamic UI generation for a developer, not a user. But these are all different implementations of an idea that are converging on a clear endpoint, and if they can get into use at any substantial scale, they will become more reliable and production-ready very quickly…(More)”.

The end of civic tech’s interface era

Paper by Eray Erturk et al: “Wearable devices record physiological and behavioral signals that can improve health predictions. While foundation models are increasingly used for such predictions, they have been primarily applied to low-level sensor data, despite behavioral data often being more informative due to their alignment with physiologically relevant timescales and quantities. We develop foundation models of such behavioral signals using over 2.5B hours of wearable data from 162K individuals, systematically optimizing architectures and tokenization strategies for this unique dataset. Evaluated on 57 health-related tasks, our model shows strong performance across diverse real-world applications including individual-level classification and time-varying health state prediction. The model excels in behavior-driven tasks like sleep prediction, and improves further when combined with representations of raw sensor data. These results underscore the importance of tailoring foundation model design to wearables and demonstrate the potential to enable new health applications…(More)”
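As a toy illustration of what “tokenizing” a behavioral signal can mean for a sequence model (the paper’s actual tokenization strategies are not described here; the bin edges and vocabulary below are invented):

```python
# Illustrative only: one way behavioral signals (e.g., hourly step counts)
# might be discretized into tokens for a sequence model. Bin edges and
# vocabulary are invented for this sketch.
import bisect

BIN_EDGES = [0, 50, 500, 2000]           # invented activity thresholds
VOCAB = ["<pad>", "sed", "light", "mod", "vig"]

def tokenize_steps(hourly_steps):
    """Map each hourly step count to a discrete activity token."""
    return [VOCAB[bisect.bisect_right(BIN_EDGES, s)] for s in hourly_steps]

print(tokenize_steps([0, 120, 3500, 40]))  # ['sed', 'light', 'vig', 'sed']
```

Discretizing behavior at physiologically meaningful timescales (hours, not 50 Hz sensor samples) is the design intuition the abstract points to.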

Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions

Paper by Moritz Schütz, Lukas Kriesch, Sebastian Losacker: “The relevance of institutions for regional development has been well established in economic geography. In this context, local and regional governments play a central role, particularly through place-based and place-sensitive strategies. However, systematic and scalable insights into their priorities and strategies remain limited due to data availability. This paper develops a methodological approach for the comprehensive measurement and analysis of local governance activities using web mining, natural language processing (NLP), and machine learning techniques. We construct a novel dataset by web scraping and extracting cleaned text data from German county and municipality websites, which provides detailed information on local government functions, services, and regulations. Our county-level topic modelling approach identifies 205 topics, from which we select 30 prominent topics to demonstrate the variety of topics found on county websites. An in-depth analysis of the three exemplary topics, Urban Development and Planning, Climate Protection Initiatives, and Business Development and Support, reveals how strategic priorities vary across space and how counties differ in their framing of similar topics. This study offers an explanatory framework for analysing the discursive dimensions of local governance and mapping regional differences in policy focus. In doing so, it expands the methodological toolkit of regional research and opens new avenues in understanding local governance through web data. We make an aggregated version of the data set freely available online…(More)”.
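A toy stand-in for one step of such a pipeline. The study uses proper topic modelling over scraped county websites; plain term frequencies are used here only to illustrate the idea of comparing what different county texts emphasize (the sample texts and stopword list are invented):

```python
# Toy sketch, not the paper's pipeline: after scraping and cleaning, compare
# which terms dominate each county's website text. The real study applies
# topic modelling (205 topics); term frequency stands in for illustration.
import re
from collections import Counter

STOPWORDS = {"the", "and", "for", "of", "in", "to", "a"}

def top_terms(text, n=3):
    """Return the n most frequent non-stopword terms in a cleaned text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]

county_a = "Climate protection and climate initiatives for climate adaptation."
county_b = "Business development support and business permits for business owners."
print(top_terms(county_a, 1), top_terms(county_b, 1))  # ['climate'] ['business']
```

Even this crude version shows how web text can surface differing strategic priorities, which the paper’s topic model does at scale across German counties.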

Mapping Local Government Priorities: A Web-Mining Approach for Regional Research

Report by National Academies of Sciences, Engineering, and Medicine: “In recent years, Lidar technology has improved and state departments of transportation (DOTs) have gained experience with it; documentation of existing practices, business uses, and needs would now benefit state DOTs’ efforts.

NCHRP Synthesis 642: Practices for Collecting, Managing, and Using Light Detection and Ranging Data, from TRB’s National Cooperative Highway Research Program, documents state DOTs’ practices related to technical, administrative, policy, and other aspects of collecting, managing, and using Lidar data to support current and future practices…(More)”

Practices for Collecting, Managing, and Using Light Detection and Ranging Data

Article by Madison Leeson: “Cultural heritage researchers often have to sift through a mountain of data related to the cultural items they study, including reports, museum records, news, and databases. The information in these sources contains a significant amount of unstructured and semi-structured data, including ownership histories (‘provenance’), object descriptions, and timelines, which presents an opportunity to leverage automated systems. Recognising the scale and importance of the issue, researchers at the Italian Institute of Technology’s Centre for Cultural Heritage Technology have fine-tuned three natural language processing (NLP) models to distill key information from these unstructured texts. This was performed within the scope of the EU-funded RITHMS project, which has built a digital platform for law enforcement to trace illicit cultural goods using social network analysis (SNA). The research team aimed to fill a critical gap: how do we transform complex textual records into clean, structured, analysable data?

The paper introduces a streamlined pipeline to create custom, domain-specific datasets from textual heritage records, then trained and fine-tuned NLP models (derived from spaCy) to perform named entity recognition (NER) on challenging inputs like provenance, museum registries, and records of stolen and missing art and artefacts. It evaluates zero-shot models such as GLiNER, and employs Meta’s Llama 3 (8B) to bootstrap high-quality annotations, minimising the need for manual labelling of the data. The result? Fine-tuned transformer models (especially on provenance data) significantly outperformed out-of-the-box models, highlighting the power of small, curated training sets in a specialised domain…(More)
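For a sense of the target output shape, here is a simple regex pass over a provenance string. This is a stand-in only: the paper fine-tunes spaCy NER models, and the patterns, labels, and sample text below are invented for illustration:

```python
# Illustrative stand-in for NER over provenance text: turn an unstructured
# ownership history into labelled entity spans. Patterns and labels are
# invented; the actual system uses fine-tuned spaCy transformer models.
import re

PATTERNS = {
    "DATE": r"\b\d{4}\b",
    "PERSON": r"(?:Collection of|acquired by)\s+([A-Z][a-z]+ [A-Z][a-z]+)",
}

def extract_entities(provenance):
    """Return (label, span) pairs found in a provenance string."""
    entities = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, provenance):
            span = m.group(1) if m.groups() else m.group(0)
            entities.append((label, span))
    return entities

text = "Collection of Jane Doe, Paris; sold at auction, 1972."
print(extract_entities(text))  # [('DATE', '1972'), ('PERSON', 'Jane Doe')]
```

Regexes break quickly on real provenance prose (aliases, uncertain dates, multilingual records), which is exactly why the paper turns to fine-tuned NER models instead.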

Enriching Unstructured Cultural Heritage Data Using NLP

Paper by Angelos Assos, Carmel Baharav, Bailey Flanigan, Ariel Procaccia: “Citizens’ assemblies are an increasingly influential form of deliberative democracy, where randomly selected people discuss policy questions. The legitimacy of these assemblies hinges on their representation of the broader population, but participant dropout often leads to an unbalanced composition. In practice, dropouts are replaced by preselected alternates, but existing methods do not address how to choose these alternates. To address this gap, we introduce an optimization framework for alternate selection. Our algorithmic approach, which leverages learning-theoretic machinery, estimates dropout probabilities using historical data and selects alternates to minimize expected misrepresentation. Our theoretical bounds provide guarantees on sample complexity (with implications for computational efficiency) and on loss due to dropout probability mis-estimation. Empirical evaluation using real-world data demonstrates that, compared to the status quo, our method significantly improves representation while requiring fewer alternates…(More)”.
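The core optimization idea can be sketched in a few lines. This is a hedged illustration, not the paper’s algorithm (which uses learning-theoretic machinery and historical data); the groups, dropout probabilities, and quotas below are invented:

```python
# Sketch of the objective: given estimated dropout probabilities, pick the
# alternate whose inclusion minimises expected misrepresentation relative to
# target quotas. All data below is invented for illustration.

def expected_counts(members):
    """Expected number of remaining participants per group."""
    counts = {}
    for group, p_drop in members:
        counts[group] = counts.get(group, 0) + (1 - p_drop)
    return counts

def misrepresentation(counts, targets):
    """L1 distance between expected composition and target quotas."""
    groups = set(counts) | set(targets)
    return sum(abs(counts.get(g, 0) - targets.get(g, 0)) for g in groups)

def best_alternate(panel, pool, targets):
    """Pick the pool member that most reduces expected misrepresentation."""
    return min(pool, key=lambda alt: misrepresentation(
        expected_counts(panel + [alt]), targets))

panel = [("urban", 0.5), ("urban", 0.1), ("rural", 0.4)]  # (group, dropout prob)
pool = [("urban", 0.1), ("rural", 0.1)]
targets = {"urban": 2, "rural": 2}
print(best_alternate(panel, pool, targets))  # ('rural', 0.1)
```

The interesting part of the paper is what this sketch glosses over: estimating those dropout probabilities from historical data, with guarantees on what mis-estimation costs.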

Alternates, Assemble! Selecting Optimal Alternates for Citizens’ Assemblies

NASA Harvest: “Have you ever thought about how scientists keep track of what is happening all over the Earth? Thanks to satellites orbiting high above us, we have eyes in the sky that capture vast amounts of information every day. These satellites monitor everything from sprawling forests and melting glaciers to tiny fishing boats and fields of crops. But turning this endless stream of satellite data into meaningful insights is a big challenge. That is why a team of researchers from NASA Harvest, the Allen Institute for AI (Ai2) and partner organizations set out to create a smarter solution.

A single Galileo model can be applied to a wide range of remote sensing tasks. We achieve this by training Galileo on the diverse remote sensing modalities practitioners use for different applications, and on the views of those modalities they work with, ranging from pixel time series to multi-timestep and single-timestep imagery.

Their new study, titled “Galileo: Learning Global & Local Features of Many Remote Sensing Modalities,” was recently published and brought together experts from across the globe. The team aimed to build a tool that could make better sense of all the different types of satellite data we collect so we can make more informed decisions to protect our world.

Why is this research so important? Because satellites do far more than take pictures from space. They help farmers decide when to plant and harvest crops, track how fast glaciers are disappearing, monitor floods, and even detect marine debris floating in the ocean. However, satellite data comes in many forms, like optical images, radar scans, and climate measurements. Until now, most computer models could only handle one type of data at a time. This meant scientists needed separate systems for each problem.

Enter Galileo. This new artificial intelligence (AI) model was designed to process many kinds of satellite data all at once. Even more impressively, Galileo can detect both large-scale patterns, like glaciers retreating over decades, and tiny, short-lived details, like a fishing boat appearing for just a day. By learning to recognize these patterns across multiple scales and data types, Galileo gives researchers a more complete view of what is happening on Earth.

The team found that Galileo outperformed older models that were specialized for just one kind of data. With Galileo, scientists can now use a single model to tackle a wide range of challenges. These include mapping agricultural land, detecting floods, and monitoring marine pollution. It is a powerful step toward making satellite data more versatile and accessible…(More)”

A New Perspective from Space: How Galileo is Advancing NASA Harvest’s Mission to Safeguard Our Planet

Editorial to Special Issue by Saeid Pourroostaei Ardakani et al: “Predictive analytics play an increasingly central role in sustainable city planning. By applying machine learning algorithms to large-scale and multi-source datasets, cities are able to forecast dynamic phenomena such as traffic congestion, crime incidents, and flood risk. These anticipatory insights are crucial for proactive urban management, allowing for early interventions and resource optimisation. Alongside this, the emergence of Digital Twins marks a shift from reactive to real-time urban governance. As a result, cities such as Singapore, Los Angeles, and Amsterdam use these models to manage infrastructure across sectors including transportation, water supply, energy distribution, and public space usage. These digital ecosystems enable planners to test policy scenarios, monitor service performance, and respond adaptively to changing conditions.

The integration of socio-demographic data into geospatial models enables researchers to identify and analyse disparities in urban vulnerability. For instance, spatial mapping of heat exposure in Indian cities such as Delhi and Bengaluru has informed the more equitable allocation of cooling infrastructure. These equity-oriented approaches ensure that sustainability initiatives are not only technically robust but also socially inclusive. Indeed, they support policymakers in targeting resources toward the most vulnerable populations and thereby addressing persistent inequalities in urban service delivery.

The role of participatory digital platforms is a critical dimension of the sustainable planning discourse. Cities are increasingly turning to the use of GIS tools and e-planning applications to facilitate community involvement in the urban design process. These tools democratise access to data and decision-making and enable citizens to co-create solutions for their neighbourhoods. Such participation enhances the legitimacy and responsiveness of urban policy, especially in contexts where historically marginalised groups have been excluded from formal planning mechanisms…(More)”.

Data Analytics in Sustainable City Planning
