Need for Co-creating Urban Data Collaborative


Blog by Gaurav Godhwani: “…The Government of India has initiated various urban reforms for our cities like — Atal Mission for Rejuvenation and Urban Transformation 2.0 (AMRUT 2.0), Smart Cities Mission (SCM), Swachh Bharat Mission 2.0 (SBM-Urban 2.0) and development of Urban & Industrial Corridors. To help empower cities with data, the Ministry of Housing & Urban Affairs (MoHUA) has also launched various data initiatives including — the DataSmart Cities Strategy, Data Maturity Assessment Framework, Smart Cities Open Data Portal, City Innovation Exchange, India Urban Data Exchange and the India Urban Observatory.

Unfortunately, most urban data remains in silos, and the capacity of our cities to harness urban data to improve decision-making and strengthen citizen participation remains limited. As per the last Data Maturity Assessment Framework (DMAF) assessment, conducted by MoHUA in November 2020, only 45 of the 100 smart cities had drafted or approved their City Data Policies, and just 32 cities had a dedicated budget for data-related activities in 2020–21. Moreover, in terms of fostering data collaborations, only 12 cities had formed data alliances to achieve tangible outcomes. We hope smart cities continue this practice by conducting a yearly self-assessment to progress in their journey to harness data for improving their urban planning.

Seeding Urban Data Collaborative to advance City-level Data Engagements

There is a need to bring together a diverse set of stakeholders, including governments, civil society organisations, academia, businesses and startups, volunteer groups and more, to share and exchange urban data in a secure, standardised and interoperable manner, deriving more value from re-using data for participatory urban development. Along with improving data sharing among these stakeholders, it is necessary to regularly convene, ideate, conduct capacity-building sessions and institutionalise data practices.

An Urban Data Collaborative can bring together such diverse stakeholders to address some of these perennial challenges in the ecosystem while spurring innovation…(More)”

China’s Hinterland Becomes A Critical Datascape


Article by Gary Zhexi Zhang: “In 2014, the southwestern province of Guizhou, a historically poor and mountainous area, beat out rival regions to become China’s first “Big Data Comprehensive Pilot Zone,” as part of a national directive to develop the region — which is otherwise best known as an exporter of tobacco, spirits and coal — into the infrastructural backbone of the country’s data industry. Since then, vast investment has poured into the province. Thousands of miles of highway and high-speed rail tunnel through the mountains. Driving through the province can feel vertiginous: Of the hundred highest bridges in the world, almost half are in Guizhou, and almost all were built in the last 15 years.

In 2015, Xi Jinping visited Gui’an New Area to inaugurate the province’s transformation into China’s “Big Data Valley,” exemplifying the central government’s goal to establish “high quality social and economic development,” ubiquitously advertised through socialist-style slogans plastered on highways and city streets…(More)”.

China’s biggest AI model is challenging American dominance


Article by Sam Eifling: “So far, the AI boom has been dominated by U.S. companies like OpenAI, Google, and Meta. In recent months, though, a new name has been popping up on benchmarking lists: Alibaba’s Qwen. Variants of Qwen have been topping the leaderboards of sites that measure an AI model’s performance.

“Qwen 72B is the king, and Chinese models are dominating,” Hugging Face CEO Clem Delangue wrote in June, after a Qwen-based model first rose to the top of his company’s Open LLM leaderboard.

It’s a surprising turnaround for the Chinese AI industry, which many thought was doomed by semiconductor restrictions and limitations on computing power. Qwen’s success is showing that China can compete with the world’s best AI models — raising serious questions about how long U.S. companies will continue to dominate the field. And by focusing on capabilities like language support, Qwen is breaking new ground on what an AI model can do — and who it can be built for.

Those capabilities have come as a surprise to many developers, even those working on Qwen itself. AI developer David Ng used Qwen to build the model that topped the Open LLM leaderboard. He has also built models using Meta’s and Google’s technology, but says Alibaba’s gave him the best results. “For some reason, it works best on the Chinese models,” he told Rest of World. “I don’t know why.”…(More)”

Building LLMs for the social sector: Emerging pain points


Blog by Edmund Korley: “…One of the sprint’s main tracks focused on using LLMs to enhance the impact and scale of chat services in the social sector.

Six organizations participated, with operations spanning Africa and India. Bandhu empowers India’s blue-collar workers and migrants by connecting them to jobs and affordable housing, helping them take control of their livelihoods and future stability. Digital Green enhances rural farmers’ agency with AI-driven insights to improve agricultural productivity and livelihoods. Jacaranda Health provides mothers in sub-Saharan Africa with essential information and support to improve maternal and newborn health outcomes. Kabakoo equips youth in Francophone Africa with digital skills, fostering self-reliance and economic independence. Noora Health teaches Indian patients and caregivers critical health skills, enhancing their ability to manage care. Udhyam provides micro-entrepreneurs with education, mentorship, and financial support to build sustainable businesses.

These organizations demonstrate diverse ways one can boost human agency: they help people in underserved communities take control of their lives, make more informed choices, and build better futures – and they are piloting AI interventions to scale these efforts…(More)”.

Training LLMs to Draft Replies to Parliamentary Questions


Blog by Watson Chua: “In Singapore, the government is answerable to Parliament, and Members of Parliament (MPs) may raise queries to any Minister on any matter in his portfolio. These questions can be answered orally during the Parliament sitting or through a written reply. Regardless of the medium, public servants in the ministries must gather materials to answer the question and prepare a response.

Generative AI and Large Language Models (LLMs) have already been applied to help public servants do this more effectively and efficiently. For example, Pair Search (publicly accessible) and the Hansard Analysis Tool (accessible only to public servants) help public servants search past Parliamentary Sittings for information relevant to the question and synthesise a response to it.

The existing systems draft the responses using prompt engineering and Retrieval Augmented Generation (RAG). To recap, RAG consists of two main parts:

  • Retriever: A search engine that finds documents relevant to the question
  • Generator: A text generation model (LLM) that takes in the instruction, the question, and the search results from the retriever to respond to the question
A typical RAG system (illustration by Hrishi Olickel).
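A minimal sketch of this two-part flow is below, written in Python against an OpenAI-compatible chat API. The toy corpus, keyword-overlap retriever and model name are illustrative assumptions, not the internals of Pair Search or the Hansard Analysis Tool.

```python
# Minimal RAG sketch (illustrative only, not the internals of Pair Search or
# the Hansard Analysis Tool): retrieve relevant passages, then ask an LLM to
# draft a reply grounded in them.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Toy corpus standing in for past Parliamentary Sitting records.
DOCUMENTS = [
    "In 2022 the Ministry announced a review of public transport fares.",
    "The national broadband upgrade is scheduled to be completed by 2026.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Retriever: rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:top_k]

def draft_reply(question: str) -> str:
    """Generator: combine instruction, question and retrieved context into one prompt."""
    context = "\n".join(retrieve(question))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Draft a written reply to the parliamentary question below, "
                "using only the supporting context provided.\n\n"
                f"Question: {question}\n\nContext:\n{context}"
            ),
        }],
    )
    return response.choices[0].message.content
```

A production system would typically replace the toy retriever with a BM25 or embedding index, but the two-stage structure stays the same.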

Using a pre-trained, instruction-tuned LLM like GPT-4o, the generator can usually produce a good response. However, it might not be exactly what is desired in terms of verbosity, style and prose, and additional human post-processing might be needed. Extensive prompt engineering or few-shot learning can be used to mould the response, at the expense of the higher costs incurred by the additional tokens in the prompt…(More)”
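As a rough illustration of that few-shot approach, the desired tone and length can be demonstrated by prepending a handful of exemplar exchanges to the prompt, each of which adds to the token count. The question/reply pairs below are invented placeholders, not real parliamentary material.

```python
# Few-shot sketch: prepend exemplar question/reply pairs so the model imitates
# the desired verbosity and style. The examples are invented placeholders.
FEW_SHOT_EXAMPLES = [
    ("What is the status of the park connector upgrade?",
     "The upgrade is on schedule and is expected to be completed in the third "
     "quarter. Works are phased to minimise disruption to residents."),
    ("How many households received the utility rebate last year?",
     "About 950,000 households received the rebate, which was disbursed "
     "automatically together with their utility bills."),
]

def build_few_shot_messages(question: str, context: str) -> list[dict]:
    """Interleave exemplar exchanges before the real question and its context."""
    messages = [{"role": "system",
                 "content": "You draft concise written replies to parliamentary questions."}]
    for ex_question, ex_reply in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": ex_question})
        messages.append({"role": "assistant", "content": ex_reply})
    messages.append({"role": "user",
                     "content": f"Question: {question}\n\nContext:\n{context}"})
    return messages
```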

Scaling Synthetic Data Creation with 1,000,000,000 Personas


Paper by Xin Chan, et al: “We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub — a collection of 1 billion diverse personas automatically curated from web data. These 1 billion personas (~13% of the world’s total population), acting as distributed carriers of world knowledge, can tap into almost every perspective encapsulated within the LLM, thereby facilitating the creation of diverse synthetic data at scale for various scenarios. By showcasing Persona Hub’s use cases in synthesizing high-quality mathematical and logical reasoning problems, instructions (i.e., user prompts), knowledge-rich texts, game NPCs and tools (functions) at scale, we demonstrate persona-driven data synthesis is versatile, scalable, flexible, and easy to use, potentially driving a paradigm shift in synthetic data creation and applications in practice, which may have a profound impact on LLM research and development…(More)”.
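The core mechanic is simple to sketch: vary the persona in the prompt and the same model produces markedly different data. The personas, prompt wording and model name below are illustrative assumptions, not drawn from Persona Hub itself.

```python
# Persona-driven synthesis sketch: the persona string is the only thing that
# changes between calls, steering the model toward different perspectives.
# Personas, prompt wording and model name are illustrative, not from Persona Hub.
from openai import OpenAI

client = OpenAI()

PERSONAS = [
    "a civil engineer who inspects mountain bridges",
    "a rice farmer tracking seasonal rainfall",
    "a high-school maths teacher preparing olympiad problems",
]

def synthesize_problem(persona: str) -> str:
    """Ask the model to write one reasoning problem from a given persona's viewpoint."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.9,  # higher temperature encourages variety across personas
        messages=[{
            "role": "user",
            "content": (
                f"You are {persona}. Write one challenging, self-contained maths "
                "word problem drawn from your daily work, then give its worked solution."
            ),
        }],
    )
    return response.choices[0].message.content

synthetic_data = [synthesize_problem(p) for p in PERSONAS]
```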

Building an AI ecosystem in a small nation: lessons from Singapore’s journey to the forefront of AI


Paper by Shaleen Khanal, Hongzhou Zhang & Araz Taeihagh: “Artificial intelligence (AI) is arguably the most transformative technology of our time. While all nations would like to mobilize their resources to play an active role in AI development and utilization, only a few nations, such as the United States and China, have the resources and capacity to do so. How, then, can smaller or less resourceful countries navigate the technological terrain to emerge at the forefront of AI development? This research presents an in-depth analysis of Singapore’s journey in constructing a robust AI ecosystem amidst the prevailing global dominance of the United States and China. By examining the case of Singapore, we argue that by designing policies that address risks associated with AI development and implementation, smaller countries can create a vibrant AI ecosystem that encourages experimentation and early adoption of the technology. In addition, through Singapore’s case, we demonstrate the active role the government can play, not only as a policymaker but also as a steward to guide the rest of the economy towards the application of AI…(More)”.

Exploring Visitor Density Trends in Rest Areas Through Google Maps Data and Data Mining


Paper by Marita Prasetyani, R. Rizal Isnanto and Catur Edi Widodo: “Rest areas play a vital role in ensuring the safety and comfort of travelers. This study examines visitor density at toll and non-toll rest areas using data mining techniques applied to Google Maps Places data. By utilizing extensive information from Google Maps, the research aims to uncover patterns and trends in visitor behavior and pinpoint peak usage times. The findings can guide improved planning and management of rest areas, thereby enhancing the overall travel experience for road users, and inform further research to determine the location of new rest areas. Understanding patterns or trends in visitor density at rest areas involves analyzing the time of day, location, and other factors influencing the density level. Understanding these trends can provide essential insights for rest area management, infrastructure planning, and the establishment of new rest areas. Data from Google Maps provides an invaluable source of real-time and historical information, enabling accurate and in-depth analysis of visitor behavior. Data mining helps identify relationships not immediately apparent in the data, providing a deeper understanding and supporting data-driven decision-making…(More)”.
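As a rough sketch of the kind of aggregation such a study involves, suppose hourly visit counts per rest area have already been exported to a CSV; the file name, column layout and clustering step below are assumptions for illustration, not the authors’ actual pipeline.

```python
# Visitor-density sketch: aggregate assumed hourly visit counts per rest area,
# find peak hours, and group rest areas with similar daily profiles.
# The CSV layout and clustering choice are illustrative, not the paper's pipeline.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("rest_area_visits.csv")  # assumed columns: rest_area, hour, visitors

# Average visitors for each hour of the day, per rest area.
profile = df.pivot_table(index="rest_area", columns="hour",
                         values="visitors", aggfunc="mean").fillna(0)

# Peak usage hour for each rest area.
peak_hours = profile.idxmax(axis=1)
print(peak_hours.head())

# Cluster rest areas with similar daily density profiles.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profile)
print(pd.Series(clusters, index=profile.index).value_counts())
```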

Is Software Eating the World?


Paper by Sangmin Aum & Yongseok Shin: “When explaining the declining labor income share in advanced economies, the macro literature finds that the elasticity of substitution between capital and labor is greater than one. However, the vast majority of micro-level estimates shows that capital and labor are complements (elasticity less than one). Using firm- and establishment-level data from Korea, we divide capital into equipment and software, as they may interact with labor in different ways. Our estimation shows that equipment and labor are complements (elasticity 0.6), consistent with other micro-level estimates, but software and labor are substitutes (1.6), a novel finding that helps reconcile the macro vs. micro-literature elasticity discord. As the quality of software improves, labor shares fall within firms because of factor substitution and endogenously rising markups. In addition, production reallocates toward firms that use software more intensively, as they become effectively more productive. Because in the data these firms have higher markups and lower labor shares, the reallocation further raises the aggregate markup and reduces the aggregate labor share. The rise of software accounts for two-thirds of the labor share decline in Korea between 1990 and 2018. The factor substitution and the markup channels are equally important. On the other hand, the falling equipment price plays a minor role, because the factor substitution and the markup channels offset each other…(More)”.
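For readers less familiar with the terminology, the complement/substitute distinction can be made concrete with a generic CES aggregator; the functional form below is a standard textbook illustration, not necessarily the paper’s exact specification.

```latex
% Generic CES aggregator over a factor X (equipment or software) and labor L,
% with elasticity of substitution \sigma (textbook form, not the paper's exact model):
Y = \left[ \alpha X^{\frac{\sigma - 1}{\sigma}} + (1 - \alpha)\, L^{\frac{\sigma - 1}{\sigma}} \right]^{\frac{\sigma}{\sigma - 1}}
% \sigma < 1: X and L are complements (the paper's estimate for equipment, about 0.6).
% \sigma > 1: X and L are substitutes (the paper's estimate for software, about 1.6),
% so cheaper or better software pulls spending away from labor and lowers the labor share.
```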

Japan’s push to make all research open access is taking shape


Article by Dalmeet Singh Chawla: “The Japanese government is pushing ahead with a plan to make Japan’s publicly funded research output free to read. In June, the science ministry will assign funding to universities to build the infrastructure needed to make research papers free to read on a national scale. The move follows the ministry’s announcement in February that researchers who receive government funding will be required to make their papers freely available to read on institutional repositories from April 2025.

The Japanese plan “is expected to enhance the long-term traceability of research information, facilitate secondary research and promote collaboration”, says Kazuki Ide, a health-sciences and public-policy scholar at Osaka University in Suita, Japan, who has written about open access in Japan.

The nation is one of the first Asian countries to make notable advances towards making more research open access (OA) and among the first countries in the world to forge a nationwide plan for OA.

The plan follows in the footsteps of the influential Plan S, introduced six years ago by a group of research funders in the United States and Europe known as cOAlition S, to accelerate the move to OA publishing. The United States also implemented an OA mandate in 2022 that requires all research funded by US taxpayers to be freely available from 2026…(More)”.