South Korea leverages open government data for AI development


Article by Si Ying Thian: “In South Korea, open government data is powering artificial intelligence (AI) innovations in the private sector.

Take the case of TTCare, which may be the world’s first mobile application to analyse eye and skin disease symptoms in pets.

AI Hub allows users to search by industry, data format and year (top row), with the data sets made available based on the particular search term “pet” (bottom half of the page). Image: AI Hub, courtesy of Baek

The AI model was trained on about one million pieces of data – half of the data coming from the government-led AI Hub and the rest collected by the firm itself, according to the Korean newspaper Donga.

AI Hub is an integrated platform set up by the government to support the country’s AI infrastructure.

TTCare’s CEO Heo underlined the importance of government-led AI training data in improving the model’s ability to diagnose symptoms. The firm’s training data is currently accessible through AI Hub, and any Korean citizen can download or use it.

Pushing the boundaries of open data

Over the years, South Korea has consistently ranked at the top of the global Open, Useful, and Re-usable data (OURdata) Index.

The government has been pushing the boundaries of what it can do with open data – beyond just making data usable by providing APIs. Application Programming Interfaces, or APIs, make it easier for users to tap on open government data to power their apps and services.

There is now rising interest from public sector agencies in tapping on such data to train AI models, said Dongyub Baek, Principal Manager at South Korea’s National Information Society Agency (NIA), although this is still at an early stage.

Baek sits in NIA’s open data department, which handles policies, infrastructure such as the National Open Data Portal, and impact assessments of government initiatives…(More)”
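The APIs mentioned above are what let developers pull such datasets directly into their own applications. As a purely illustrative sketch, the snippet below shows how an app might query an open data catalogue by keyword; the endpoint, parameters, and response fields are hypothetical placeholders, not the actual interface of the National Open Data Portal or AI Hub.

```python
# Illustrative only: a minimal sketch of pulling open government data over an API.
# The endpoint, query parameters and response fields are hypothetical placeholders.
import requests

BASE_URL = "https://api.example-opendata.go.kr/v1/datasets"  # placeholder URL

def fetch_datasets(keyword: str = "pet", per_page: int = 20):
    """Search an open data catalogue by keyword and return dataset metadata."""
    resp = requests.get(
        BASE_URL,
        params={"q": keyword, "perPage": per_page, "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

if __name__ == "__main__":
    for ds in fetch_datasets("pet"):
        print(ds.get("title"), "-", ds.get("publisher"))
```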

AI in the Public Service: Here for Good


Special Issue of Ethos: “…For the public good, we want AI to help unlock and drive transformative impact, in areas where there is significant potential for breakthroughs, such as cancer research, material sciences or climate change. But we also want to raise the level of generalised adoption. For the user base in the public sector, we want to learn how best to use this new tool in ways that can allow us to not only do things better, but do better things.

This is not to suggest that AI is always the best solution: it is one of many tools in the digital toolkit. Sometimes, simpler computational methods will suffice. That said, AI represents new, untapped potential for the Public Service to enhance our daily work and deliver better outcomes that ultimately benefit Singapore and Singaporeans….

To promote general adoption, we made available AI tools, such as Pair, SmartCompose, and AIBots. They are useful to a wide range of public officers for many general tasks. Other common tools of this nature may include chatbots to support customer-facing and service delivery needs, translation, summarisation, and so on. Much of what public officers do involves words and language, which is an area that LLM-based AI technology can now help with.

Beyond improving the productivity of the Public Service, the real value lies in AI’s broader ability to transform our business and operating models to deliver greater impact. In driving adoption, we want to encourage public officers to experiment with different approaches to figure out where we can create new value by doing things differently, rather than just settle for incremental value from doing things the same old ways using new tools.

For example, we have seen how AI and automation have transformed language translation, software engineering, identity verification and border clearance. This is just the beginning and much more is possible in many other domains…(More)”.

Digital Distractions with Peer Influence: The Impact of Mobile App Usage on Academic and Labor Market Outcomes


Paper by Panle Jia Barwick, Siyu Chen, Chao Fu & Teng Li: “Concerns over the excessive use of mobile phones, especially among youths and young adults, are growing. Leveraging administrative student data from a Chinese university merged with mobile phone records, random roommate assignments, and a policy shock that affects peers’ peers, we present, to our knowledge, the first estimates of both behavioral spillover and contextual peer effects, and the first estimates of medium-term impacts of mobile app usage on academic achievement, physical health, and labor market outcomes. App usage is contagious: a one s.d. increase in roommates’ in-college app usage raises own app usage by 4.4% on average, with substantial heterogeneity across students. App usage is detrimental to both academic performance and labor market outcomes. A one s.d. increase in own app usage reduces GPAs by 36.2% of a within-cohort-major s.d. and lowers wages by 2.3%. Roommates’ app usage exerts both direct effects (e.g., noise and disruptions) and indirect effects (via behavioral spillovers) on GPA and wage, resulting in a total negative impact of over half the size of the own usage effect. Extending China’s minors’ game restriction policy of 3 hours per week to college students would boost their initial wages by 0.7%. Using high-frequency GPS data, we identify one underlying mechanism: high app usage crowds out time in study halls and increases absences from and late arrivals at lectures…(More)”.

China: Autocracy 2.0


Paper by David Y. Yang: “Autocracy 2.0, exemplified by modern China, is economically robust, technologically advanced, globally engaged, and controlled through subtle and sophisticated methods. What defines China’s political economy, and what drives Autocracy 2.0? What is its future direction? I start by discussing two key challenges autocracies face: incentives and information. I then describe Autocracy 1.0’s reliance on fear and repression to address these issues. It makes no credible promises, using coercion for compliance, resulting in a low-information environment. Next, I introduce Autocracy 2.0, highlighting its significant shift in handling commitment and information challenges. China uses economic incentives to align interests with regime survival, fostering support. It employs advanced bureaucratic structures and technology to manage incentives and information, enabling success in a high-information environment. Finally, I explore Autocracy 3.0’s potential. In China, forces might revert to Autocracy 1.0, using technology for state control as growth slows but aspirations stay high. Globally, modern autocracies, led by China, are becoming major geopolitical forces, challenging the liberal democratic order…(More)”.

Need for Co-creating Urban Data Collaborative


Blog by Gaurav Godhwani: “…The Government of India has initiated various urban reforms for our cities like — Atal Mission for Rejuvenation and Urban Transformation 2.0 (AMRUT 2.0), Smart Cities Mission (SCM), Swachh Bharat Mission 2.0 (SBM-Urban 2.0) and development of Urban & Industrial Corridors. To help empower cities with data, the Ministry of Housing & Urban Affairs (MoHUA) has also launched various data initiatives including — the DataSmart Cities Strategy, Data Maturity Assessment Framework, Smart Cities Open Data Portal, City Innovation Exchange, India Urban Data Exchange and the India Urban Observatory.

Unfortunately, most urban data remains in silos, and our cities’ capacities to harness urban data to improve decision-making and strengthen citizen participation continue to be limited. As per the last Data Maturity Assessment Framework (DMAF) assessment conducted in November 2020 by MoHUA, among 100 smart cities only 45 have drafted or approved their City Data Policies, with just 32 cities having a dedicated data budget in 2020–21 for data-related activities. Moreover, in terms of fostering data collaborations, only 12 cities formed data alliances to achieve tangible outcomes. We hope smart cities continue this practice by conducting a yearly self-assessment to progress in their journey to harness data for improving their urban planning.

Seeding Urban Data Collaborative to advance City-level Data Engagements

There is a need to bring together a diverse set of stakeholders, including governments, civil society, academia, businesses and startups, volunteer groups and more, to share and exchange urban data in a secure, standardised and interoperable manner, deriving more value from re-using data for participatory urban development. Along with improving data sharing among these stakeholders, it is necessary to regularly convene, ideate, conduct capacity-building sessions and institutionalise data practices.

An Urban Data Collaborative can bring together such diverse stakeholders who could address some of these perennial challenges in the ecosystem while spurring innovation…(More)”

China’s Hinterland Becomes A Critical Datascape


Article by Gary Zhexi Zhang: “In 2014, the southwestern province of Guizhou, a historically poor and mountainous area, beat out rival regions to become China’s first “Big Data Comprehensive Pilot Zone,” as part of a national directive to develop the region — which is otherwise best known as an exporter of tobacco, spirits and coal — into the infrastructural backbone of the country’s data industry. Since then, vast investment has poured into the province. Thousands of miles of highway and high-speed rail tunnel through the mountains. Driving through the province can feel vertiginous: Of the hundred highest bridges in the world, almost half are in Guizhou, and almost all were built in the last 15 years.

In 2015, Xi Jinping visited Gui’an New Area to inaugurate the province’s transformation into China’s “Big Data Valley,” exemplifying the central government’s goal to establish “high quality social and economic development,” ubiquitously advertised through socialist-style slogans plastered on highways and city streets…(More)”.

China’s biggest AI model is challenging American dominance


Article by Sam Eifling: “So far, the AI boom has been dominated by U.S. companies like OpenAI, Google, and Meta. In recent months, though, a new name has been popping up on benchmarking lists: Alibaba’s Qwen. Over the past few months, variants of Qwen have been topping the leaderboards of sites that measure an AI model’s performance.

“Qwen 72B is the king, and Chinese models are dominating,” Hugging Face CEO Clem Delangue wrote in June, after a Qwen-based model first rose to the top of his company’s Open LLM leaderboard.

It’s a surprising turnaround for the Chinese AI industry, which many thought was doomed by semiconductor restrictions and limitations on computing power. Qwen’s success is showing that China can compete with the world’s best AI models — raising serious questions about how long U.S. companies will continue to dominate the field. And by focusing on capabilities like language support, Qwen is breaking new ground on what an AI model can do — and who it can be built for.

Those capabilities have come as a surprise to many developers, even those working on Qwen itself. AI developer David Ng used Qwen to build the model that topped the Open LLM leaderboard. He’s built models using Meta and Google’s technology also but says Alibaba’s gave him the best results. “For some reason, it works best on the Chinese models,” he told Rest of World. “I don’t know why.”..(More)”

Building LLMs for the social sector: Emerging pain points


Blog by Edmund Korley: “…One of the sprint’s main tracks focused on using LLMs to enhance the impact and scale of chat services in the social sector.

Six organizations participated, with operations spanning Africa and India. Bandhu empowers India’s blue-collar workers and migrants by connecting them to jobs and affordable housing, helping them take control of their livelihoods and future stability. Digital Green enhances rural farmers’ agency with AI-driven insights to improve agricultural productivity and livelihoods. Jacaranda Health provides mothers in sub-Saharan Africa with essential information and support to improve maternal and newborn health outcomes. Kabakoo equips youth in Francophone Africa with digital skills, fostering self-reliance and economic independence. Noora Health teaches Indian patients and caregivers critical health skills, enhancing their ability to manage care. Udhyam provides micro-entrepreneurs with education, mentorship, and financial support to build sustainable businesses.

These organizations demonstrate diverse ways one can boost human agency: they help people in underserved communities take control of their lives, make more informed choices, and build better futures – and they are piloting AI interventions to scale these efforts…(More)”.

Training LLMs to Draft Replies to Parliamentary Questions


Blog by Watson Chua: “In Singapore, the government is answerable to Parliament and Members of Parliament (MPs) may raise queries to any Minister on any matter in his portfolio. These questions can be answered orally during the Parliament sitting or through a written reply. Regardless of the medium, public servants in the ministries must gather materials to answer the question and prepare a response.

Generative AI and Large Language Models (LLMs) have already been applied to help public servants do this more effectively and efficiently. For example, Pair Search (publicly accessible) and the Hansard Analysis Tool (only accessible to public servants) help public servants search past Parliamentary Sittings for information relevant to the question and synthesise a response to it.

The existing systems draft the responses using prompt engineering and Retrieval Augmented Generation (RAG). To recap, RAG consists of two main parts:

  • Retriever: A search engine that finds documents relevant to the question
  • Generator: A text generation model (LLM) that takes in the instruction, the question, and the search results from the retriever to respond to the question
A typical RAG system. Illustration by Hrishi Olickel.
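To make these two parts concrete, the sketch below wires a crude keyword retriever to a generator. It is an assumed, simplified illustration rather than the implementation behind Pair Search or the Hansard Analysis Tool; the `generate` callable stands in for whatever LLM client is actually used.

```python
# Minimal RAG sketch (assumed, simplified): a keyword-overlap retriever plus a
# prompt builder. `generate` is a placeholder for a real LLM API call.
from typing import Callable, List

def retrieve(question: str, documents: List[str], k: int = 3) -> List[str]:
    """Rank documents by crude keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(question: str, documents: List[str],
           generate: Callable[[str], str]) -> str:
    """Build an instruction + question + context prompt and call the generator."""
    context = "\n\n".join(retrieve(question, documents))
    prompt = (
        "You are drafting a reply to a Parliamentary Question.\n"
        f"Question: {question}\n\n"
        f"Relevant extracts from past sittings:\n{context}\n\n"
        "Draft a concise, factual reply based only on the extracts."
    )
    return generate(prompt)  # e.g. a call to an instruction-tuned LLM such as GPT-4o
```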

Using a pre-trained instruction-tuned LLM like GPT-4o, the generator can usually generate a good response. However, it might not be exactly what is desired in terms of verbosity, style and prose, and additional human post-processing might be needed. Extensive prompt engineering or few-shot learning can be done to mold the response, at the expense of incurring higher costs from the additional tokens in the prompt…(More)”
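The cost trade-off mentioned above comes from how a few-shot prompt is assembled: every exemplar reply added to steer tone and length also adds tokens to each call. A hypothetical illustration follows; the exemplar questions and replies are invented placeholders.

```python
# Hypothetical illustration of few-shot prompting for style control. Each
# exemplar lengthens the prompt sent on every request, which is where the
# extra token cost comes from.
EXEMPLARS = [
    ("Will the Ministry review scheme X?",
     "The Ministry is reviewing scheme X and will provide an update by Q3."),
    ("What is the status of project Y?",
     "Project Y is on schedule; the first phase will be completed this year."),
]

def build_few_shot_prompt(question: str, context: str) -> str:
    parts = ["Draft replies in the same tone and length as the examples."]
    for q, a in EXEMPLARS:
        parts.append(f"Question: {q}\nReply: {a}")
    parts.append(f"Relevant material:\n{context}")
    parts.append(f"Question: {question}\nReply:")
    return "\n\n".join(parts)
```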

Scaling Synthetic Data Creation with 1,000,000,000 Personas


Paper by Xin Chan, et al: “We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub — a collection of 1 billion diverse personas automatically curated from web data. These 1 billion personas (~13% of the world’s total population), acting as distributed carriers of world knowledge, can tap into almost every perspective encapsulated within the LLM, thereby facilitating the creation of diverse synthetic data at scale for various scenarios. By showcasing Persona Hub’s use cases in synthesizing high-quality mathematical and logical reasoning problems, instructions (i.e., user prompts), knowledge-rich texts, game NPCs and tools (functions) at scale, we demonstrate persona-driven data synthesis is versatile, scalable, flexible, and easy to use, potentially driving a paradigm shift in synthetic data creation and applications in practice, which may have a profound impact on LLM research and development…(More)”.
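The core mechanic the abstract describes can be sketched simply: prepend a different persona to an otherwise identical synthesis prompt, so the same LLM yields many distinct samples. The sketch below is an assumed simplification, not the authors’ implementation; the hand-written personas stand in for the billion personas Persona Hub curates automatically from web data, and `generate` is a placeholder for an LLM client.

```python
# Simplified sketch of persona-driven data synthesis: the same task prompt is
# paired with many personas so one LLM yields many different synthetic samples.
from typing import Callable, Iterable, List

TASK = "Write a challenging math word problem with a worked solution."

def synthesize(personas: Iterable[str],
               generate: Callable[[str], str]) -> List[str]:
    """Generate one synthetic sample per persona by prefixing the task prompt."""
    samples = []
    for persona in personas:
        prompt = f"You are {persona}.\n{TASK}"
        samples.append(generate(prompt))
    return samples

# Example personas, hand-written here purely for illustration:
personas = [
    "a structural engineer sizing steel beams for a bridge",
    "a nurse scheduling shifts across three hospital wards",
    "a café owner forecasting weekly coffee bean orders",
]
```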