Training LLMs to Draft Replies to Parliamentary Questions


Blog by Watson Chua: “In Singapore, the government is answerable to Parliament and Members of Parliament (MPs) may raise queries to any Minister on any matter in his portfolio. These questions can be answered orally during the Parliament sitting or through a written reply. Regardless of the medium, public servants in the ministries must gather materials to answer the question and prepare a response.

Generative AI and Large Language Models (LLMs) have already been applied to help public servants do this more effectively and efficiently. For example, Pair Search (publicly accessible) and the Hansard Analysis Tool (only accessible to public servants) help public servants search past Parliamentary Sittings for information relevant to the question and synthesise a response to it.

The existing systems draft the responses using prompt engineering and Retrieval Augmented Generation (RAG). To recap, RAG consists of two main parts:

  • Retriever: A search engine that finds documents relevant to the question
  • Generator: A text generation model (LLM) that takes in the instruction, the question, and the search results from the retriever to respond to the question
A typical RAG system. Illustration by Hrishi Olickel, taken from here.
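To make the two components concrete, here is a minimal, self-contained sketch in Python: a toy term-overlap retriever and a prompt-assembly step standing in for the generator. The corpus, question, and prompt wording are illustrative, not the actual system's.

```python
# A toy retriever (term overlap) plus prompt assembly for the generator.
# Corpus, question, and prompt wording are illustrative only.

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive term overlap with the question."""
    q_terms = set(question.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q_terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Combine the instruction, retrieved context, and question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Draft a reply to the Parliamentary Question below, "
        "using only the context provided.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nReply:"
    )

corpus = [
    "Hansard 2023: the Minister outlined new support measures for SMEs.",
    "Hansard 2022: transport fare adjustments were debated.",
]
question = "What support is available for SMEs?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # in practice, this prompt is sent to an LLM such as GPT-4o
```

In a real deployment the retriever would be a proper search engine over Hansard and ministry documents, and the assembled prompt would be sent to an instruction-tuned LLM.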

Using a pre-trained instruction-tuned LLM like GPT-4o, the generator can usually generate a good response. However, it might not be exactly what is desired in terms of verbosity, style, and prose, and additional human post-processing might be needed. Extensive prompt engineering or few-shot learning can be used to mold the response, at the expense of higher costs from the additional tokens in the prompt…(More)”
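As an illustration of the few-shot trade-off mentioned above, a prompt with style exemplars looks something like the sketch below. The questions and answers are invented; the point is that every exemplar consumes additional prompt tokens on every call.

```python
# Hypothetical few-shot prompt: two style exemplars teach the model the
# desired register and verbosity; both count against the token budget.
FEW_SHOT_TEMPLATE = """\
Q: Will the Ministry review fee waivers for hawkers?
A: The Ministry keeps its support schemes under regular review. ...

Q: What is being done to shorten polyclinic waiting times?
A: The Ministry has taken several steps to reduce waiting times. ...

Q: {new_question}
A:"""

print(FEW_SHOT_TEMPLATE.format(new_question="What support is available for SMEs?"))
```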

Scaling Synthetic Data Creation with 1,000,000,000 Personas


Paper by Xin Chan, et al: “We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub — a collection of 1 billion diverse personas automatically curated from web data. These 1 billion personas (~13% of the world’s total population), acting as distributed carriers of world knowledge, can tap into almost every perspective encapsulated within the LLM, thereby facilitating the creation of diverse synthetic data at scale for various scenarios. By showcasing Persona Hub’s use cases in synthesizing high-quality mathematical and logical reasoning problems, instructions (i.e., user prompts), knowledge-rich texts, game NPCs and tools (functions) at scale, we demonstrate persona-driven data synthesis is versatile, scalable, flexible, and easy to use, potentially driving a paradigm shift in synthetic data creation and applications in practice, which may have a profound impact on LLM research and development…(More)”.
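The core mechanic the abstract describes, folding a persona into an otherwise fixed task prompt so that one instruction yields many distinct samples, can be sketched in a few lines. The personas and template below are invented for illustration; the paper details Persona Hub's actual prompting setups.

```python
# Persona-driven synthesis sketch: one fixed task, many personas.
# Diversity in the generated data comes from the persona slot.
personas = [
    "a harbour pilot guiding container ships",
    "a high-school statistics teacher",
    "a medieval-history enthusiast",
]

def synthesis_prompt(persona: str, task: str = "Write a challenging math word problem.") -> str:
    return f"You are {persona}. {task} Ground it in your daily experience."

for p in personas:
    print(synthesis_prompt(p))  # each prompt goes to an LLM to generate one synthetic sample
```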

Building an AI ecosystem in a small nation: lessons from Singapore’s journey to the forefront of AI


Paper by Shaleen Khanal, Hongzhou Zhang & Araz Taeihagh: “Artificial intelligence (AI) is arguably the most transformative technology of our time. While all nations would like to mobilize their resources to play an active role in AI development and utilization, only a few nations, such as the United States and China, have the resources and capacity to do so. How, then, can smaller or less-resourced countries navigate the technological terrain to emerge at the forefront of AI development? This research presents an in-depth analysis of Singapore’s journey in constructing a robust AI ecosystem amidst the prevailing global dominance of the United States and China. By examining the case of Singapore, we argue that by designing policies that address risks associated with AI development and implementation, smaller countries can create a vibrant AI ecosystem that encourages experimentation and early adoption of the technology. In addition, through Singapore’s case, we demonstrate the active role the government can play, not only as a policymaker but also as a steward guiding the rest of the economy towards the application of AI…(More)”.

Exploring Visitor Density Trends in Rest Areas Through Google Maps Data and Data Mining


Paper by Marita Prasetyani, R. Rizal Isnanto and Catur Edi Widodo: “Rest areas play a vital role in ensuring the safety and comfort of travelers. This study examines visitor density at toll and non-toll rest areas using data mining techniques applied to Google Maps Places data. By utilizing extensive information from Google Maps, the research aims to uncover patterns and trends in visitor behavior and pinpoint peak usage times. The findings can guide improved planning and management of rest areas, enhancing the overall travel experience for road users, and inform further research to determine the locations of new rest areas. Understanding patterns or trends in visitor density at rest areas involves analyzing the time of day, location, and other factors influencing the density level. Understanding these trends can provide essential insights for rest area management, infrastructure planning, and the establishment of new rest areas. Data from Google Maps provides an invaluable source of real-time and historical information, enabling accurate and in-depth analysis of visitor behavior. Data mining helps identify relationships not immediately apparent in the data, providing a deeper understanding and supporting data-driven decision-making…(More)”.
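A hedged sketch of the kind of peak-usage analysis the abstract describes, assuming a simplified (rest_area, hour, visitors) table; the study's actual features are derived from Google Maps Places data, and the figures below are invented.

```python
# Assumed schema: one row per (rest_area, hour) with an observed visitor count.
import pandas as pd

df = pd.DataFrame({
    "rest_area": ["KM57", "KM57", "KM57", "KM88", "KM88", "KM88"],
    "hour":      [7, 12, 18, 7, 12, 18],
    "visitors":  [120, 340, 510, 80, 290, 150],
})

# Average density per hour across all areas, then each area's busiest hour.
hourly_mean = df.groupby("hour")["visitors"].mean()
peak_rows = df.loc[df.groupby("rest_area")["visitors"].idxmax()]
print(hourly_mean)
print(peak_rows[["rest_area", "hour", "visitors"]])
```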

Is Software Eating the World?


Paper by Sangmin Aum & Yongseok Shin: “When explaining the declining labor income share in advanced economies, the macro literature finds that the elasticity of substitution between capital and labor is greater than one. However, the vast majority of micro-level estimates shows that capital and labor are complements (elasticity less than one). Using firm- and establishment-level data from Korea, we divide capital into equipment and software, as they may interact with labor in different ways. Our estimation shows that equipment and labor are complements (elasticity 0.6), consistent with other micro-level estimates, but software and labor are substitutes (1.6), a novel finding that helps reconcile the macro vs. micro-literature elasticity discord. As the quality of software improves, labor shares fall within firms because of factor substitution and endogenously rising markups. In addition, production reallocates toward firms that use software more intensively, as they become effectively more productive. Because in the data these firms have higher markups and lower labor shares, the reallocation further raises the aggregate markup and reduces the aggregate labor share. The rise of software accounts for two-thirds of the labor share decline in Korea between 1990 and 2018. The factor substitution and the markup channels are equally important. On the other hand, the falling equipment price plays a minor role, because the factor substitution and the markup channels offset each other…(More)”.
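For readers unfamiliar with the threshold at one, a textbook two-factor CES production function (a standard formulation, not necessarily the richer specification the paper estimates) makes the elasticity parameter concrete:

```latex
% Standard CES production function with capital K and labor L;
% \sigma is the elasticity of substitution, \alpha the capital weight.
Y = \left( \alpha K^{\frac{\sigma-1}{\sigma}} + (1-\alpha)\, L^{\frac{\sigma-1}{\sigma}} \right)^{\frac{\sigma}{\sigma-1}}
```

Here σ is the elasticity of substitution: values above one make the factors substitutes, so as software becomes cheaper or better, spending shifts towards it and the labor share falls; values below one make them complements, which is what the paper finds for equipment and labor.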

Japan’s push to make all research open access is taking shape


Article by Dalmeet Singh Chawla: “The Japanese government is pushing ahead with a plan to make Japan’s publicly funded research output free to read. In June, the science ministry will assign funding to universities to build the infrastructure needed to make research papers free to read on a national scale. The move follows the ministry’s announcement in February that researchers who receive government funding will be required to make their papers freely available to read in institutional repositories from April 2025.

The Japanese plan “is expected to enhance the long-term traceability of research information, facilitate secondary research and promote collaboration”, says Kazuki Ide, a health-sciences and public-policy scholar at Osaka University in Suita, Japan, who has written about open access in Japan.

The nation is one of the first Asian countries to make notable advances towards open access (OA) and among the first countries in the world to forge a nationwide plan for OA.

The plan follows in the footsteps of the influential Plan S, introduced six years ago by a group of research funders in the United States and Europe known as cOAlition S, to accelerate the move to OA publishing. The United States also implemented an OA mandate in 2022 that requires all research funded by US taxpayers to be freely available from 2026…(More)”.

The not-so-silent type: Vulnerabilities across keyboard apps reveal keystrokes to network eavesdroppers


Report by Jeffrey Knockel, Mona Wang, and Zoë Reichert: “Typing logographic languages such as Chinese is more difficult than typing alphabetic languages, where each letter can be represented by one key. There is no way to fit the tens of thousands of Chinese characters that exist onto a single keyboard. Despite this obvious challenge, technologies have developed which make typing in Chinese possible. To enable the input of Chinese characters, a writer will generally use a keyboard app with an “Input Method Editor” (IME). IMEs offer a variety of approaches to inputting Chinese characters, including via handwriting, voice, and optical character recognition (OCR). One popular phonetic input method is Zhuyin, and shape- or stroke-based input methods such as Cangjie or Wubi are commonly used as well. However, the most popular way of typing in Chinese, used by nearly 76% of mainland Chinese keyboard users, is the pinyin method, which is based on the pinyin romanization of Chinese characters.

All of the keyboard apps we analyze in this report fall into the category of input method editors (IMEs) that offer pinyin input. These keyboard apps are particularly interesting because they have grown to accommodate the challenge of allowing users to type Chinese characters quickly and easily. While many keyboard apps operate locally, solely within a user’s device, IME-based keyboard apps often have cloud features which enhance their functionality. Because of the complexities of predicting which characters a user may want to type next, especially in logographic languages like Chinese, IMEs often offer “cloud-based” prediction services which reach out over the network. Enabling “cloud-based” features in these apps means that longer strings of syllables that users type will be transmitted to servers elsewhere. As many have previously pointed out, “cloud-based” keyboards and input methods can function as vectors for surveillance and essentially behave as keyloggers. While the content of what users type is traveling from their device to the cloud, it is additionally vulnerable to network attackers if not properly secured. This report is not about how operators of cloud-based IMEs read users’ keystrokes, which is a phenomenon that has already been extensively studied and documented. This report is primarily concerned with the issue of protecting this sensitive data from network eavesdroppers…(More)”.
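To illustrate the transport-security point in the simplest terms, the Python sketch below builds (without sending) a hypothetical cloud-prediction request so the on-the-wire payload can be inspected. The endpoint and payload format are invented; real IMEs use their own protocols, some with home-grown cryptography rather than TLS.

```python
# Hypothetical cloud IME prediction request, built but not sent.
# Endpoint and field names are invented for illustration.
import requests

typed = "wo xiang ding yi zhang qu bei jing de ji piao"  # pinyin syllables a user typed

req = requests.Request(
    "POST", "http://ime.example.com/predict", json={"pinyin": typed}
).prepare()
print(req.body)  # over plain HTTP, exactly these bytes are readable by any on-path observer
# Over HTTPS, TLS would encrypt this body in transit; the report's findings
# concern apps that skip TLS or substitute home-grown, breakable encryption.
```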

Disfactory Project: How to Detect Illegal Factories by Open Source Technology and Crowdsourcing


Article by Peii Lai: “…building illegal factories on farmland is still a profitable business, because factory owners obtain the means of production at a lower price and can easily evade penalties by simply ignoring their legal responsibilities. Such conduct shifts the cost of production onto the environment in an irresponsible way. As we can imagine, such violations have been increasing year by year. On average, Taiwan loses 1,500 hectares of farmland each year due to illegal use, which demonstrates that illegal factories are an ongoing and escalating problem that people cannot ignore.

It’s clear that the problem of illegal factories is caused by the dysfunction of previous land-management regulations. In response, Citizens of Earth Taiwan (CET) started seeking solutions to tackle the illegal factories. CET soon realized that the biggest obstacle it faced was that no one saw the violations as a big deal. Local governments avoided standing on the opposite side of the illegal factories; for them, imposing penalties is an arduous and thankless task…

Through the collaboration of CET and g0v-zero, the Disfactory project combines the knowledge CET accumulated through advocacy with the diverse techniques brought by passionate civic contributors. In 2020, the Disfactory project team delivered its first product: disfactory.tw. They built a website with geographic information that whistle-blowers on the ground can operate by themselves. In a few simple steps (identifying the location of the target illegal factory, taking pictures of it, and uploading the photos), any citizen can easily register the information on Disfactory’s website….(More)”
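Those three reporting steps translate into a very small payload. The sketch below is hypothetical: the field names and structure are invented, not Disfactory's actual API; the real service lives at disfactory.tw.

```python
# Hypothetical report payload mirroring the three steps above; field names
# and structure are invented, not Disfactory's actual API.
import json

report = {
    "lat": 24.0717,                   # step 1: location of the suspected factory
    "lng": 120.5624,
    "images": ["factory_front.jpg"],  # step 2: photos taken on the ground
    "description": "new structure on farmland",
}
print(json.dumps(report, indent=2))   # step 3: the upload registered via the website
```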

‘Positive deviance’ and the power of outliers


Bloomberg Cities Network: “Groundbreaking solutions in cities are often the result of visionary mayoral leadership. But sometimes certain communities achieve significantly better outcomes than their similarly resourced neighbors—and the underlying reasons may not be immediately obvious to local leaders. Ravi Gurumurthy, CEO of the global innovation foundation Nesta, believes that this variation in quality of life at a hyper-local level is something worth paying a lot more attention to. 

“The fastest way for us to improve people’s lives will be to mine that variation and really understand what is going on,” he says.    

This concept, known as “positive deviance,” describes individuals or communities that achieve remarkable success or exhibit highly effective behaviors despite facing the same constraints as their peers. With a long history of use in international development, positive deviance is now gaining traction among city leaders as a source of solutions to stubborn urban challenges.  

Here’s a closer look at what it’s about, and how it’s already being used to uplift promising approaches in cities. 

What is positive deviance? 

Positive deviance first gained widespread attention because of a remarkable success story in 1990s Vietnam. Much of the country was suffering from a malnutrition crisis, and efforts to design and implement new solutions were coming up short. But aid workers landed on a breakthrough by paying closer attention to children who already appeared larger and healthier than their peers.  

It turned out these children were being fed different diets—leaning more heavily on shrimp and crab, for example, which were widely accessible but less often fed to young people. These children also were being fed more frequently, in smaller meals, throughout the day—an intervention that, again, did not require parents to have more resources so much as to differently use what was universally available.  

When these practices—feeding kids shellfish and making meals smaller and more frequent—were replicated, malnutrition plummeted…(More)”

Mass Data Sharing in Smart Cities


Report by Berenika Drazewska and Mark Findlay: “There are at least two ways of understanding the importance of this Report and its implications. The essential research purpose was to examine the nature of mass data sharing between private and public agencies in the commerce and administration of certain smart cities. With this knowledge, the research speculated on and selectively exposed the governance challenges this sharing poses for stakeholders, citizens/residents in particular, in various data relationships and arrangements. Predicting that good data governance policy and practices can address these challenges, the Report proposes a model strategy, grounded in commitments by stakeholders to employ trusted data spaces, to create respectful and responsible data relationships in which the benefits of data sharing can be achieved without compromising any stakeholder interests…(More)”.