Using internet search data as part of medical research


Blog by Susan Thomas and Matthew Thompson: “…In the UK, almost 50 million health-related searches are made using Google per year. Globally there are hundreds of millions of health-related searches every day. And, of course, people are doing these searches in real time, looking for answers to their concerns in the moment. It’s also possible that, even if people aren’t noticing changes to their health and searching about them, their behaviour is changing. Maybe they are searching more at night because they are having difficulty sleeping, or maybe they are spending more (or less) time online. Maybe an individual’s search history could actually be really useful for researchers. This realisation has led medical researchers to start exploring whether individuals’ online search activity could help provide those subtle, almost unnoticeable signals that point to the beginning of a serious illness.

Our recent review found that 23 studies published so far have done exactly this. These studies suggest that online search activity among people later diagnosed with a variety of conditions, ranging from pancreatic cancer and stroke to mood disorders, differed from that of people who did not have one of these conditions.

One of these studies was published by researchers at Imperial College London, who used online search activity to identify signals of gynaecological malignancies in women. They found that women with malignant conditions (e.g. ovarian cancer) and those with benign conditions had different search patterns up to two months prior to a GP referral.

Pause for a moment, and think about what this could mean. Ovarian cancer is one of the most devastating cancers women get. It’s desperately hard to detect early – and yet there are signals of this cancer visible in women’s internet searches months before diagnosis?…(More)”.

The Imperial Origins of Big Data


Blog and book by Asheesh Kapur Siddique: “We live in a moment of massive transformation in the nature of information. In 2020, according to one report, users of the Internet created 64.2 zettabytes of data, a quantity greater than the “number of detectable stars in the cosmos,” a colossal increase whose origins can be traced to the emergence of the World Wide Web in 1993.1 Facilitated by technologies like satellites, smartphones, and artificial intelligence, the scale and speed of data creation seem likely only to balloon over the rest of our lifetimes—and with them, the problem of how to govern ourselves in relation to the inequalities and opportunities that the explosion of data creates.

But while much about our era of big data is indeed revolutionary, the political questions that it raises—How should information be used? Who should control it? And how should it be preserved?—are ones with which societies have long grappled. These questions attained a particular importance in Europe from the eleventh century due to a technological change no less significant than the ones we are witnessing today: the introduction of paper into Europe. Initially invented in China, paper travelled to Europe via the conduit of Islam around the eleventh century after the Moors conquered Spain. Over the twelfth, thirteenth, and fourteenth centuries, paper emerged as the fundamental substrate which politicians, merchants, and scholars relied on to record and circulate information in governance, commerce, and learning. At the same time, governing institutions sought to preserve and control the spread of written information through the creation of archives: repositories where they collected, organized, and stored documents.

The expansion of European polities overseas from the late fifteenth century onward saw governments massively scale up their use of paper—and confront the challenge of controlling its dissemination across thousands of miles of ocean and land. These pressures were felt particularly acutely in what eventually became the largest empire in world history, the British empire. As people from the British Isles fought, traded, and settled their way to power in the Atlantic world and South Asia from the early seventeenth century, administrators faced the problem of how to govern both their emigrating subjects and the non-British peoples with whom they interacted. This meant collecting information about their behavior through the technology of paper. Just as we struggle to organize, search, and control our email boxes, text messages, and app notifications, so too did these early moderns confront the attendant challenges of developing practices of collection and storage to manage the resulting information overload. And despite the best efforts of states and companies to control information, it constantly escaped their grasp, falling into the hands of their opponents and rivals who deployed it to challenge and contest ruling powers.

The history of the early modern information state offers no simple or straightforward answers to the questions that data raises for us today. But it does remind us of a crucial truth, all too readily obscured by the deluge of popular narratives glorifying technological innovation: that questions of data are inherently questions about politics—about who gets to collect, control, and use information, and the ends to which information should be put. We should resist any effort to insulate data governance from democratic processes—and having an informed perspective on the politics of data requires that we attend not just to its present, but also to its past…(More)”.

The Power of Supercitizens


Blog by Brian Klaas: “Lurking among us, there are a group of hidden heroes, people who routinely devote significant amounts of their time, energy, and talent to making our communities better. These are the devoted, do-gooding, elite one percent. Most, but not all, are volunteers.1 All are selfless altruists. They, the supercitizens, provide some of the stickiness in the social glue that holds us together.2

What if I told you that there’s this little trick you can do that makes your community stronger, helps other people, and makes you happier and helps you live longer? Well, it exists, there’s ample evidence it works, and best of all, it’s free.

Recently published research showcases a convincing causal link between these supercitizens—devoted, regular volunteers—and social cohesion. While such an umbrella term means a million different things, these researchers focused on two UK-based surveys that analyzed three facets of social cohesion, measured through eight questions (respondents answered on a five-point scale, ranging from strongly disagree to strongly agree). The three facets were:


Neighboring

  • ‘If I needed advice about something I could go to someone in my neighborhood’;
  • ‘I borrow things and exchange favors with my neighbors’; and
  • ‘I regularly stop and talk with people in my neighborhood’

Psychological sense of community

  • ‘I feel like I belong to this neighborhood’;
  • ‘The friendships and associations I have with other people in my neighborhood mean a lot to me’;
  • ‘I would be willing to work together with others on something to improve my neighborhood’; and
  • ‘I think of myself as similar to the people that live in this neighborhood’

Attraction to the neighborhood

  • ‘I plan to remain a resident of this neighborhood for a number of years’

While these questions only tap into some specific components of social cohesion, high levels of these ingredients are likely to produce a reliable recipe for a healthy local community. (Social cohesion differs from social capital, popularized by Robert Putnam in his book Bowling Alone. Social capital tends to focus on links between individuals and groups—are you a joiner or more of a loner?—whereas cohesion refers to a more diffuse sense of community, belonging, and neighborliness.)…(More)”.
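As a rough illustration of the measure described above (a minimal sketch only: the mean-per-facet aggregation and the sample responses are assumptions, not the scoring used in the surveys), the eight items could be combined into facet scores like this:

```python
# Hypothetical respondent: the eight items above, each scored 1 (strongly disagree)
# to 5 (strongly agree). Averaging items within each facet is an assumed scoring
# scheme, not necessarily the one used by the researchers.
responses = {
    "neighboring": [4, 3, 5],                          # three neighboring items
    "psychological_sense_of_community": [5, 4, 4, 3],  # four belonging items
    "attraction_to_neighborhood": [4],                 # single residency-intention item
}

facet_scores = {facet: sum(items) / len(items) for facet, items in responses.items()}
overall_cohesion = sum(facet_scores.values()) / len(facet_scores)
print(facet_scores, overall_cohesion)
```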

The Power of Volunteers: Remote Mapping Gaza and Strategies in Conflict Areas


Blog by Jessica Pechmann: “…In Gaza, increased conflict since October 2023 has caused a prolonged humanitarian crisis. Understanding the impact of the conflict on buildings has been challenging, since pre-existing datasets from artificial intelligence and machine learning (AI/ML) models and OSM were not accurate enough to create a full building footprint baseline. The area’s buildings were too dense, and information on the ground was impossible to collect safely. In these hard-to-reach areas, HOT’s remote and crowdsourced mapping methodology was a good fit for collecting detailed information visible on aerial imagery.

In February 2024, after consultation with humanitarian and UN actors working in Gaza, HOT decided to create a pre-conflict dataset of all building footprints in the area in OSM. HOT’s community of OpenStreetMap volunteers did all the data work, coordinating through HOT’s Tasking Manager. The volunteers made meticulous edits to add missing data and to improve existing data. Due to protection and data quality concerns, only expert volunteer teams were assigned to map and validate the area. As in other areas that are hard to reach due to conflict, HOT balanced the data needs with responsible data practices based on the context.

Comparing AI/ML with human-verified OSM building datasets in conflict zones

AI/ML is becoming an increasingly common and quick way to obtain building footprints across large areas. Sources for automated building footprints range from worldwide datasets by Microsoft or Google to smaller-scale open community-managed tools such as HOT’s new application, fAIr.

Now that HOT volunteers have completely updated and validated all OSM buildings visible in pre-conflict imagery, OSM has 18% more individual buildings in the Gaza Strip than Microsoft’s ML buildings dataset (an estimated 330,079 buildings vs 280,112 buildings). However, in contexts where there has not been a coordinated update effort in OSM, the numbers may differ. For example, in Sudan, where there has not been a large organized editing campaign, there are just under 1,500,000 buildings in OSM, compared to over 5,820,000 buildings in Microsoft’s ML data. It is important to note that the ML datasets have not been human-verified and their accuracy is not known. Google Open Buildings has over 26 million building features in Sudan, but on visual inspection, many of these features are noise in the data that the model incorrectly identified as buildings in the uninhabited desert…(More)”.
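As a back-of-the-envelope check of the comparison above (a minimal sketch using only the figures quoted in the post), the relative differences can be computed directly:

```python
# Building counts quoted above (pre-conflict Gaza Strip; Sudan).
counts = {
    "Gaza Strip": {"osm": 330_079, "microsoft_ml": 280_112},
    "Sudan": {"osm": 1_500_000, "microsoft_ml": 5_820_000},
}

for region, c in counts.items():
    diff_pct = (c["osm"] - c["microsoft_ml"]) / c["microsoft_ml"] * 100
    print(f"{region}: OSM has {diff_pct:+.0f}% buildings relative to the ML dataset")
```

For Gaza this reproduces the roughly 18% figure cited above; for Sudan it shows OSM holding roughly three quarters fewer buildings than the ML dataset.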

Under which conditions can civic monitoring be admitted as a source of evidence in courts?


Blog by Anna Berti Suman: “The ‘Sensing for Justice’ (SensJus) research project – running between 2020 and 2023 – explored how people use monitoring technologies or just their senses to gather evidence of environmental issues and claim environmental justice in a variety of fora. Among the other research lines, we looked at successful and failed cases of civic-gathered data introduced in courts. The guiding question was: what are the enabling factors and/or barriers for the introduction of civic evidence in environmental litigation?

Civic environmental monitoring is the use by ordinary people of monitoring devices (e.g., a sensor) or their bare senses (e.g., smell, hearing) to detect environmental issues. It can be regarded as a form of reaction to environmental injustices, a form of political contestation through data, and even a form of collective care. The practice is growing fast, thanks especially to the widespread availability of audio- and video-recording devices in the hands of diverse publics, but also due to increasing public literacy on, and concern about, environmental matters.

Civic monitoring can be a powerful source of evidence for law enforcement, especially when it sheds light on official informational gaps stemming from public agencies’ limited resources for detecting environmental wrongdoing. Legal scholars and practitioners, as well as civil society organizations and institutional actors, should pay close attention to the practice and its potential applications.

Among the cases explored for the SensJus project, the Formosa case, Texas, United States, stands out as it sets a key precedent: issued in June 2019, the landmark ruling found a Taiwanese petrochemical company liable for violating the US Clean Water Act, mostly on the basis of citizen-collected evidence involving volunteer observations of plastic contamination over years. The contamination could not be proven through existing data held by competent authorities because the company never filed any record of pollution. Our analysis of the case highlights some key determinants of the case’s success…(More)”.

Future-proofing government data


Article by Amy Jones: “Vast amounts of data are fueling innovation and decision-making, and agencies representing the United States government are custodians of some of the largest repositories of data in the world. As one of the world’s largest data creators and consumers, the federal government has made substantial investments in sourcing, curating, and leveraging data across many domains. However, the increasing reliance on artificial intelligence to extract insights and drive efficiencies necessitates a strategic pivot: agencies must evolve their data management practices to identify and distinguish synthetic data from organic sources, in order to safeguard the integrity and utility of data assets.

AI’s transformative potential is contingent on the availability of high-quality data. Data readiness includes attention to quality, accuracy, completeness, consistency, timeliness and relevance, at a minimum, and agencies are adopting robust data governance frameworks that enforce data quality standards at every stage of the data lifecycle. This includes implementing advanced data validation techniques, fostering a culture of data stewardship, and leveraging state-of-the-art tools for continuous data quality monitoring…(More)”.
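As a rough sketch of the kind of rule-based checks such a framework might enforce (the field names, thresholds, and the use of pandas here are illustrative assumptions, not a description of any agency’s tooling):

```python
import pandas as pd

# Hypothetical records; column names and thresholds are purely illustrative.
df = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "value": [10.5, None, 11.2, 10.9],
    "last_updated": pd.to_datetime(["2024-06-01", "2024-06-02", "2020-01-15", "2024-06-03"]),
    "source": ["sensor", "sensor", "synthetic", "sensor"],
})

quality_report = {
    # Completeness: share of records with a measurement present.
    "completeness": df["value"].notna().mean(),
    # Timeliness: share of records updated within the last two years.
    "timeliness": (df["last_updated"] > pd.Timestamp.now() - pd.DateOffset(years=2)).mean(),
    # Provenance: share of records tagged as organic rather than synthetic.
    "organic_share": (df["source"] != "synthetic").mean(),
}
print(quality_report)
```

Checks like these would run at each stage of the data lifecycle, with failing shares flagged for review rather than silently entering downstream AI pipelines.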

Rethinking Dual-Use Technology


Article by Artur Kluz and Stefaan Verhulst: “A new concept of “triple use” — where technology serves commercial, defense, and peacebuilding purposes — may offer a breakthrough solution for founders, investors and society to explore….

As a result of the resurgence of geopolitical tensions, the debate about the applications of dual-use technology is intensifying. The core issue founders, tech entrepreneurs, venture capitalists (VCs), and limited partner investors (LPs) are examining is whether commercial technologies should increasingly be re-used for military purposes. Traditionally, the majority of investors (including limited partners) have prohibited dual-use tech in their agreements. However, the rapidly growing dual-use market, with its substantial addressable size and growth potential, is compelling all stakeholders to reconsider this stance. The pressure for innovation, capital returns and return on investment (ROI) is driving the need for a solution.

These discussions are fraught with moral complexity, but they also present an opportunity to rethink the dual-use paradigm and foster investment in technologies aimed at supporting peace. A new concept of “triple use”— where technology serves commercial, defense, and peacebuilding purposes — may offer an innovative and more positive avenue for founders, investors and society to explore. This additional re-use, which remains in an incipient state, is increasingly being referred to as PeaceTech. By integrating terms dedicated to PeaceTech into new and existing investment and LP agreements, tech companies, founders and venture capital investors can also be required to apply their technology for peacebuilding purposes. This approach can expand the applications of emerging technologies to include conflict prevention, reconstruction and humanitarian uses.

However, current efforts to use technologies for peacebuilding are impeded by various obstacles, including a lack of awareness within the tech sector and among investors, limited commercial interest, disparities in technical capacity, privacy concerns, international relations and political complexities. Below we examine some of these challenges, while also exploring certain avenues for overcoming them — including approaching technologies for peace as a “triple use” application. We especially try to identify examples of how tech companies, tech entrepreneurs, accelerators, and tech investors, including VCs and LPs, can commercially benefit from and support “triple use” technologies. Ultimately, we argue, the vast potential — largely untapped — of “triple use” technologies calls for a new wave of tech ecosystem transformation and public and private investments, as well as the development of a new field of research…(More)”.

Training LLMs to Draft Replies to Parliamentary Questions


Blog by Watson Chua: “In Singapore, the government is answerable to Parliament and Members of Parliament (MPs) may raise queries to any Minister on any matter in his portfolio. These questions can be answered orally during the Parliament sitting or through a written reply. Regardless of the medium, public servants in the ministries must gather materials to answer the question and prepare a response.

Generative AI and Large Language Models (LLMs) have already been applied to help public servants do this more effectively and efficiently. For example, Pair Search (publicly accessible) and the Hansard Analysis Tool (only accessible to public servants) help public servants search past Parliamentary Sittings for information relevant to the question and synthesise a response to it.

The existing systems draft the responses using prompt engineering and Retrieval Augmented Generation (RAG). To recap, RAG consists of two main parts:

  • Retriever: A search engine that finds documents relevant to the question
  • Generator: A text generation model (LLM) that takes in the instruction, the question, and the search results from the retriever to respond to the question
A typical RAG system (illustration by Hrishi Olickel).
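As a rough illustration of this retriever-plus-generator flow (a minimal sketch, not the implementation behind Pair Search or the Hansard Analysis Tool: the toy documents, the TF-IDF retriever, and the choice of GPT-4o via the OpenAI client are all assumptions):

```python
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document store standing in for past Parliamentary material.
documents = [
    "Written reply on housing grant eligibility criteria ...",
    "Oral reply on public transport fare adjustments ...",
    "Ministerial statement on healthcare subsidies ...",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Retriever: rank stored documents by TF-IDF cosine similarity to the question."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(documents + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def generate(question: str, context: list[str]) -> str:
    """Generator: ask the LLM to draft a reply grounded only in the retrieved context."""
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompt = (
        "Draft a reply to the Parliamentary Question below, using only the context provided.\n\n"
        "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

question = "What support is available for housing grants?"
print(generate(question, retrieve(question)))
```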

Using a pre-trained instruction-tuned LLM like GPT-4o, the generator can usually produce a good response. However, it might not be exactly what is desired in terms of verbosity, style and prose, and additional human post-processing might be needed. Extensive prompt engineering or few-shot learning can be used to mold the response, at the expense of the higher costs incurred by the additional tokens in the prompt…(More)”
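As a sketch of the few-shot approach just mentioned (the example question/reply pairs and the system instruction are hypothetical), the extra steering comes from prior exchanges placed directly in the prompt:

```python
# Hypothetical few-shot prompt: earlier question/reply pairs steer verbosity and style,
# at the cost of spending extra prompt tokens on every request. These messages would be
# passed to the same chat-completion call used by the generator above.
few_shot_messages = [
    {"role": "system",
     "content": "You draft concise, formal written replies to Parliamentary Questions."},
    {"role": "user",
     "content": "Question: What is the status of the road upgrade programme?"},
    {"role": "assistant",
     "content": "The programme remains on schedule, with completion expected in the third quarter."},
    {"role": "user",
     "content": "Question: What support is available for housing grants?"},
]
```

Fine-tuning an LLM on past replies, as the post's title suggests, would instead aim to bake that style into the model itself, so the examples no longer need to be carried in every prompt.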

Increasing The “Policy Readiness” Of Ideas


Article by Tom Kalil: “NASA and the Defense Department have developed an analytical framework called the “technology readiness level” for assessing the maturity of a technology – from basic research to a technology that is ready to be deployed.  

A policy entrepreneur (anyone with an idea for a policy solution that will drive positive change) needs to realize that it is also possible to increase the “policy readiness” level of an idea by taking steps to increase the chances that the idea succeeds if adopted and implemented. Given that policy-makers are often time-constrained, they are more likely to consider ideas where more thought has been given to the core questions they may need to answer as part of the policy process.

A good first step is to ask questions about the policy landscape surrounding a particular idea:

1. What is a clear description of the problem or opportunity?  What is the case for policymakers to devote time, energy, and political capital to the problem?

2. Is there a credible rationale for government involvement or policy change?  

Economists have developed frameworks for both market failure (such as public goods, positive and negative externalities, information asymmetries, and monopolies) and government failure (such as regulatory capture, the role of interest groups in supporting policies with concentrated benefits and diffuse costs, limited state capacity, and the inherent difficulty of aggregating timely, relevant information to make and implement policy decisions).

3. Is there a root cause analysis of the problem? …(More)”.

AI: a transformative force in maternal healthcare


Article by Afifa Waheed: “Artificial intelligence (AI) and robotics have enormous potential in healthcare and are quickly shifting the landscape – emerging as a transformative force. They offer a new dimension to the way healthcare professionals approach disease diagnosis, treatment and monitoring. AI is being used in healthcare to help diagnose patients, for drug discovery and development, to improve physician-patient communication, to transcribe voluminous medical documents, and to analyse genomics and genetics. Labs are conducting research work faster than ever before, work that otherwise would have taken decades without the assistance of AI. AI-driven research in life sciences has included applications looking to address broad-based areas, such as diabetes, cancer, chronic kidney disease and maternal health.

In addition to improving knowledge of and access to postnatal and neonatal care, AI can predict the risk of adverse events for antenatal and postnatal women and their newborns. It can be trained to identify those at risk of adverse events by using patients’ health information such as nutrition status, age, existing health conditions and lifestyle factors.
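As a rough sketch of such a risk model (entirely synthetic data and an assumed logistic-regression setup, for illustration only, not a clinically validated model), the features named above could feed a simple classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, illustrative data only: features loosely mirroring those named above
# (maternal age, a nutrition score, count of existing conditions, a lifestyle risk score).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(18, 45, n),    # maternal age
    rng.normal(0, 1, n),        # nutrition status (standardised score)
    rng.integers(0, 4, n),      # number of pre-existing conditions
    rng.normal(0, 1, n),        # lifestyle risk score
])
# Toy outcome: adverse-event risk rises with age, existing conditions and lifestyle risk.
logits = 0.05 * (X[:, 0] - 30) + 0.8 * X[:, 2] + 0.5 * X[:, 3] - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", round(model.score(X_test, y_test), 3))
print("Predicted risk for one patient:", round(model.predict_proba(X_test[:1])[0, 1], 3))
```

In practice, a deployed model would need representative clinical data, careful validation, and bias and safety review before informing care decisions.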

AI can further be used to improve access to care for women in rural areas that lack trained professionals – AI-enabled ultrasound can assist front-line workers with image interpretation for a comprehensive set of obstetrics measurements, increasing access to quality early foetal ultrasound scans. The use of AI assistants and chatbots can also improve pregnant mothers’ experience by helping them find available physicians, schedule appointments and even answer some patient questions…

Many healthcare professionals I have spoken to emphasised that pre-existing conditions such as high blood pressure leading to preeclampsia, iron deficiency, cardiovascular disease, age-related issues for those over 35, various other existing health conditions, and failure of labour to progress, which might lead to a Caesarean section (C-section), could all cause maternal deaths. Training AI models to detect these conditions early and accurately could prove beneficial for women. AI systems can leverage advanced algorithms, machine learning (ML) techniques, and predictive models to enhance decision-making, optimise healthcare delivery, and ultimately improve patient outcomes in foeto-maternal health…(More)”.