Policy paper by the German Council for Scientific Information Infrastructures: “…provides an overview and a comparative in-depth analysis of the emerging research (and research-related) data infrastructures NFDI, EOSC, Gaia-X and the European Data Spaces. In addition, the Council makes recommendations for their future development and coordination. The RfII notes that access to genuine high-quality research data and related core services is a matter of basic public supply and strongly advises achieving coherence between the various initiatives and approaches…(More)”.
Effective Data Stewardship in Higher Education: Skills, Competences, and the Emerging Role of Open Data Stewards
Paper by Panos Fitsilis et al: “The significance of open data in higher education stems from the changing tendencies towards open science and open research in higher education, which encourage new ways of making scientific inquiry more transparent, collaborative and accessible. This study focuses on the critical role of open data stewards in this transition, essential for managing and disseminating research data effectively in universities, while it also highlights the increasing demand for structured training and professional policies for data stewards in academic settings. Building upon this context, the paper investigates the essential skills and competences required for effective data stewardship in higher education institutions through a critical literature review, coupled with practical engagement in open data stewardship at universities, which provided insights into the roles and responsibilities of data stewards. In response to these identified needs, the paper proposes a structured training framework and comprehensive curriculum for data stewardship, a direct response to the gaps identified in the literature. It addresses five key competence categories for open data stewards, aligning them with current trends and essential skills and knowledge in the field. By advocating for a structured approach to data stewardship education, this work sets the foundation for improved data management in universities and serves as a critical step towards professionalizing the role of data stewards in higher education. The emphasis on the role of open data stewards is expected to advance data accessibility and sharing practices, fostering increased transparency, collaboration, and innovation in academic research. This approach contributes to the evolution of universities into open ecosystems, where there is free flow of data for global education and research advancement…(More)”.
Commission launches public consultation on the rules for researchers to access online platform data under the Digital Services Act
Press Release: “Today, the Commission launched a public consultation on the draft delegated act on access to online platform data for vetted researchers under the Digital Services Act (DSA).

With the Digital Services Act, researchers will for the first time have access to data to study systemic risks and to assess online platforms’ risk mitigation measures in the EU. It will allow the research community to play a vital role in scrutinising and safeguarding the online environment.
The draft delegated act clarifies the procedures on how researchers can access Very Large Online Platforms’ and Very Large Online Search Engines’ data. It also sets out rules on data formats and data documentation requirements. Lastly, it establishes the DSA data access portal, a one-stop-shop for researchers, data providers, and Digital Services Coordinators (DSCs) to exchange information on data access requests. The consultation follows a first call for evidence.
The consultation will run until 26 November 2024. After gathering public feedback, the Commission plans to adopt the rules in the first quarter of 2025…(More)”.
Science and technology’s contribution to the UK economy
UK House of Lords Primer: “It is difficult to accurately pinpoint the economic contribution of science and technology to the UK economy. This is because of the way sectors are divided up and reported in financial statistics.
For example, in September 2024 the Office for National Statistics (ONS) reported the following gross value added (GVA) figures by industry/sector for 2023:
- £71bn for IT and other information service activities
- £20.6bn for scientific research and development
This would amount to £91.6bn, forming approximately 3.9% of the total UK GVA of £2,368.7bn for 2023. However, a number of other sectors could also be included in these figures, for example:
- the manufacture of computer, certain machinery and electrical components (valued at £38bn in 2023)
- telecommunications (valued at £34.5bn)
If these two sectors were included too, GVA across all four sectors would total £164.1bn, approximately 6.9% of the UK’s 2023 GVA. However, this would likely still exclude relevant contributions that happen to fall within the definitions of different industries. For example, the manufacture of spacecraft and related machinery falls within the same sector as the manufacture of aircraft in the ONS’s data (this sector was valued at £10.8bn for 2023).
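The arithmetic behind the two headline shares can be checked with a short script; the figures below are the GVA values quoted in the primer (all £bn, for 2023):

```python
# GVA figures for 2023 as quoted in the primer (in £bn)
total_uk_gva = 2368.7
it_services = 71.0           # IT and other information service activities
sci_rnd = 20.6               # scientific research and development
computer_manufacture = 38.0  # computers, certain machinery and electrical components
telecoms = 34.5              # telecommunications

core = it_services + sci_rnd
broad = core + computer_manufacture + telecoms

print(f"Core science/tech GVA:  £{core:.1f}bn ({core / total_uk_gva:.1%} of total)")
print(f"Broader definition:     £{broad:.1f}bn ({broad / total_uk_gva:.1%} of total)")
```

Running this reproduces the £91.6bn (3.9%) and £164.1bn (6.9%) figures, and makes plain why the answer shifts so much depending on which sectors are counted in.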
Alternatively, others have made estimates of the economic contribution of more specific sectors connected to science and technology. For example:
- Oxford Economics, an economic advisory firm, has estimated that, in 2023, the life sciences sector contributed over £13bn to the UK economy and accounted for one in every 121 employed people
- the government has estimated the value of the digital sector (comprising information technology and digital content and media) at £158.3bn for 2022
- a 2023 government report estimated the value of the UK’s artificial intelligence (AI) sector at around £3.7bn (in terms of GVA) and that the sector employed around 50,040 people
- the Energy and Climate Intelligence Unit, a non-profit organisation, reported estimates that the GVA of the UK’s net zero economy (encompassing sectors such as renewables, carbon capture, green and certain manufacturing) was £74bn in 2022/23 and that it supported approximately 765,700 full-time equivalent (FTE) jobs…(More)”.
Veridical Data Science
Book by Bin Yu and Rebecca L. Barter: “Most textbooks present data science as a linear analytic process involving a set of statistical and computational techniques without accounting for the challenges intrinsic to real-world applications. Veridical Data Science, by contrast, embraces the reality that most projects begin with an ambiguous domain question and messy data; it acknowledges that datasets are mere approximations of reality while analyses are mental constructs.
Bin Yu and Rebecca Barter employ the innovative Predictability, Computability, and Stability (PCS) framework to assess the trustworthiness and relevance of data-driven results relative to three sources of uncertainty that arise throughout the data science life cycle: the human decisions and judgment calls made during data collection, cleaning, and modeling. By providing real-world data case studies, intuitive explanations of common statistical and machine learning techniques, and supplementary R and Python code, Veridical Data Science offers a clear and actionable guide for conducting responsible data science. Requiring little background knowledge, this lucid, self-contained textbook provides a solid foundation and principled framework for future study of advanced methods in machine learning, statistics, and data science…(More)”.
Statistical Significance—and Why It Matters for Parenting
Blog by Emily Oster: “…When we say an effect is “statistically significant at the 5% level,” what this means is that there is less than a 5% chance that we’d see an effect of this size (or larger) if the true effect were zero. (The “5% level” is a common cutoff, but things can be significant at the 1% or 10% level also.)
The natural follow-up question is: Why would any effect we see occur by chance? The answer lies in the fact that data is “noisy”: it comes with error. To see this a bit more, we can think about what would happen if we studied a setting where we know our true effect is zero.
My fake study
Imagine the following (fake) study. Participants are randomly assigned to eat a package of either blue or green M&Ms, and then each flips a (fair) coin, and you record whether it comes up heads. Your analysis will compare the number of heads that people flip after eating blue versus green M&Ms and report whether this is “statistically significant at the 5% level.”…(More)”.
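The logic of the fake study can be simulated directly. The sketch below (not from Oster’s post; group size, number of simulations, and the use of a two-proportion z-test are all assumptions for illustration) runs the M&M experiment many times with a truly fair coin, so the true effect is zero, and counts how often the result nonetheless comes out “significant at the 5% level”:

```python
import math
import random

def fake_study(n_per_group=200, rng=random):
    """Simulate one run of the M&M study where the coin is fair for everyone."""
    # Each participant flips one fair coin; M&M colour cannot matter.
    blue = sum(rng.random() < 0.5 for _ in range(n_per_group))
    green = sum(rng.random() < 0.5 for _ in range(n_per_group))
    # Two-proportion z-test for the difference in heads rates
    p1, p2 = blue / n_per_group, green / n_per_group
    pooled = (blue + green) / (2 * n_per_group)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the normal approximation: P(|Z| > z) = erfc(|z|/sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
n_sims = 10_000
false_positives = sum(fake_study() < 0.05 for _ in range(n_sims))
print(f"'Significant' results despite zero true effect: {false_positives / n_sims:.1%}")
```

By construction nothing real is going on, yet roughly 5% of runs report a “significant” difference, which is exactly what the 5% significance level means.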
External Researcher Access to Closed Foundation Models
Report by Esme Harrington and Dr. Mathias Vermeulen: “…addresses a pressing issue: independent researchers need better conditions for accessing and studying the AI models that big companies have developed. Foundation models — the core technology behind many AI applications — are controlled mainly by a few major players who decide who can study or use them.
What’s the problem with access?
- Limited access: Companies like OpenAI, Google and others are the gatekeepers. They often restrict access to researchers whose work aligns with their priorities, which means independent, public-interest research can be left out in the cold.
- High costs: Even when access is granted, it often comes with a hefty price tag that smaller or less-funded teams can’t afford.
- Lack of transparency: These companies don’t always share how their models are updated or moderated, making it nearly impossible for researchers to replicate studies or fully understand the technology.
- Legal risks: When researchers try to scrutinize these models, they sometimes face legal threats if their work uncovers flaws or vulnerabilities in the AI systems.
The research suggests that companies need to offer more affordable and transparent access to improve AI research. Additionally, governments should provide legal protections for researchers, especially when they are acting in the public interest by investigating potential risks…(More)”.
Key lesson of this year’s Nobel Prize: The importance of unlocking data responsibly to advance science and improve people’s lives
Article by Stefaan Verhulst, Anna Colom, and Marta Poblet: “This year’s Nobel Prize for Chemistry owes a lot to available, standardised, high quality data that can be reused to improve people’s lives. The winners, Prof David Baker from the University of Washington, and Demis Hassabis and John M. Jumper from Google DeepMind, were awarded, respectively, for the design of new proteins and for the prediction of protein structures, developments that can have important medical applications. These developments build on AI models that can predict protein structures in unprecedented ways. However, key to these models and their potential to unlock health discoveries is an open curated dataset with high quality and standardised data, something still rare despite the pace and scale of AI-driven development.
We live in a paradoxical time of both data abundance and data scarcity: a lot of data is being created and stored, but it tends to be inaccessible due to private interests and weak regulations. The challenge, then, is to prevent the misuse of data whilst avoiding its missed use.
The reuse of data remains limited in Europe, but a new set of regulations seeks to increase the possibilities of responsible data reuse. When the European Commission made the case for its European Data Strategy in 2020, it envisaged the European Union as “a role model for a society empowered by data to make better decisions — in business and the public sector,” and acknowledged the need to improve “governance structures for handling data and to increase its pools of quality data available for use and reuse”…(More)”.
WikiProject AI Cleanup
Article by Emanuel Maiberg: “A group of Wikipedia editors have formed WikiProject AI Cleanup, “a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia.”
The group’s goal is to protect one of the world’s largest repositories of information from the same kind of misleading AI-generated information that has plagued Google search results, books sold on Amazon, and academic journals.
“A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar ‘styles’ using ChatGPT,” Ilyas Lebleu, a founding member of WikiProject AI Cleanup, told me in an email. “Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques.”…(More)”.
Data’s Role in Unlocking Scientific Potential
Report by the Special Competitive Studies Project: “…we outline two actionable steps the U.S. government can take immediately to address the data sharing challenges hindering scientific research.
1. Create Comprehensive Data Inventories Across Scientific Domains
We recommend the Secretary of Commerce, acting through the Department of Commerce’s Chief Data Officer and the Director of the National Institute of Standards and Technology (NIST), and with the Federal Chief Data Officer Council (CDO Council) create a government-led inventory where organizations – universities, industries, and research institutes – can catalog their datasets with key details like purpose, description, and accreditation. Similar to platforms like data.gov, this centralized repository would make high-quality data more visible and accessible, promoting scientific collaboration. To boost participation, the government could offer incentives, such as grants or citation credits for researchers whose data is used. Contributing organizations would also be responsible for regularly updating their entries, ensuring the data stays relevant and searchable.
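To make the inventory idea concrete, one catalog record might look something like the sketch below. This is purely illustrative: the class name, field names, and example values are all hypothetical, not drawn from the report or from data.gov.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetEntry:
    """One hypothetical record in a government-led data inventory."""
    title: str
    organization: str   # contributing university, industry, or research institute
    purpose: str        # why the data was collected
    description: str    # what the dataset contains
    accreditation: str  # quality/vetting status claimed by the contributor
    last_updated: date  # contributors are responsible for keeping this current

entry = DatasetEntry(
    title="Coastal water quality survey",
    organization="Example State University",
    purpose="Environmental monitoring research",
    description="Monthly sensor readings, 2015-2023",
    accreditation="Institution-certified",
    last_updated=date(2024, 10, 1),
)
print(entry.title)
```

The point of such a schema is that the metadata, not the data itself, is centralized: the repository stays searchable while the contributing organization retains custody of the dataset and the obligation to keep its entry up to date.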
2. Create Scientific Data Sharing Public-Private Partnerships
A critical recommendation of the National Data Action Plan was for the United States to facilitate the creation of data sharing public-private partnerships for specific sectors. The U.S. Government should coordinate data sharing partnerships with its departments and agencies, industry, academia, and civil society. Data collected by one entity can be tremendously valuable to others. But incentivizing data sharing is challenging as privacy, security, legal (e.g., liability), and intellectual property (IP) concerns can limit willingness to share. However, narrowly-scoped PPPs can help overcome these barriers, allowing for greater data sharing and mutually beneficial data use…(More)”