Blog by Emily Oster: “…When we say an effect is “statistically significant at the 5% level,” what this means is that there is less than a 5% chance that we’d see an effect of this size (or larger) if the true effect were zero. (The “5% level” is a common cutoff, but things can be significant at the 1% or 10% level also.)
The natural follow-up question is: Why would any effect we see occur by chance? The answer lies in the fact that data is “noisy”: it comes with error. To see this a bit more, we can think about what would happen if we studied a setting where we know our true effect is zero.
My fake study
Imagine the following (fake) study. Participants are randomly assigned to eat a package of either blue or green M&Ms, and then they flip a (fair) coin and you see if it is heads. Your analysis will compare the number of heads that people flip after eating blue versus green M&Ms and report whether this is “statistically significant at the 5% level.”…(More)”.
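To see how noise alone can clear that bar, here is a minimal simulation sketch of the fake study (not from Oster’s post; the group sizes, flips per person, and the two-sample z-test are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def one_study(n_per_group=50, flips=10):
    # Both groups flip the same fair coin, so the true effect is zero.
    blue = rng.binomial(flips, 0.5, size=n_per_group)
    green = rng.binomial(flips, 0.5, size=n_per_group)
    # Naive two-sample z-test on the mean number of heads per person.
    diff = blue.mean() - green.mean()
    se = np.sqrt(blue.var(ddof=1) / n_per_group + green.var(ddof=1) / n_per_group)
    return abs(diff / se) > 1.96  # "significant at the 5% level"

share = np.mean([one_study() for _ in range(10_000)])
print(f"Share of zero-effect studies declared significant: {share:.3f}")
```

Run over many repetitions, roughly 5% of these zero-effect studies come out “significant”: that is exactly the false-positive rate the 5% cutoff tolerates.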
Article by Stefaan Verhulst, Anna Colom, and Marta Poblet: “This year’s Nobel Prize for Chemistry owes a lot to available, standardised, high-quality data that can be reused to improve people’s lives. The winners, Prof David Baker from the University of Washington, and Demis Hassabis and John M. Jumper from Google DeepMind, were awarded respectively for the development and prediction of new proteins that can have important medical applications. These developments build on AI models that can predict protein structures in unprecedented ways. However, key to these models and their potential to unlock health discoveries is an open curated dataset with high-quality, standardised data, something still rare despite the pace and scale of AI-driven development.
We live in a paradoxical time of both data abundance and data scarcity: a lot of data is being created and stored, but it tends to be inaccessible due to private interests and weak regulations. The challenge, then, is to prevent the misuse of data whilst avoiding its missed use.
The reuse of data remains limited in Europe, but a new set of regulations seeks to increase the possibilities of responsible data reuse. When the European Commission made the case for its European Data Strategy in 2020, it envisaged the European Union as “a role model for a society empowered by data to make better decisions — in business and the public sector,” and acknowledged the need to improve “governance structures for handling data and to increase its pools of quality data available for use and reuse”…(More)”.
Essay by Adam Zable, Marine Ragnet, Roshni Singh, Hannah Chafetz, Andrew J. Zahuranec, and Stefaan G. Verhulst: “In what follows we provide a series of case studies of how AI can be used to promote peace, leveraging what we learned at the Kluz Prize for PeaceTech and NYU PREP and Becera events. These case studies and applications of AI are limited to what was included in these initiatives and are not fully comprehensive. With these examples of the role of technology before, during, and after a conflict, we hope to broaden the discussion around the potential positive uses of AI in the context of today’s global challenges.
The table above summarizes how AI may be harnessed throughout the conflict cycle, along with the supporting examples from the Kluz Prize for PeaceTech and NYU PREP and Becera events.
(1) The Use of AI Before a Conflict
AI can support conflict prevention by predicting emerging tensions and supporting mediation efforts. In recent years, AI-driven early warning systems have been used to identify patterns that precede violence, allowing for timely interventions.
For instance, the Violence & Impacts Early-Warning System (VIEWS), developed by a research consortium at Uppsala University in Sweden and the Peace Research Institute Oslo (PRIO) in Norway, employs AI and machine learning algorithms to analyze large datasets, including conflict history, political events, and socio-economic indicators—supporting negative peace and peacebuilding efforts. These algorithms are trained to recognize patterns that precede violent conflict, using both supervised and unsupervised learning methods to make predictions about the likelihood and severity of conflicts up to three years in advance. The system also uses predictive analytics to identify potential hotspots, where specific factors—such as spikes in political unrest or economic instability—suggest a higher risk of conflict…(More)”.
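For a rough sense of the supervised side of such a system, the sketch below fits a simple classifier on synthetic region-month data; the features, coefficients, and risk threshold are invented for illustration and do not reflect the actual VIEWS models:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000  # synthetic region-month observations

# Hypothetical features: recent conflict events, unrest events, economic index.
X = np.column_stack([
    rng.poisson(3, n),
    rng.poisson(5, n),
    rng.normal(0, 1, n),
])
# Synthetic ground truth: risk rises with past violence and unrest,
# falls with economic stability.
logit = 0.6 * X[:, 0] + 0.4 * X[:, 1] - 0.5 * X[:, 2] - 4.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# "Hotspots": held-out observations whose predicted conflict risk exceeds 20%.
risk = model.predict_proba(X_test)[:, 1]
print(f"Flagged hotspots: {(risk > 0.2).sum()} of {len(risk)}")
```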
Blog by Daro: “There is a problem related to how we effectively help people receiving social services and public benefit programs. It’s a problem that we have been thinking, talking, and writing about for years. It’s a problem that once you see it, you can’t unsee it. It’s also a problem that you’re likely familiar with, whether you have direct experience with the dynamics themselves, or you’ve been frustrated by how these dynamics impact your work. In February, we organized a convening at Georgetown University in collaboration with Georgetown’s Massive Data Institute to discuss how so many of us can be frustrated by the same problem yet have been unable to make any real headway toward a solution.
For as long as social services have existed, people have been trying to understand how to manage and evaluate those services. How do we determine what to scale and what to change? How do we replicate successes and how do we minimize unsuccessful interventions? To answer these questions, we have tried to create, use, and share evidence about these programs to inform our decision-making. However – and this is a big however – despite our collective efforts, we have difficulty determining whether there’s been an increase in the use of evidence, or, most importantly, whether there’s actually been an improvement in the quality and impact of social services and public benefit programs…(More)”.
Article by Stefaan Verhulst and Peter Addo: “…At the root of this debate runs a frequent concern with how data is collected, stored, used — and responsibly reused for purposes other than those for which it was initially collected…
In this article, we propose that promoting responsible reuse of data requires addressing the power imbalances inherent in the data ecology. These imbalances disempower key stakeholders, thereby undermining trust in data management practices. As we recently argued in a report on “responsible data reuse in developing countries,” prepared for Agence Française de Développement (AFD), power imbalances may be particularly pernicious when considering the use of data in the Global South. Addressing these requires broadening notions of consent, beyond current highly individualized approaches, in favor of what we instead term a social license for reuse.
In what follows, we explain what a social license means, and propose three steps to help achieve that goal. We conclude by calling for a new research agenda — one that would stretch existing disciplinary and conceptual boundaries — to reimagine what social licenses might mean, and how they could be operationalized…(More)”.
Barrett and Greene: “…one of the presenters who said: “We have lots of research that leads to no results.”
As some of you know, we’ve written a book with Don Kettl to help academically trained researchers write in a way that would be understandable by decision makers who could make use of their findings. But the keys to writing well are only a small part of the picture. Elected and appointed officials have the capacity to ignore nearly anything, no matter how well written it is.
This is more than just a frustration to researchers; it’s a gigantic loss to the world of public administration. We spend lots of time reading through reports and frequently come across nuggets of insight that we believe could help make improvements in nearly every public sector endeavor, from human resources to budgeting to performance management to procurement and on and on. We, and others, can do our best to get attention for this kind of information, but that doesn’t mean that the decision makers have the time or the inclination to take advantage of great ideas.
We don’t want to place the blame for the disconnect between academia and practitioners on either party. To one degree or another, they’re both at fault, with taxpayers and the people who rely on government services – and that’s pretty much everybody except for people who have gone off the grid – as the losers.
Following, from our experience, are six reasons we believe it’s difficult to close the gap between the world of research and the realm of utility. The first three are aimed at government leaders; the last three have academics in mind…(More)”
Article by Kevin Frazier: “The data relied on by OpenAI, Google, Meta, and other artificial intelligence (AI) developers is not readily available to other AI labs. Google and Meta relied, in part, on data gathered from their own products to train and fine-tune their models. OpenAI used tactics to acquire data that now would not work or may be more likely to be found in violation of the law (whether such tactics violated the law when originally used by OpenAI is being worked out in the courts). Upstart labs as well as research outfits find themselves with a dearth of data. Full realization of the positive benefits of AI, such as being deployed in costly but publicly useful ways (think tutoring kids or identifying common illnesses), as well as complete identification of the negative possibilities of AI (think perpetuating cultural biases), requires that labs other than the big players have access to sufficient, high-quality data.
The proper response is not to return to an exploitative status quo. Google, for example, may have relied on data from YouTube videos without meaningful consent from users. OpenAI may have hoovered up copyrighted data with little regard for the legal and social ramifications of that approach. In response to these questionable approaches, data has (rightfully) become harder to acquire. Cloudflare has equipped websites with the tools necessary to limit data scraping—the process of extracting data from another computer program. Regulators have developed new legal limits on data scraping or enforced old ones. Data owners have become more defensive over their content and, in some cases, more litigious. All of these largely positive developments from the perspective of data creators (which is to say, anyone and everyone who uses the internet) diminish the odds of newcomers entering the AI space. The creation of a public AI training data bank is necessary to ensure the availability of enough data for upstart labs and public research entities. Such banks would prevent those new entrants from having to go down the costly and legally questionable path of trying to hoover up as much data as possible…(More)”.
Article by Timothy Taylor: “When most people think of “experiments,” they think of test tubes and telescopes, of Petri dishes and Bunsen burners. But the physical apparatus is not central to what an “experiment” means. Instead, what matters is the ability to specify different conditions–and then to observe how the differences in the underlying conditions alter the outcomes. When “experiments” are understood in this broader way, the application of “experiments” is expanded.
For example, back in 1881 when Louis Pasteur tested his vaccine for sheep anthrax, he gave the vaccine to half of a flock of sheep, exposed the entire group to anthrax, and showed that those with the vaccine survived. More recently, the “Green Revolution” in agricultural technology was essentially a set of experiments, systematically breeding plant varieties and then looking at the outcomes in terms of yield, water use, pest resistance, and the like.
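Pasteur’s trial already contains the full logic of a modern randomized experiment, and its result can be checked against chance in a few lines. The counts below are the figures commonly cited for the 1881 trial (all 25 vaccinated sheep survived, all 25 unvaccinated sheep died), used here only to illustrate the test:

```python
from scipy.stats import fisher_exact

# Rows: vaccinated, unvaccinated; columns: survived, died.
table = [[25, 0],
         [0, 25]]
_, p_value = fisher_exact(table)
print(f"p = {p_value:.1e}")  # vanishingly small: chance cannot explain the split
```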
This understanding of “experiment” can be applied in economics as well, as John A. List explains in “Field Experiments: Here Today Gone Tomorrow?” (American Economist, published online August 6, 2024). By “field experiments,” List is seeking to differentiate his topic from “lab experiments,” which for economists refers to experiments carried out in a classroom context, often with students as the subjects, and to focus instead on experiments that involve people in the “field”–that is, in the context of their actual economic activities, including work, selling and buying, charitable giving, and the like. As List points out, these kinds of economic experiments are nothing new; government agencies, for instance, have been conducting field experiments for decades…(More)”.
Unfortunately, most urban data remains in silos, and the capacity of our cities to harness urban data to improve decision-making and strengthen citizen participation continues to be limited. As per the last Data Maturity Assessment Framework (DMAF) assessment, conducted in November 2020 by MoHUA, only 45 of the 100 smart cities had drafted or approved their City Data Policies, with just 32 cities having a dedicated data budget in 2020–21 for data-related activities. Moreover, in terms of fostering data collaborations, only 12 cities had formed data alliances to achieve tangible outcomes. We hope smart cities continue this practice by conducting a yearly self-assessment to progress in their journey to harness data for improving their urban planning.
Seeding Urban Data Collaborative to advance City-level Data Engagements
There is a need to bring together a diverse set of stakeholders including governments, civil society, academia, businesses and startups, volunteer groups and more to share and exchange urban data in a secure, standardised and interoperable manner, deriving more value from re-using data for participatory urban development. Along with improving data sharing among these stakeholders, it is necessary to regularly convene, ideate and conduct capacity-building sessions and institutionalise data practices.
An Urban Data Collaborative can bring together such diverse stakeholders who could address some of these perennial challenges in the ecosystem while spurring innovation…(More)”
By: Roshni Singh, Hannah Chafetz, and Stefaan G. Verhulst
The questions that society asks can transform public policy making, mobilize resources, and shape public discourse, yet decision makers around the world frequently focus on developing solutions rather than identifying the questions that need to be addressed to develop those solutions.
This blog provides a range of resources on the potential of questions for society. It includes readings on new approaches to formulating questions, how questions benefit public policy making and democracy, the importance of increasing the capacity for questioning at the individual level, and the role of questions in the age of AI and prompt engineering.
These readings underscore the need for a new science of questions – a new discipline solely focused on integrating participatory approaches for identifying, prioritizing, and addressing questions for society. This emerging discipline not only fosters creativity and critical thinking within societies but also empowers individuals and communities to engage actively in the questioning process, thereby promoting a more inclusive and equitable approach to addressing today’s societal challenges.
A few key takeaways from these readings:
Incorporating participatory approaches in questioning processes: Several of the readings discuss the value of including participatory approaches in questioning as a means to incorporate diverse perspectives, identify where there are knowledge gaps, and ensure the questions prioritized reflect current needs. In particular, the readings emphasize the role of open innovation and co-creation principles, workshops, and surveys as ways to make the questioning process more collaborative.
Advancing individuals’ questioning capability: Teaching individuals to ask their own questions fosters agency and is essential for effective democratic participation. The readings recommend cultivating this skill from early education through adulthood to empower individuals to engage actively in decision-making processes.
Improving questioning processes for responsible AI use: In the era of AI and prompt engineering, how questions are framed is key for deriving meaningful responses to AI queries. More focus on participatory question formulation in the context of AI can help foster more inclusive and responsible data governance.
In “Crowdsourcing Research Questions in Science,” the authors examine how involving the general public in formulating research questions can enhance scientific inquiry. They analyze two crowdsourcing projects in the medical sciences and find that crowd-generated questions often restate problems but provide valuable cross-disciplinary insights. Although these questions typically rank lower in novelty and scientific impact compared to professional questions, they match the practical impact of professional research. The authors argue that crowdsourcing can improve research by offering diverse perspectives. They emphasize the importance of using effective selection methods to identify and prioritize the most valuable contributions from the crowd, ensuring that the highest quality questions are highlighted and addressed.
This journal article emphasizes the growing importance of openness and collaboration in scientific research. The authors identify the lack of a unified understanding of these practices due to differences in disciplinary approaches and propose an Open Innovation in Science (OIS) Research Framework (co-developed with 47 scholars) to bridge these knowledge gaps and synthesize information across fields. The authors argue that integrating Open Science and Open Innovation concepts can enhance researchers’ and practitioners’ understanding of how these practices influence the generation and dissemination of scientific insights and innovation. The article highlights the need for interdisciplinary collaboration to address the complexities of societal, technical, and environmental challenges and provides a foundation for future research, policy discussions, and practical guidance in promoting open and collaborative scientific practices.
In “The Surprising Power of Questions,” published in Harvard Business Review, Alison Wood Brooks and Leslie K. John highlight how asking questions drives learning, innovation, and relationship building within organizations. They argue that many executives focus on answers but underestimate how well-crafted questions can enhance communication, build trust, and uncover risks. Drawing from behavioral science, the authors show how the type, tone, and sequence of questions influence the effectiveness of conversations. By refining their questioning skills, individuals can boost emotional intelligence, foster deeper connections, and unlock valuable insights that benefit both themselves and their organizations.
In “Choosing Policy-Relevant Research Questions,” Paul Kellner explains how social scientists can craft research questions that better inform policy decisions. He highlights the ongoing issue of social sciences not significantly impacting policy, as noted by experts like William Julius Wilson and Christopher Whitty. The article suggests methods for engaging policymakers in the research question formulation process, such as user engagement, co-creation, surveys, voting, and consensus-building workshops. Kellner provides examples where policymakers directly participated in the research, resulting in more practical and relevant outcomes. He concludes that improving coordination between researchers and policymakers can enhance the policy impact of social science research.
In this Op-Ed, Andrew P. Minigan emphasizes the critical role of curiosity and question formulation in education. He argues that alongside the “4 Cs” (creativity, critical thinking, communication, and collaboration), there should be a fifth C: curiosity. Asking questions enables students to identify knowledge gaps, think critically and creatively, and engage with peers. Research links curiosity to improved memory, academic achievement, and creativity. Despite these benefits, traditional teaching models often overlook curiosity. Minigan suggests teaching students to formulate questions to boost their curiosity and support educational goals. He concludes that nurturing curiosity is essential for developing innovative thinkers who can explore new, complex questions.
In this blog, Dan Rothstein highlights the importance of fostering “agency,” which is the ability of individuals to think and act independently, as a cornerstone of democracy. Rothstein and his colleague Luz Santana have spent over two decades at The Right Question Institute teaching people how to ask their own questions to enhance their participation in decision-making. They discovered that the inability to ask questions hinders involvement in decisions that impact individuals. Rothstein argues that learning to formulate questions is essential for developing agency and effective democratic participation. This skill should be taught from early education through adulthood. Despite its importance, many students do not learn this in college, so educators must focus on teaching question formulation at all levels. Rothstein concludes that empowering individuals to ask questions is vital for a strong democracy and should be a continuous effort across society.
In the chapter “From a Policy Problem to a Research Question: Getting It Right Together” from the Science for Policy Handbook, Marta Sienkiewicz emphasizes the importance of co-creation between researchers and policymakers to determine relevant research questions. She highlights the need for this approach due to the separation between research and policy cultures, and the differing natures of scientific (tame) and policy (wicked) problems. Sienkiewicz outlines a skills framework and provides examples from the Joint Research Centre (JRC), such as Knowledge Centres, staff exchanges, and collaboration facilitators, to foster interaction and collaboration. Engaging policymakers in the research question development process leads to more practical and relevant outcomes, builds trust, and strengthens relationships. This collaborative approach ensures that research is aligned with policy needs, increases the chances of evidence being used effectively in decision-making, and ultimately enhances the impact of scientific research on policy.
In “Methods for Collaboratively Identifying Research Priorities and Emerging Issues in Science and Policy,” the authors, William J. Sutherland et al., emphasize the importance of bridging the gap between scientific research and policy needs through collaborative approaches. They outline a structured, inclusive methodology that involves researchers, policymakers, and practitioners to jointly identify priority research questions. The approach includes gathering input from diverse stakeholders, iterative voting processes, and structured workshops to refine and prioritize questions, ensuring that the resulting research addresses critical societal and environmental challenges. These methods foster greater collaboration and ensure that scientific research is aligned with the practical needs of policymakers, thereby enhancing the relevance and impact of the research on policy decisions. This approach has been successfully applied in multiple fields, including conservation and agriculture, demonstrating its versatility in addressing both emerging issues and long-term policy priorities.
In this article, co-authored with Anil Ananthaswamy, Stefaan Verhulst emphasizes the crucial role of framing questions correctly, particularly in the era of AI and data. They highlight how ChatGPT’s success underscores the power of well-formulated questions and their impact on deriving meaningful answers. Verhulst and Ananthaswamy argue that society’s focus on answers has overshadowed the importance of questioning, which shapes scientific inquiry, public policy, and data utilization. They call for a new science of questions that integrates diverse fields and promotes critical thinking, data literacy, and inclusive questioning to address biases and improve decision-making. This interdisciplinary effort aims to shift the emphasis from merely seeking answers to understanding the context and purpose behind the questions.
In this chapter published in “Global Digital Data Governance: Polycentric Perspectives”, Stefaan Verhulst explores the crucial role of formulating questions in ensuring responsible data usage. Verhulst argues that, in our data-driven society, responsibly handling data is key to maximizing public good and minimizing risks. He proposes a polycentric approach where the right questions are co-defined to enhance the social impact of data science. Drawing from both conceptual and practical knowledge, including his experience with The 100 Questions Initiative, Verhulst emphasizes that a participatory methodology in question formulation can democratize data use, ensuring data minimization, proportionality, participation, and accountability. By shifting from a supply-driven to a demand-driven approach, Verhulst envisions a new “science of questions” that complements data science, fostering a more inclusive and responsible data governance framework.
As we navigate the complexities of our rapidly changing world, the importance of asking the right questions cannot be overstated. We invite researchers, educators, policymakers, and curious minds alike to delve deeper into new approaches for questioning. By fostering an environment that values and prioritizes well-crafted questions, we can drive innovation, enhance education, improve public policy, and harness the potential of AI and data science. In the coming months, The GovLab, with the support of the Henry Luce Foundation, will be exploring these topics further through a series of roundtable discussions. Are you working on participatory approaches to questioning and are interested in getting involved? Email Stefaan G. Verhulst, Co-Founder and Chief R&D at The GovLab, at sverhulst@thegovlab.org.