Statistical Significance—and Why It Matters for Parenting


Blog by Emily Oster: “…When we say an effect is “statistically significant at the 5% level,” what this means is that there is less than a 5% chance that we’d see an effect of this size if the true effect were zero. (The “5% level” is a common cutoff, but things can be significant at the 1% or 10% level also.) 

The natural follow-up question is: Why would any effect we see occur by chance? The answer lies in the fact that data is “noisy”: it comes with error. To see this a bit more, we can think about what would happen if we studied a setting where we know our true effect is zero. 

My fake study 

Imagine the following (fake) study. Participants are randomly assigned to eat a package of either blue or green M&Ms, and then they flip a (fair) coin and you see if it is heads. Your analysis will compare the number of heads that people flip after eating blue versus green M&Ms and report whether this is “statistically significant at the 5% level.”…(More)”.
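The fake study is easy to try out in a few lines of code. The sketch below is an illustration, not Oster's own analysis: it assumes 100 participants per M&M color, one coin flip each, and a two-proportion z-test with a normal approximation as the significance test. The function names, group size, and number of repeated studies are all hypothetical choices. Because the coin is fair in both groups, the true effect is zero, so any "significant" result comes from noise alone.

```python
import math
import random


def two_sided_p_value(heads_blue, heads_green, n_per_group):
    """Two-proportion z-test p-value (normal approximation, equal group sizes)."""
    pooled = (heads_blue + heads_green) / (2 * n_per_group)
    variance = pooled * (1 - pooled) * (2 / n_per_group)
    if variance == 0:
        return 1.0  # every flip landed the same way in both groups: nothing to compare
    z = (heads_blue - heads_green) / n_per_group / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def run_fake_study(rng, n_per_group):
    """One study: each participant flips a fair coin, whatever color M&Ms they ate."""
    heads_blue = sum(rng.random() < 0.5 for _ in range(n_per_group))
    heads_green = sum(rng.random() < 0.5 for _ in range(n_per_group))
    return two_sided_p_value(heads_blue, heads_green, n_per_group)


def false_positive_rate(n_studies=2000, n_per_group=100, alpha=0.05, seed=0):
    """Share of repeated fake studies that come out 'significant' at level alpha."""
    rng = random.Random(seed)
    hits = sum(run_fake_study(rng, n_per_group) < alpha for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    rate = false_positive_rate()
    print(f"Share of fake studies significant at the 5% level: {rate:.3f}")
```

Run many times over, a share of these studies in the neighborhood of 5% should clear the 5% cutoff even though M&M color has no effect on coin flips. The exact share depends on the seed and on the approximation used.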

External Researcher Access to Closed Foundation Models


Report by Esme Harrington and Dr. Mathias Vermeulen: “…addresses a pressing issue: independent researchers need better conditions for accessing and studying the AI models that big companies have developed. Foundation models — the core technology behind many AI applications — are controlled mainly by a few major players who decide who can study or use them.

What’s the problem with access?

  • Limited access: Companies like OpenAI, Google and others are the gatekeepers. They often restrict access to researchers whose work aligns with their priorities, which means independent, public-interest research can be left out in the cold.
  • High-end costs: Even when access is granted, it often comes with a hefty price tag that smaller or less-funded teams can’t afford.
  • Lack of transparency: These companies don’t always share how their models are updated or moderated, making it nearly impossible for researchers to replicate studies or fully understand the technology.
  • Legal risks: When researchers try to scrutinize these models, they sometimes face legal threats if their work uncovers flaws or vulnerabilities in the AI systems.

The research suggests that companies need to offer more affordable and transparent access to improve AI research. Additionally, governments should provide legal protections for researchers, especially when they are acting in the public interest by investigating potential risks…(More)”.

Key lesson of this year’s Nobel Prize: The importance of unlocking data responsibly to advance science and improve people’s lives


Article by Stefaan Verhulst, Anna Colom, and Marta Poblet: “This year’s Nobel Prize for Chemistry owes a lot to available, standardised, high quality data that can be reused to improve people’s lives. The winners, Prof David Baker from the University of Washington, and Demis Hassabis and John M. Jumper from Google DeepMind, were awarded respectively for the development and prediction of new proteins that can have important medical applications. These developments build on AI models that can predict protein structures in unprecedented ways. However, key to these models and their potential to unlock health discoveries is an open curated dataset with high quality and standardised data, something still rare despite the pace and scale of AI-driven development.

We live in a paradoxical time of both data abundance and data scarcity: a lot of data is being created and stored, but it tends to be inaccessible due to private interests and weak regulations. The challenge, then, is to prevent the misuse of data whilst avoiding its missed use.

The reuse of data remains limited in Europe, but a new set of regulations seeks to increase the possibilities of responsible data reuse. When the European Commission made the case for its European Data Strategy in 2020, it envisaged the European Union as “a role model for a society empowered by data to make better decisions — in business and the public sector,” and acknowledged the need to improve “governance structures for handling data and to increase its pools of quality data available for use and reuse”…(More)”.

WikiProject AI Cleanup


Article by Emanuel Maiberg: “A group of Wikipedia editors have formed WikiProject AI Cleanup, “a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia.”

The group’s goal is to protect one of the world’s largest repositories of information from the same kind of misleading AI-generated information that has plagued Google search results, books sold on Amazon, and academic journals.

“A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar ‘styles’ using ChatGPT,” Ilyas Lebleu, a founding member of WikiProject AI Cleanup, told me in an email. “Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques.”…(More)”.

Data’s Role in Unlocking Scientific Potential


Report by the Special Competitive Studies Project: “…we outline two actionable steps the U.S. government can take immediately to address the data sharing challenges hindering scientific research.

1. Create Comprehensive Data Inventories Across Scientific Domains

We recommend the Secretary of Commerce, acting through the Department of Commerce’s Chief Data Officer and the Director of the National Institute of Standards and Technology (NIST), and with the Federal Chief Data Officer Council (CDO Council) create a government-led inventory where organizations – universities, industries, and research institutes – can catalog their datasets with key details like purpose, description, and accreditation. Similar to platforms like data.gov, this centralized repository would make high-quality data more visible and accessible, promoting scientific collaboration. To boost participation, the government could offer incentives, such as grants or citation credits for researchers whose data is used. Contributing organizations would also be responsible for regularly updating their entries, ensuring the data stays relevant and searchable. 

2. Create Scientific Data Sharing Public-Private Partnerships

A critical recommendation of the National Data Action Plan was for the United States to facilitate the creation of data sharing public-private partnerships for specific sectors. The U.S. Government should coordinate data sharing partnerships with its departments and agencies, industry, academia, and civil society. Data collected by one entity can be tremendously valuable to others. But incentivizing data sharing is challenging as privacy, security, legal (e.g., liability), and intellectual property (IP) concerns can limit willingness to share. However, narrowly-scoped PPPs can help overcome these barriers, allowing for greater data sharing and mutually beneficial data use…(More)”

AI-accelerated Nazca survey nearly doubles the number of known figurative geoglyphs and sheds light on their purpose


Paper by Masato Sakai, Akihisa Sakurai, Siyuan Lu, and Marcus Freitag: “It took nearly a century to discover a total of 430 figurative Nazca geoglyphs, which offer significant insights into the ancient cultures at the Nazca Pampa. Here, we report the deployment of an AI system to the entire Nazca region, a UNESCO World Heritage site, leading to the discovery of 303 new figurative geoglyphs within only 6 months of field survey, nearly doubling the number of known figurative geoglyphs. Even with limited training examples, the developed AI approach is demonstrated to be effective in detecting the smaller relief-type geoglyphs, which unlike the giant line-type geoglyphs are very difficult to discern. The improved account of figurative geoglyphs enables us to analyze their motifs and distribution across the Nazca Pampa. We find that relief-type geoglyphs depict mainly human motifs or motifs of things modified by humans, such as domesticated animals and decapitated heads (81.6%). They are typically located within viewing distance (on average 43 m) of ancient trails that crisscross the Nazca Pampa and were most likely built and viewed at the individual or small-group level. On the other hand, the giant line-type figurative geoglyphs mainly depict wild animals (64%). They are found an average of 34 m from the elaborate linear/trapezoidal network of geoglyphs, which suggests that they were probably built and used on a community level for ritual activities…(More)”

Buried Academic Treasures


Barrett and Greene: “…one of the presenters who said: “We have lots of research that leads to no results.”

As some of you know, we’ve written a book with Don Kettl to help academically trained researchers write in a way that would be understandable by decision makers who could make use of their findings. But the keys to writing well are only a small part of the picture. Elected and appointed officials have the capacity to ignore nearly anything, no matter how well written it is.

This is more than just a frustration to researchers, it’s a gigantic loss to the world of public administration. We spend lots of time reading through reports and frequently come across nuggets of insights that we believe could help make improvements in nearly every public sector endeavor from human resources to budgeting to performance management to procurement and on and on. We, and others, can do our best to get attention for this kind of information, but that doesn’t mean that the decision makers have the time or the inclination to take steps toward taking advantage of great ideas.

We don’t want to place the blame for the disconnect between academia and practitioners on either party. To one degree or the other they’re both at fault, with taxpayers and the people who rely on government services – and that’s pretty much everybody except for people who have gone off the grid – as the losers.

Following, from our experience, are six reasons we believe that it’s difficult to close the gap between the world of research and the realm of utility. The first three are aimed at government leaders, the last three have academics in mind…(More)”

Science Diplomacy and the Rise of Technopoles


Article by Vaughan Turekian and Peter Gluckman: “…Science diplomacy has an important, even existential imperative to help the world reconsider the necessity of working together toward big global goals. Climate change may be the most obvious example of where global action is needed, but many other issues have similar characteristics—deep ocean resources, space, and other ungoverned areas, to name a few.

However, taking up this mantle requires acknowledging why past efforts have failed to meet their goals. The global commitment to Sustainable Development Goals (SDGs) is an example. Weaknesses in the UN system, compounded by varied commitments from member states, will prevent the achievement of the SDGs by 2030. This year’s UN Summit of the Future is intended to reboot the global commitment to the sustainability agenda. Regardless of what type of agreement is signed at the summit, its impact may be limited.  

The science community must play an active part in ensuring progress is in fact made, but that will require an expansion of the community’s current role. To understand what this might mean, consider that the Pact for the Future agreed in New York City in September 2024 places “science, technology, and innovation” as one of its five themes. But that becomes actionable either in the narrow sense that technology will provide “answers” to global problems or in the platitudinous sense that science provides advice that is not acted upon. This dichotomy of unacceptable approaches has long bedeviled science’s influence.

For the world to make better use of science, science must take on an expanded responsibility in solving problems at both global and local scales. And science itself must become part of a toolkit—both at the practical and the diplomatic level—to address the sorts of challenges the world will face in the future. To make this happen, more countries must make science diplomacy a core part of their agenda by embedding science advisors within foreign ministries, connecting diplomats to science communities.

As the pace of technological change generates both existential risk and economic, environmental, and social opportunities, science diplomacy has a vital task in balancing outcomes for the benefit of more people. It can also bring the science community (including the social sciences and humanities) to play a critical role alongside nation states. And, as new technological developments enable nonstate actors, and especially the private sector, science diplomacy has an important role to play in helping nation states develop policy that can identify common solutions and engage key partners…(More)”.

From Bits to Biology: A New Era of Biological Renaissance powered by AI


Article by Milad Alucozai: “…A new wave of platforms is emerging to address these limitations. Designed with the modern scientist in mind, these platforms prioritize intuitive interfaces, enabling researchers with diverse computational backgrounds to easily navigate and analyze data. They emphasize collaboration, allowing teams to share data and insights seamlessly. And they increasingly incorporate artificial intelligence, offering powerful tools for accelerating analysis and discovery. This shift marks a move towards more user-centric, efficient, and collaborative computational biology, empowering researchers to tackle increasingly complex biological questions. 

Emerging Platforms: 

  • Seqera Labs: Spearheading a movement towards efficient and reproducible research, Seqera Labs provides a suite of tools, including the popular open-source workflow language Nextflow. Their platform empowers researchers to design scalable and reproducible data analysis pipelines, particularly for cloud environments. Seqera streamlines complex computational workflows across diverse biological disciplines by emphasizing automation and flexibility, making data-intensive research scalable, flexible, and collaborative. 
  • Form Bio: Aimed at democratizing access to computational biology, Form Bio provides a comprehensive tech suite built to enable accelerated cell and gene therapy development and computational biology at scale. Its emphasis on collaboration and intuitive design fosters a more inclusive research environment to help organizations streamline therapeutic development and reduce time-to-market.  
  • Code Ocean: Addressing the critical need for reproducibility in research, Code Ocean provides a unique platform for sharing and executing research code, data, and computational environments. By encapsulating these elements in a portable and reproducible format, Code Ocean promotes transparency and facilitates the reuse of research methods, ultimately accelerating scientific discovery. 
  • Pluto Biosciences: Championing a collaborative approach to biological discovery, Pluto Biosciences offers an interactive platform for visualizing and analyzing complex biological data. Its intuitive tools empower researchers to explore data, generate insights, and seamlessly share findings with collaborators. This fosters a more dynamic and interactive research process, facilitating knowledge sharing and accelerating breakthroughs. 

Open Source Platforms:

  • Galaxy: A widely used open-source platform for bioinformatics analysis. It provides a user-friendly web interface and a vast collection of tools for various tasks, from sequence analysis to data visualization. Its open-source nature fosters community development and customization, making it a versatile tool for diverse research needs. 
  • Bioconductor: A prominent open-source platform for bioinformatics analysis that shares Galaxy’s commitment to accessibility and community-driven development. It leverages the power of the R programming language, providing a wealth of packages for tasks ranging from genomic data analysis to statistical modeling. Its open-source nature fosters a collaborative environment where researchers can freely access, utilize, and contribute to a growing collection of tools…(More)”

Mapmatics: A Mathematician’s Guide to Navigating the World


Book by Paulina Rowińska: “Why are coastlines and borders so difficult to measure? How does a UPS driver deliver hundreds of packages in a single day? And where do elusive serial killers hide? The answers lie in the crucial connection between maps and math.

In Mapmatics, mathematician Paulina Rowińska leads us on a riveting journey around the globe to discover how maps and math are deeply entwined, and always have been. From a sixteenth-century map, an indispensable navigation tool that exaggerates the size of northern countries, to public transport maps that both guide and confound passengers, to congressional maps that can empower or silence whole communities, she reveals how maps and math have shaped not only our sense of space but our worldview. In her hands, we learn how to read maps like a mathematician—to extract richer information and, just as importantly, to question our conclusions by asking what we don’t see…(More)”.