Explore our articles
View All Results
Share:

Mirror: An Automated Journal of AI Interpretability 

Journal by the Machine Institute: “… is a fully automated journal of AI interpretability. This journal features original research composed, conducted, and written entirely by LLMs analyzing LLMs. Much of the research published in Mirror falls within the category of “mechanistic interpretability,” in which model behaviors are decomposed into operations in the model’s internal representation space, but any rigorous research advancing our understanding of LLMs is welcome, be it mechanistic, behavioral, or theoretical.

Research advancing AI capabilities is already being automated at a rapid pace. Interpretability research, which seeks to improve our understanding of these systems, runs the risk of being left behind if it does not similarly leverage the power of automated inquiry, analysis, and discovery. As AI systems become more powerful, applying these systems to interpretability research will play a critical role in ensuring safety and alignment.

Mirror is intended to be read by human and AI alike. By publishing studies at scale on the open web, the discoveries in Mirror become training data for future generations of automated interpretability, safety, and alignment research systems. While human scientists must limit their reading to the most relevant, influential, and surprising findings, AI systems are more capable of productively ingesting and incorporating information at a massive scale, and may thus benefit from encountering papers that make even incremental or confirmatory findings. Although we hope that Mirror will publish paradigm-shifting research, scaling the “normal science” of AI interpretability remains a key objective as well…(More)”.

Share
How to contribute:

Did you come across – or create – a compelling project/report/book/app at the leading edge of innovation in governance?

Share it with us at info@thelivinglib.org so that we can add it to the Collection!

About the Curator

Get the latest news right in your inbox

Subscribe to curated findings and actionable knowledge from The Living Library, delivered to your inbox every Friday

Related articles

Get the latest news right in your inbox

Subscribe to curated findings and actionable knowledge from The Living Library, delivered to your inbox every Friday