AXRP - the AI X-risk Research Podcast
1) 46 - Tom Davidson on AI-enabled Coups
Could AI enable a small group to gain power over a large country, and lock in their power permanently? Often, people worried about catastrophic risks from AI have been concerned with misalignment risk...
2) 45 - Samuel Albanie on DeepMind's AGI Safety Approach
In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". It covers the assumptions made by the approach, as...
3) 44 - Peter Salib on AI Rights for Human Safety
In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying...
4) 43 - David Lindner on Myopic Optimization with Non-myopic Approval
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a...
5) 42 - Owain Evans on LLM Psychology
Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically...
6) 41 - Lee Sharkey on Attribution-based Parameter Decomposition
What's the next step forward in interpretability? In this episode, I chat with Lee Sharkey about his proposal for detecting computational mechanisms within neural networks: Attribution-based Parameter Decomposition...
7) 40 - Jason Gross on Compact Proofs and Interpretability
How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing. In this episode, I speak with Jason Gross about...
8) 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
In this episode, I chat with David Duvenaud about two topics he's been thinking about: firstly, a paper he wrote about evaluating whether or not frontier models can sabotage human decision-making or...
9) 38.7 - Anthony Aguirre on the Future of Life Institute
The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the AI pause open letter and how the EU AI Act can be...
10) 38.6 - Joel Lehman on Positive Visions of AI
Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, what...