TalkRL: The Reinforcement Learning Podcast – Details, Episodes, and Analysis
Podcast details
Technical and general information from the podcast's RSS feed.

TalkRL: The Reinforcement Learning Podcast
Robin Ranjit Singh Chauhan
Frequency: 1 episode every 31 days. Total episodes: 66

Recent chart rankings
Latest positions in the Apple Podcasts and Spotify charts.
Apple Podcasts
🇫🇷 France - Technology: #99 (22/06/2025)
🇨🇦 Canada - Technology: #88 (26/12/2024)
Spotify
No recent rankings available
Links shared across episodes and podcasts
Links found in episode descriptions, along with other podcasts that also use them.
- https://rohinshah.com/alignment-newsletter/ (147 shares)
- https://scholar.google.com/citations?hl=en& (106 shares)
- https://waymo.com/ (50 shares)
- https://github.com/google/dopamine (2 shares)
- https://github.com/tensorflow/agents (2 shares)
- https://github.com/Kaixhin/rlenvs (2 shares)
RSS feed quality and score
Technical assessment of the quality and structure of the RSS feed.
Overall score: 62%
Publication history
Monthly breakdown of episode publications over the years.

Episodes
NeurIPS 2024 RL meetup Hot takes: What sucks about RL?
Episode 61
Monday, December 23, 2024 • Duration: 17:45
What do RL researchers complain about after hours at the bar? In this "Hot takes" episode, we find out!
Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of NeurIPS 2024.
Special thanks to "David Beckham" for the inspiration :)
RLC 2024 - Posters and Hallways 5
Episode 60
Friday, September 20, 2024 • Duration: 13:17
Posters and Hallways episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst, MA.
Featuring:
- 0:01 David Radke of the Chicago Blackhawks (NHL) on RL for professional sports
- 0:56 Abhishek Naik from the National Research Council on Continuing RL and Average Reward
- 2:42 Daphne Cornelisse from NYU on Autonomous Driving and Multi-Agent RL
- 8:58 Shray Bansal from Georgia Tech on Cognitive Bias for Human-AI Ad Hoc Teamwork
- 10:21 Claas Voelcker from the University of Toronto on "Can we hop in general?"
- 11:23 Brent Venable from the Institute for Human & Machine Cognition on Cooperative Information Dissemination
Arash Ahmadian on Rethinking RLHF
Episode 51
Monday, March 25, 2024 • Duration: 33:30
Arash Ahmadian is a researcher at Cohere and Cohere For AI focused on preference training of large language models. He's also a researcher at the Vector Institute.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker
Additional References
- Self-Rewarding Language Models, Yuan et al 2024
- Reinforcement Learning: An Introduction, Sutton and Barto 1998
- Learning from Delayed Rewards, Chris Watkins 1989
- Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992
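The Williams REINFORCE estimator at the heart of the featured paper is compact enough to sketch. Below is a minimal, illustrative implementation on a toy three-armed bandit; the environment, learning rate, and running-mean baseline are assumptions made for this demo, not details from the episode or the paper.

```python
import numpy as np

# Minimal REINFORCE (Williams 1992) on a toy 3-armed bandit.
# Arm rewards and hyperparameters are illustrative only.
rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])   # hypothetical arm rewards
theta = np.zeros(3)                      # softmax policy logits
lr, baseline = 0.1, 0.0

for step in range(2000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(3, p=probs)
    r = rng.normal(true_means[a], 0.1)
    # grad of log softmax: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # REINFORCE update: grad log pi(a) * (reward - baseline)
    theta += lr * (r - baseline) * grad_log_pi
    baseline += 0.01 * (r - baseline)    # running-mean baseline

print("final policy:", probs.round(3))   # mass should concentrate on arm 2
```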
Glen Berseth on RL Conference
Episode 50
Monday, March 11, 2024 • Duration: 21:38
Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of the Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL).
Featured Links
Reinforcement Learning Conference
Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View
Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach
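To make the distinction in the featured paper's title concrete, here is a toy contrast between a TD(0) bootstrapped target and a supervised (Monte Carlo) regression target; the two-state setup and the numbers are illustrative, not from the paper.

```python
# Toy contrast of the two targets named in the paper's title:
# TD bootstraps on the current value estimate of the next state,
# supervised learning regresses on the observed full return.
gamma = 0.9
V = {0: 0.0, 1: 0.0}                 # state-value table
s, r, s2 = 0, 1.0, 1                 # one observed transition (s, r, s')
G = 1.9                              # hypothetical full MC return from s

td_target = r + gamma * V[s2]        # TD(0): r + gamma * V(s')
mc_target = G                        # supervised: the return itself
V[s] += 0.5 * (td_target - V[s])     # one TD update with step size 0.5
print(td_target, mc_target, V[s])    # 1.0 1.9 0.5
```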
Ian Osband
Episode 49
Thursday, March 7, 2024 • Duration: 01:08:26
Ian Osband is a research scientist at OpenAI (previously DeepMind and Stanford) working on decision making under uncertainty.
We spoke about:
- Information theory and RL
- Exploration, epistemic uncertainty and joint predictions
- Epistemic Neural Networks and scaling to LLMs
Featured References
Reinforcement Learning, Bit by Bit
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen
From Predictions to Decisions: The Importance of Joint Predictive Distributions
Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy
Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy
Approximate Thompson Sampling via Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy
Additional References
- Thesis defence, Ian Osband
- Homepage, Ian Osband
- Epistemic Neural Networks at Stanford RL Forum
- Behaviour Suite for Reinforcement Learning, Osband et al 2019
- Efficient Exploration for LLMs, Dwaracherla et al 2024
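As a rough illustration of the Thompson-sampling idea behind the featured references, the sketch below uses a small bootstrapped ensemble as a stand-in for an epistemic neural network on a toy bandit. The arms, masking probability, and ensemble size are assumptions for the demo; the real ENN/epinet construction is considerably more economical than a full ensemble.

```python
import numpy as np

# Bootstrapped-ensemble Thompson sampling on a toy bandit, as a
# simple stand-in for the epistemic neural networks in the episode.
rng = np.random.default_rng(0)
K, members = 3, 10
true_means = np.array([0.2, 0.5, 0.8])        # hypothetical arms
est = rng.normal(0.5, 0.5, (members, K))      # per-member value estimates
counts = np.ones((members, K))

for t in range(1000):
    m = rng.integers(members)                 # sample one "hypothesis"
    a = int(est[m].argmax())                  # act greedily under it
    r = rng.normal(true_means[a], 0.1)
    mask = rng.random(members) < 0.5          # bootstrap-style masking:
    counts[mask, a] += 1                      # each member sees ~half the data
    est[mask, a] += (r - est[mask, a]) / counts[mask, a]

print("estimated means:", est.mean(axis=0).round(2))  # pulls concentrate on arm 2
```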
Sharath Chandra Raparthy
Episode 48
Monday, February 12, 2024 • Duration: 40:41
Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!
Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.
Featured Reference
Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
Additional References
- Sharath Chandra Raparthy Homepage
- Human-Timescale Adaptation in an Open-Ended Task Space, Adaptive Agent Team 2023
- Data Distributional Properties Drive Emergent In-Context Learning in Transformers, Chan et al 2022
- Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al 2021
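Since the references above center on sequence models for decision making, here is a minimal sketch of how a trajectory can be arranged into (return-to-go, state, action) tokens in the Decision Transformer style. The numbers are toy values, and the featured paper's in-context setup (which conditions on whole demonstration trajectories) differs in detail.

```python
import numpy as np

# Arrange one toy trajectory as (return-to-go, state, action) triples,
# the token layout used by Decision Transformer-style sequence models.
rewards = np.array([0.0, 0.0, 1.0])          # hypothetical episode rewards
states = np.array([[0.1], [0.2], [0.3]])     # hypothetical 1-d states
actions = np.array([0, 1, 1])

# return-to-go at step t = sum of rewards from t onward (gamma = 1)
rtg = np.flip(np.cumsum(np.flip(rewards)))
tokens = [(float(rtg[t]), states[t].tolist(), int(actions[t]))
          for t in range(len(rewards))]
print(tokens)  # -> [(1.0, [0.1], 0), (1.0, [0.2], 1), (1.0, [0.3], 1)]
```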
Pierluca D'Oro and Martin Klissarov
Episode 47
Monday, November 13, 2023 • Duration: 57:24
Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!
Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.
Martin Klissarov is a PhD student at Mila and McGill, and a research scientist intern at Meta.
Featured References
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
To keep doing RL research, stop calling yourself an RL researcher
Pierluca D'Oro
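The core mechanism of Motif, turning pairwise preferences from an AI annotator into a scalar intrinsic reward, can be sketched with a Bradley-Terry style objective. The features, the synthetic "annotator", and the linear reward model below are stand-ins invented for this demo, not the paper's actual pipeline.

```python
import numpy as np

# Fit a reward model to pairwise preferences (Bradley-Terry log-loss),
# the mechanism Motif uses to distill LLM feedback into intrinsic reward.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                 # caption/observation features
pref = (X[::2, 0] > X[1::2, 0]).astype(float) # synthetic annotator: prefers higher x0
w = np.zeros(4)                               # linear reward model

for _ in range(500):
    r_a, r_b = X[::2] @ w, X[1::2] @ w
    p = 1 / (1 + np.exp(-(r_a - r_b)))        # P(first item preferred)
    g = p - pref                              # gradient of the BT log-loss
    w -= 0.1 * (g[:, None] * (X[::2] - X[1::2])).mean(axis=0)

print(w.round(2))  # reward should load on feature 0
```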
Martin Riedmiller
Episode 46
Tuesday, August 22, 2023 • Duration: 01:13:56
Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!
Martin Riedmiller is a research scientist and team lead at DeepMind.
Featured References
Magnetic control of tokamak plasmas through deep reinforcement learning
Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method
Martin Riedmiller
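The batch-mode loop of Neural Fitted Q Iteration, re-fitting a neural regressor on bootstrapped targets computed from a fixed set of transitions, can be sketched briefly. The toy dataset, the scikit-learn regressor, and the hyperparameters below are assumptions for this demo; Riedmiller's original work used Rprop and different settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Batch-mode fitted Q iteration on a toy random dataset of transitions.
rng = np.random.default_rng(0)
n, n_actions, gamma = 500, 2, 0.95
S = rng.uniform(-1, 1, (n, 2))                        # states
A = rng.integers(0, n_actions, n)                     # actions
R = -np.abs(S[:, 0])                                  # hypothetical reward
S2 = np.clip(S + rng.normal(0, 0.1, S.shape), -1, 1)  # next states

X = np.column_stack([S, A])                           # (state, action) inputs
q = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500)
q.fit(X, R)                                           # initialize on immediate rewards

for it in range(10):
    # bootstrapped targets: r + gamma * max_a' Q(s', a')
    q_next = np.stack([q.predict(np.column_stack([S2, np.full(n, a)]))
                       for a in range(n_actions)], axis=1)
    y = R + gamma * q_next.max(axis=1)
    q.fit(X, y)                                       # full re-fit each iteration, as in NFQ
```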
Max Schwarzer
Episode 45
Tuesday, August 8, 2023 • Duration: 01:10:18
Max Schwarzer is a PhD student at Mila, with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science. Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.
Featured References
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro
Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville
The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville
Additional References
- Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al 2017
- When to use parametric models in reinforcement learning? van Hasselt et al 2019
- Data-Efficient Reinforcement Learning with Self-Predictive Representations, Schwarzer et al 2020
- Pretraining Representations for Data-Efficient Reinforcement Learning, Schwarzer et al 2021
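A recurring remedy in the replay-ratio and primacy-bias papers above is to periodically reset (part of) the network while keeping the replay buffer. The sketch below shows only that schedule; the shapes, reset period, and head-only reset are illustrative assumptions, and a real agent would interleave gradient updates where the placeholder comment sits.

```python
import numpy as np

# Periodic partial resets: re-initialize the head while the replay
# buffer (and here, the torso) persists across resets.
rng = np.random.default_rng(0)

def init_params():
    return {"torso": rng.normal(0, 0.1, (8, 8)),
            "head": rng.normal(0, 0.1, (8, 2))}

params = init_params()
replay_buffer = []                          # survives resets

for step in range(10_000):
    replay_buffer.append(step)              # stand-in for storing a transition
    # ... gradient updates on params from replayed batches go here ...
    if step % 2_000 == 1_999:
        params["head"] = init_params()["head"]  # reset only the head
```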
Julian Togelius
Episode 44
Tuesday, July 25, 2023 • Duration: 40:04
Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and a co-founder and research director at modl.ai.
Featured References
Choose Your Weapon: Survival Strategies for Depressed AI Academics
Julian Togelius, Georgios N. Yannakakis
Learning Controllable 3D Level Generators
Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius
PCGRL: Procedural Content Generation via Reinforcement Learning
Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius
Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation
Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi
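The PCGRL paper above frames level design itself as an MDP: the agent edits tiles and is rewarded for improving a level metric. The grid, the density metric, and the greedy random edits in this toy rendition are invented for illustration and much simpler than the paper's trained policies.

```python
import numpy as np

# Level design as an MDP, PCGRL-style: actions edit tiles, and the
# reward is the change in a level-quality metric after the edit.
rng = np.random.default_rng(0)
grid = rng.integers(0, 2, (8, 8))       # 0 = floor, 1 = wall

def metric(g):
    return -abs(g.mean() - 0.3)         # hypothetical target wall density

for step in range(100):
    before = metric(grid)
    y, x = rng.integers(0, 8, 2)        # pick an edit location ("narrow" style)
    grid[y, x] = 1 - grid[y, x]         # action: flip one tile
    reward = metric(grid) - before      # reward = metric improvement
    if reward < 0:
        grid[y, x] = 1 - grid[y, x]     # greedy stand-in for a learned policy
```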