TalkRL: The Reinforcement Learning Podcast – Details, episodes & analysis

Podcast details

Technical and general information from the podcast's RSS feed.

TalkRL: The Reinforcement Learning Podcast

Robin Ranjit Singh Chauhan

Technology

Frequency: 1 episode every 31 days. Total episodes: 66

TalkRL podcast is All Reinforcement Learning, All the Time. In-depth interviews with brilliant people at the forefront of RL research and practice. Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute. Hosted by Robin Ranjit Singh Chauhan.

Recent rankings

Latest chart positions across Apple Podcasts and Spotify rankings.

Apple Podcasts

🇫🇷 France - technology: #99 (22/06/2025)
🇨🇦 Canada - technology: #88 (26/12/2024)

Spotify

No recent rankings available

Shared links between episodes and podcasts

Links found in episode descriptions and other podcasts that share them.

https://scholar.google.com/citations?hl=en
160 shares
https://rohinshah.com/alignment-newsletter/
148 shares
https://waymo.com/
80 shares

https://github.com/google/dopamine
2 shares
https://github.com/tensorflow/agents
2 shares
https://github.com/Kaixhin/rlenvs
2 shares

https://www.youtube.com/watch?v=PBdGge2ipCg
2 shares
https://www.youtube.com/watch?v=akeUVn6WQoU
1 share
https://www.youtube.com/watch?v=FvQbrE3tyoE
1 share

RSS feed quality and score

Technical evaluation of the podcast's RSS feed quality and structure.

RSS feed quality

To improve

Overall score: 62%

Publication history

Monthly episode publishing history over the past years.

Latest published episodes

Recent episodes with titles, durations, and descriptions.

Neurips 2024 RL meetup Hot takes: What sucks about RL?

Episode 61

Monday 23 December 2024 • Duration 17:45

What do RL researchers complain about after hours at the bar? In this "Hot takes" episode, we find out!

Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of Neurips 2024.

Special thanks to "David Beckham" for the inspiration :)

RLC 2024 - Posters and Hallways 5

Episode 60

Friday 20 September 2024 • Duration 13:17

Posters and Hallways episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst, MA.

Featuring:

0:01 David Radke of the Chicago Blackhawks NHL on RL for professional sports
0:56 Abhishek Naik from the National Research Council on Continuing RL and Average Reward
2:42 Daphne Cornelisse from NYU on Autonomous Driving and Multi-Agent RL
8:58 Shray Bansal from Georgia Tech on Cognitive Bias for Human AI Ad hoc Teamwork
10:21 Claas Voelcker from University of Toronto on Can we hop in general?
11:23 Brent Venable from The Institute for Human & Machine Cognition on Cooperative information dissemination

Arash Ahmadian on Rethinking RLHF

Episode 51

Monday 25 March 2024 • Duration 33:30

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.

Featured Reference

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References

Self-Rewarding Language Models, Yuan et al 2024
Reinforcement Learning: An Introduction, Sutton and Barto 1998
Learning from Delayed Rewards, Chris Watkins 1989
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

Glen Berseth on RL Conference

Episode 50

Monday 11 March 2024 • Duration 21:38

Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI chair, a member of l'Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL).

Featured Links

Reinforcement Learning Conference

Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View
Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach

Ian Osband

Episode 49

Thursday 7 March 2024 • Duration 01:08:26

Ian Osband is a research scientist at OpenAI (ex-DeepMind, Stanford) working on decision making under uncertainty.

We spoke about:

- Information theory and RL

- Exploration, epistemic uncertainty and joint predictions

- Epistemic Neural Networks and scaling to LLMs

Featured References

Reinforcement Learning, Bit by Bit
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

From Predictions to Decisions: The Importance of Joint Predictive Distributions
Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy

Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

Approximate Thompson Sampling via Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

Additional References

Thesis defence, Ian Osband
Homepage, Ian Osband
Epistemic Neural Networks at Stanford RL Forum
Behaviour Suite for Reinforcement Learning, Osband et al 2019
Efficient Exploration for LLMs, Dwaracherla et al 2024

Sharath Chandra Raparthy

Episode 48

Monday 12 February 2024 • Duration 40:41

Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!

Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.

Featured Reference

Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

Additional References

Sharath Chandra Raparthy Homepage
Human-Timescale Adaptation in an Open-Ended Task Space, Adaptive Agent Team 2023
Data Distributional Properties Drive Emergent In-Context Learning in Transformers, Chan et al 2022
Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al 2021

Pierluca D'Oro and Martin Klissarov

Episode 47

Monday 13 November 2023 • Duration 57:24

Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!

Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.

Martin Klissarov is a PhD student at Mila and McGill and a research scientist intern at Meta.

Featured References

Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

To keep doing RL research, stop calling yourself an RL researcher
Pierluca D'Oro

Martin Riedmiller

Episode 46

Tuesday 22 August 2023 • Duration 01:13:56

Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!

Martin Riedmiller is a research scientist and team lead at DeepMind.

Featured References

Magnetic control of tokamak plasmas through deep reinforcement learning
Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller

Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis

Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method
Martin Riedmiller

Max Schwarzer

Episode 45

Tuesday 8 August 2023 • Duration 01:10:18

Max Schwarzer is a PhD student at Mila, with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science. Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.

Featured References

Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville

The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

Additional References