Explore every episode of the podcast TalkRL: The Reinforcement Learning Podcast

Dive into the complete episode list for TalkRL: The Reinforcement Learning Podcast. Each episode is cataloged with detailed descriptions, making it easy to find and explore specific topics. Keep track of all episodes from your favorite podcast and never miss a moment of insightful content.

Title | Pub. Date | Duration
Neurips 2024 RL meetup Hot takes: What sucks about RL? | 23 Dec 2024 | 00:17:45

What do RL researchers complain about after hours at the bar?  In this "Hot takes" episode, we find out!  

Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of NeurIPS 2024.

Special thanks to "David Beckham" for the inspiration :)  

RLC 2024 - Posters and Hallways 5 | 20 Sep 2024 | 00:13:17

Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst MA.   

Featuring:  

  • 0:01 David Radke of the Chicago Blackhawks NHL on RL for professional sports  
  • 0:56 Abhishek Naik from the National Research Council on Continuing RL and Average Reward  
  • 2:42 Daphne Cornelisse from NYU on Autonomous Driving and Multi-Agent RL  
  • 08:58 Shray Bansal from Georgia Tech on Cognitive Bias for Human AI Ad hoc Teamwork  
  • 10:21 Claas Voelcker from University of Toronto on Can we hop in general?  
  • 11:23 Brent Venable from The Institute for Human & Machine Cognition on Cooperative information dissemination  


Arash Ahmadian on Rethinking RLHF | 25 Mar 2024 | 00:33:30

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.

Featured Reference

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker


Glen Berseth on RL Conference | 11 Mar 2024 | 00:21:38

Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI chair, a member of l'Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL).

Featured Links 

Reinforcement Learning Conference 

Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View
Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach

Ian Osband | 07 Mar 2024 | 01:08:26

Ian Osband is a Research scientist at OpenAI (ex DeepMind, Stanford) working on decision making under uncertainty.  

We spoke about: 

- Information theory and RL 

- Exploration, epistemic uncertainty and joint predictions 

- Epistemic Neural Networks and scaling to LLMs 


Featured References 

Reinforcement Learning, Bit by Bit 
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen 

From Predictions to Decisions: The Importance of Joint Predictive Distributions 

Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy  

 

Epistemic Neural Networks 

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy  


Approximate Thompson Sampling via Epistemic Neural Networks 

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy 

  


Sharath Chandra Raparthy | 12 Feb 2024 | 00:40:41

Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!  

Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.  


Featured Reference 

Generalization to New Sequential Decision Making Tasks with In-Context Learning   
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

Pierluca D'Oro and Martin Klissarov | 13 Nov 2023 | 00:57:24

Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!  

Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.


Martin Klissarov is a PhD student at Mila and McGill and research scientist intern at Meta.  


Featured References 

Motif: Intrinsic Motivation from Artificial Intelligence Feedback 
Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff 

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control 
Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare 

To keep doing RL research, stop calling yourself an RL researcher
Pierluca D'Oro 

Martin Riedmiller | 22 Aug 2023 | 01:13:56

Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!  


Martin Riedmiller is a research scientist and team lead at DeepMind.   


Featured References   


Magnetic control of tokamak plasmas through deep reinforcement learning 
Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller


Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis 

Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method 
Martin Riedmiller  

Max Schwarzer | 08 Aug 2023 | 01:10:18

Max Schwarzer is a PhD student at Mila, with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science.  Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.   

Featured References

Bigger, Better, Faster: Human-level Atari with human-level efficiency 
Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro 

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville 

The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville 


Julian Togelius | 25 Jul 2023 | 00:40:04

Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and Cofounder and research director at modl.ai


  

Featured References  
Choose Your Weapon: Survival Strategies for Depressed AI Academics

Julian Togelius, Georgios N. Yannakakis


Learning Controllable 3D Level Generators

Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius


PCGRL: Procedural Content Generation via Reinforcement Learning

Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius


Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation

Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi


Jakob Foerster | 08 May 2023 | 01:03:45

Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more.  

Jakob Foerster is an Associate Professor at University of Oxford.  

Featured References  

Learning with Opponent-Learning Awareness 
Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch  

Model-Free Opponent Shaping 
Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster  

Off-Belief Learning 
Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster  

Learning to Communicate with Deep Multi-Agent Reinforcement Learning 
Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson  

Adversarial Cheap Talk 
Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster  

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning 
Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson  


Danijar Hafner 2 | 12 Apr 2023 | 00:45:21

Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design!

Danijar Hafner is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind.  He has been our guest before back on episode 11.  


Featured References   

Mastering Diverse Domains through World Models [ blog ] DreamerV3

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap  


DayDreamer: World Models for Physical Robot Learning [ blog ]
Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel 

Deep Hierarchical Planning from Pixels [ blog ]
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel   

Action and Perception as Divergence Minimization [ blog ]
Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess 


RLC 2024 - Posters and Hallways 4 | 19 Sep 2024 | 00:04:52

Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst MA.   

Featuring:  

  • 0:01  David Abel from DeepMind on 3 Dogmas of RL  
  • 0:55 Kevin Wang from Brown on learning variable depth search for MCTS  
  • 2:17 Ashwin Kumar from Washington University in St Louis on fairness in resource allocation  
  • 3:36 Prabhat Nagarajan from UAlberta on Value overestimation  
Jeff Clune | 27 Mar 2023 | 01:11:11

AI Generating Algos, Learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and Open Endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!  

Professor Jeff Clune is Associate Professor of Computer Science at University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at Vector Institute, and Senior Research Advisor at DeepMind.  


Featured References 

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [ Blog Post ]
Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune 

Robots that can adapt like animals
Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret 

Illuminating search spaces by mapping elites
Jean-Baptiste Mouret, Jeff Clune 

Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley 

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley 

First return, then explore
Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune

Natasha Jaques 2 | 14 Mar 2023 | 00:46:02

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more! 

Dr Natasha Jaques is a Senior Research Scientist at Google Brain.

Featured References

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard 

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck 

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar 

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine 


Jacob Beck and Risto Vuorio | 07 Mar 2023 | 01:07:05

Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning.  Jacob and Risto are Ph.D. students at Whiteson Research Lab at University of Oxford.   


Featured Reference   


A Survey of Meta-Reinforcement Learning
Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson   


John Schulman | 18 Oct 2022 | 00:44:21

John Schulman is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.


Featured References

WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

Sven Mika | 19 Aug 2022 | 00:34:56

Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University. 


Featured References

RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning

Ray: Documentation

RLlib: Abstractions for Distributed Reinforcement Learning
Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica


Episode sponsor: Anyscale

Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.

Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.

Karol Hausman and Fei Xia | 16 Aug 2022 | 01:03:09

Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments.

Fei Xia is a Research Scientist with Google Research. Fei Xia is mostly interested in robot learning in complex and unstructured environments. Previously he has been approaching this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring using foundation models for those challenges.

Featured References

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [ website ]
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan

Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

Sai Krishna Gottipati | 01 Aug 2022 | 01:08:11

Sai Krishna Gottipati is an RL Researcher at AI Redefined, working on RL, MARL, and human-in-the-loop learning.

Featured References

Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations
AI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François Chabot

Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning
Currently under review

Learning to navigate the synthetically accessible chemical space using reinforcement learning
Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

Aravind Srinivas 2 | 09 May 2022 | 00:58:33

Aravind Srinivas is back!  He is now a Research Scientist at OpenAI.

Featured References

Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

Rohin Shah | 12 Apr 2022 | 01:37:04

Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.

Featured References

The MineRL BASALT Competition on Learning from Human Feedback
Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan

Preferences Implicit in the State of the World
Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan

Benefits of Assistance over Reward Learning
Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

On the Utility of Learning about Humans for Human-AI Coordination
Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan

Evaluating the Robustness of Collaborative Agents
Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah


Jordan Terry | 22 Feb 2022 | 01:03:48

Jordan Terry is a PhD candidate at University of Maryland, the maintainer of Gym, the maintainer and creator of PettingZoo and the founder of Swarm Labs.


Featured References

PettingZoo: Gym for Multi-Agent Reinforcement Learning
J. K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis Santos, Rodrigo Perez, Caroline Horsch, Clemens Dieffendahl, Niall L. Williams, Yashas Lokesh, Praveen Ravi

PettingZoo on Github

gym on Github


RLC 2024 - Posters and Hallways 3 | 18 Sep 2024 | 00:06:43

Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst MA.  

Featuring:  

  • 0:01 Kris De Asis from Openmind on Time Discretization  
  • 2:23 Anna Hakhverdyan from U of Alberta on Online Hyperparameters  
  • 3:59 Dilip Arumugam from Princeton on Information Theory and Exploration  
  • 5:04 Micah Carroll from UC Berkeley on Changing preferences and AI alignment  


Robert Lange | 20 Dec 2021 | 01:10:57

NeurIPS 2021 Political Economy of Reinforcement Learning Systems (PERLS) Workshop | 18 Nov 2021 | 00:24:07

We hear about the idea of PERLS and why it's important to talk about.


Amy Zhang | 27 Sep 2021 | 01:09:35

Amy Zhang is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023. 

Featured References 

Invariant Causal Prediction for Block MDPs 
Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup 

Multi-Task Reinforcement Learning with Context-based Representations 
Shagun Sodhani, Amy Zhang, Joelle Pineau 

MBRL-Lib: A Modular Library for Model-based Reinforcement Learning 
Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra 


Xianyuan Zhan | 30 Aug 2021 | 00:41:30

Xianyuan Zhan is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University.  He received his Ph.D. degree at Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and a data scientist at JD Technology.  At JD Technology, he led the research that uses offline RL to optimize real-world industrial systems. 

Featured References 

DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning
Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng 

Eugene Vinitsky | 18 Aug 2021 | 01:06:02

Eugene Vinitsky is a PhD student at UC Berkeley advised by Alexandre Bayen. He has interned at Tesla and DeepMind.


Featured References 

A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings 
Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo 

Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL 
Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen 

Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion 
Eugene Vinitsky, Kanaad Parvate, Aboudy Kreidieh, Cathy Wu, Alexandre Bayen 2018

The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games 
Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu 


Jess Whittlestone | 20 Jul 2021 | 01:31:36

Dr. Jess Whittlestone is a Senior Research Fellow at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, both at the University of Cambridge. 


Featured References 

The Societal Implications of Deep Reinforcement Learning 
Jess Whittlestone, Kai Arulkumaran, Matthew Crosby 

Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI 
Carla Zoe Cremer, Jess Whittlestone 


Aleksandra Faust | 06 Jul 2021 | 00:54:30

Dr Aleksandra Faust is a Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research.

Featured References

Reinforcement Learning and Planning for Preference Balancing Tasks 
Faust 2014

Learning Navigation Behaviors End-to-End with AutoRL
Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis

Evolving Rewards to Automate Reinforcement Learning 
Aleksandra Faust, Anthony Francis, Dar Mehta 

Evolving Reinforcement Learning Algorithms 

John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust 


Adversarial Environment Generation for Learning to Navigate the Web 
Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust 


Sam Ritter | 21 Jun 2021 | 01:40:35

Sam Ritter is a Research Scientist on the neuroscience team at DeepMind.

Featured References

Unsupervised Predictive Memory in a Goal-Directed Agent (MERLIN)
Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap

Meta-RL without forgetting:  Been There, Done That: Meta-Learning with Episodic Recall
Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick

Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning 
Samuel Ritter 2019 

Meta-RL exploration and planning: Rapid Task-Solving in Novel Environments 
Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo 

Synthetic Returns for Long-Term Credit Assignment 
David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song 

Thomas Krendl Gilbert | 17 May 2021 | 01:12:14

Marc G. Bellemare | 13 May 2021 | 00:57:40

Professor Marc G. Bellemare is a Research Scientist at Google Research (Brain team), an Adjunct Professor at McGill University, and a Canada CIFAR AI Chair.

Featured References 

The Arcade Learning Environment: An Evaluation Platform for General Agents 
Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling 

Human-level control through deep reinforcement learning 
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis 

Autonomous navigation of stratospheric balloons using reinforcement learning 
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang 


RLC 2024 - Posters and Hallways 2 | 16 Sep 2024 | 00:15:52

Posters and Hallway episodes are short interviews and poster summaries.  Recorded at RLC 2024 in Amherst MA.  

Robert Osazuwa Ness | 08 May 2021 | 01:18:43

Robert Osazuwa Ness is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at Gamalon, and the founder of AltDeep School of AI.  He holds a PhD in statistics.  He studied at Johns Hopkins SAIS and then Purdue University. 


Marlos C. Machado | 12 Apr 2021 | 01:31:31

Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta and an MSc and BSc from UFMG, in Brazil.


Featured References 

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents 
Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling 

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [ video ]
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare 

Efficient Exploration in Reinforcement Learning through Time-Based Representations 
Marlos C. Machado 

A Laplacian Framework for Option Discovery in Reinforcement Learning [ video ]
Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling 

Eigenoption Discovery through the Deep Successor Representation 
Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell 

Exploration in Reinforcement Learning with Deep Covering Options 
Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris 

Autonomous navigation of stratospheric balloons using reinforcement learning 
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang 

Generalization and Regularization in DQN 
Jesse Farebrother, Marlos C. Machado, Michael Bowling 


Nathan Lambert | 22 Mar 2021 | 00:50:35

Nathan Lambert is a PhD Candidate at UC Berkeley. 

Featured References 

Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning 
Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra 

Objective Mismatch in Model-based Reinforcement Learning 
Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra 

Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning 
Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S.J. Pister 

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning 
Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra 


Kai Arulkumaran | 16 Mar 2021 | 00:46:26

Kai Arulkumaran is a researcher at Araya in Tokyo. 

Featured References 

AlphaStar: An Evolutionary Computation Perspective 
Kai Arulkumaran, Antoine Cully, Julian Togelius 

Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation 
Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath 

Training Agents using Upside-Down Reinforcement Learning 
Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber 


Michael Dennis | 26 Jan 2021 | 01:00:50

Michael Dennis is a PhD student at the Center for Human-Compatible AI at UC Berkeley, supervised by Professor Stuart Russell

I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large to be more beneficial.   

--Michael Dennis 


Featured References

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [PAIRED]
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
Videos

Adversarial Policies: Attacking Deep Reinforcement Learning 

Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
Homepage and Videos

Accumulating Risk Capital Through Investing in Cooperation
Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell 


Quantifying Differences in Reward Functions [EPIC]
Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike


Roman Ring | 11 Jan 2021 | 00:42:23

Shimon Whiteson | 06 Dec 2020 | 00:53:35

Shimon Whiteson is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK. 


Featured References 

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning 
Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson 

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning 
Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson 


Aravind Srinivas | 21 Sep 2020 | 01:25:27

Aravind Srinivas is a 3rd year PhD student at UC Berkeley advised by Prof. Abbeel. 
He co-created and co-taught a grad course on Deep Unsupervised Learning at Berkeley. 


Featured References 

Data-Efficient Image Recognition with Contrastive Predictive Coding 
Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord 

Contrastive Unsupervised Representations for Reinforcement Learning 
Aravind Srinivas, Michael Laskin, Pieter Abbeel 

Reinforcement Learning with Augmented Data 
Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas 

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning 
Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel 


Additional References 


Taylor Killian | 17 Aug 2020 | 01:29:55

Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain.

Featured References 

Direct Policy Transfer with Hidden Parameter Markov Decision Processes
Yao, Killian, Konidaris, Doshi-Velez 

Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Killian, Daulton, Konidaris, Doshi-Velez 

Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes
Killian, Konidaris, Doshi-Velez 

Counterfactually Guided Policy Transfer in Clinical Settings
Killian, Ghassemi, Joshi 


Additional References 


Nan Jiang | 06 Jul 2020 | 01:11:46

Nan Jiang is an Assistant Professor of Computer Science at University of Illinois.  He was a Postdoc at Microsoft Research, and did his PhD at University of Michigan under Professor Satinder Singh.


Featured References 

 
Additional References 


Errata 

  • [Robin] I misspoke when I said that in domain randomization we want the agent to "ignore" domain parameters.  What I should have said is that we want the agent to perform well within some range of domain parameters; it should be robust with respect to domain parameters.


RLC 2024 - Posters and Hallways 1 | 10 Sep 2024 | 00:05:46
Danijar Hafner | 14 May 2020 | 02:00:29

Danijar Hafner is a PhD student at the University of Toronto, and a student researcher at Google Research, Brain Team and the Vector Institute.  He holds a Masters of Research from University College London. 

Featured References 

  • A deep learning framework for neuroscience
    Blake A. Richards, Timothy P. Lillicrap, Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, Colleen J. Gillon, Danijar Hafner, Adam Kepecs, Nikolaus Kriegeskorte, Peter Latham, Grace W. Lindsay, Kenneth D. Miller, Richard Naud, Christopher C. Pack, Panayiota Poirazi, Pieter Roelfsema, João Sacramento, Andrew Saxe, Benjamin Scellier, Anna C. Schapiro, Walter Senn, Greg Wayne, Daniel Yamins, Friedemann Zenke, Joel Zylberberg, Denis Therien, Konrad P. Kording
  • Learning Latent Dynamics for Planning from Pixels
    Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson 
  • Dream to Control: Learning Behaviors by Latent Imagination
    Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi 
  • Planning to Explore via Self-Supervised World Models
    Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak 


Additional References

 
Errata 

  • [Robin] Around 1:37 I say "some ... world models get confused by random noise". I meant "some curiosity formulations", not "world models".


Csaba Szepesvari | 05 Apr 2020 | 00:48:42

Csaba Szepesvari is: 

  • Head of the Foundations Team at DeepMind 
  • Professor of Computer Science at the University of Alberta 
  • Canada CIFAR AI Chair 
  • Fellow at the Alberta Machine Intelligence Institute  
  • Co-Author of the book Bandit Algorithms along with Tor Lattimore, and author of the book Algorithms for Reinforcement Learning 

References 


Ben Eysenbach | 30 Mar 2020 | 00:49:18

Ben Eysenbach is a PhD student in the Machine Learning Department at Carnegie Mellon University.  He was a Resident at Google Brain, and studied math and computer science at MIT. He co-founded the ICML Exploration in Reinforcement Learning workshop.

Featured References

Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Additional References 



NeurIPS 2019 Deep RL Workshop | 20 Dec 2019 | 00:56:18

Thank you to all the presenters who participated.  I covered as many as I could given the time and crowds. If you were not included and wish to be, please email talkrl@pathwayi.com

More details on the official NeurIPS Deep RL Workshop site
