Reinforcement Learning - Reading List

Reading list for Reinforcement Learning

The papers listed below are a loose superset of the ones I try to cover, or have students present, throughout the term in my RL courses.

And Lo, the legends tell us, that before there was even the DQN, there were the incredible VFAs, and before those the Great Age of the Value Tables themselves.

Foundational papers fill the first part of the list. Once we enter the era of Deep Reinforcement Learning, the papers are grouped into early, middle, or later according to how they would fall in a graduate course on Advanced RL such as my RL Courses (ECE 457C/657C). Other categories cover general references, papers mostly about environments for testing RL algorithms, and potential papers for future reading. A further ordering with [n] is sometimes listed, but this is just a rough guide. In general, if one paper is based on work in an older paper, the older paper should be discussed first, or in the same week.

(You can jump to any stage with these links to find a paper)
foundational ~ early ~ middle ~ later ~ reference

foundational



  1. RL Book
    [0] Reinforcement Learning: An Introduction
    R.S. Sutton, and A.G. Barto.
    MIT Press, Cambridge, MA. 2018.
    DISCUSSED ON: 2024-09-06 by Prof. Mark Crowley for the first few weeks.
  2. [1] Dynamic Programming
    R Bellman.
    Princeton University Press, New Jersey. 1957.
  3. [2] Modified Policy Iteration Algorithms for Discounted Markov Decision Problems
    Martin L Puterman, and Moon Chirl Shin.
    Management Science. 24, (11). 1978.
  4. [6] Natural Actor-Critic
    Jan Peters, Sethu Vijayakumar, and Stefan Schaal.
    In European Conference on Machine Learning. Springer Verlag, Berlin, 2005.
  5. [6] Actor-Critic Algorithms
    Vijay Konda, and John Tsitsiklis.
    In Advances in Neural Information Processing Systems. MIT Press, 1999.
  6. [8] Learning from Delayed Rewards
    C J Watkins.
    PhD thesis, King's College, University of Cambridge, UK. 1989.
  7. [9] Policy Gradient Methods for Reinforcement Learning with Function Approximation
    Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour.
    In Advances in Neural Information Processing Systems. 12, MIT Press, 1999.
  8. [9] Natural gradient works efficiently in learning.
    S Amari.
    Neural Computation. 10, 1998.
  9. [9] Neuro-Dynamic Programming
    Dimitri P Bertsekas, and John N Tsitsiklis.
    Athena Scientific, Nashua, NH. 1996.
  10. [9] Simple statistical gradient-following algorithms for connectionist reinforcement learning
    Ronald J Williams.
    Machine Learning. 7, (2). 1992.
  11. [10] Neurocontrol and Supervised Learning: An Overview and Evaluation
    Paul Werbos.
    1992.

early



  1. Shallow
    [1] State of the Art Control of Atari Games Using Shallow Reinforcement Learning
    Yitao Liang, Marlos C. Machado, Erik Talvitie, and Michael Bowling.
    In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, Richland, South Carolina, USA. 2016.
    DISCUSSED ON: 2024-10-04 by Mark Crowley
  2. DQN
    [2] Playing Atari with Deep Reinforcement Learning
    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A Riedmiller.
    Arxiv Preprint. abs/1312.5, 2013.
    DISCUSSED ON: 2024-10-04 by Mark Crowley
  3. HER
    [3] Hindsight Experience Replay
    Marcin Andrychowicz, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba.
    In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 2017.
  4. PER
    [3] Prioritized Experience Replay
    Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver.
    In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. 2016.
    DISCUSSED ON: 2024-10-11 by Mark Crowley?
  5. Rainbow
    [4] Rainbow: Combining improvements in deep reinforcement learning
    Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver.
    In Proceedings of the AAAI conference on artificial intelligence. 2018.
  6. Revisit-ER
    [5] Revisiting fundamentals of experience replay
    William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, and Will Dabney.
    In International Conference on Machine Learning. 2020.
  7. PPO
    [5] Proximal policy optimization algorithms
    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov.
    arXiv preprint arXiv:1707.06347. 2017.
    DISCUSSED ON: 2024-10-11 by Majid Ghasemi
  8. A3C
    [6] Asynchronous Methods for Deep Reinforcement Learning
    Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu.
    In Proceedings of The 33rd International Conference on Machine Learning (ICML). 2016.
    DISCUSSED ON: 2024-10-11 by Mark Crowley
  9. [6] Human-level control through deep reinforcement learning
    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis.
    Nature. 518, (7540). 2015.
  10. [7] Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor
    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine.
    In International conference on machine learning. 2018.

middle



  1. Distributional reinforcement learning
    Marc G Bellemare, Will Dabney, and Mark Rowland.
    MIT Press, 2023.
  2. Attention option-critic
    Raviteja Chunduru, and Doina Precup.
    arXiv preprint arXiv:2201.02628. 2022.
  3. The StarCraft Multi-Agent Challenge
    Mikayel Samvelyan, Tabish Rashid, Christian Schroeder De Witt, Gregory Farquhar, Nantas Nardelli, Tim GJ Rudner, Chia-Man Hung, Philip HS Torr, Jakob Foerster, and Shimon Whiteson.
    arXiv preprint arXiv:1902.04043. 2019.
  4. [5] Offline Reinforcement Learning as One Big Sequence Modeling Problem
    Michael Janner, Qiyang Li, and Sergey Levine.
    In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2021.
  5. DecTransfrmr
    [8] Decision Transformer: Reinforcement Learning via Sequence Modeling
    Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch.
    In Advances in Neural Information Processing Systems. Jun, 2021.
  6. [10] Reinforcement Learning as a Framework for Ethical Decision Making
    David Abel, James MacGlashan, and Michael L. Littman.
    In AAAI Workshop: AI, Ethics, and Society. 2016.
  7. [13] Robust Reinforcement Learning for Linear Temporal Logic Specifications with Finite Trajectory Duration
    Soroush Mortazavi Moghaddam, Yash Vardhan Pant, and Sebastian Fischmeister.
    Proceedings of the Canadian Conference on Artificial Intelligence. Canadian Artificial Intelligence Association (CAIAC), May, 2024.
  8. [19] Deep Hedging with Market Impact
    Andrei Neagu, Frédéric Godin, Clarence Simard, and Leila Kosseim.
    In Proceedings of the Canadian Conference on Artificial Intelligence. Canadian Artificial Intelligence Association (CAIAC), May, 2024.

later



  1. RLHF
    [1] Illustrating Reinforcement Learning from Human Feedback (RLHF)
    Nathan Lambert, Louis Castricato, Leandro von Werra, and Alex Havrilla.
    Hugging Face Blog. 2022.
  2. ConstitEthic
    [2] Constitutional AI: Harmlessness from AI Feedback
    Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, and Jared Kaplan.
    2022.
  3. MORAL
    [3] MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning
    Markus Peschl, Arkady Zgonnikov, Frans A. Oliehoek, and Luciano C. Siebert.
    In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC. 2022.
  4. Moral Grid
    [5] Moral Gridworlds: A Theoretical Proposal for Modeling Artificial Moral Cognition
    Julia Haas.
    Minds and Machines. 30, (2). 2020.
  5. MoralityInterpret
    [6] Morality, Machines, and the Interpretation Problem: A Value-based, Wittgensteinian Approach to Building Moral Agents
    Cosmin Badea, and Gregory Artus.
    In Artificial Intelligence XXXIX. Springer International Publishing, Cham. 2022.
  6. Multi-Obj-Ethic
    [10] Multi-Objective Reinforcement Learning for Designing Ethical Environments
    Manel Rodriguez-Soto, Maite Lopez-Sanchez, and Juan A. Rodriguez Aguilar.
    In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21. International Joint Conferences on Artificial Intelligence Organization, Aug, 2021.
  7. ChemGymRL
    [19] ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry
    Chris Beeler, Sriram Ganapathi Subramanian, Kyle Sprague, Nouha Chatti, Colin Bellinger, Mitchell Shahen, Nicholas Paquin, Mark Baula, Amanuel Dawit, Zihan Yang, and others.
    arXiv preprint arXiv:2305.14177. 2023.
  8. SPRING
    [20] SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning
    Yue Wu, Shrimai Prabhumoye, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Tom Mitchell, and Yuanzhi Li.
    2023.



Other Reading

The following sections list additional relevant publications that are optional for my course. They mostly include reference papers and books, simulation environment descriptions, or other resources that may also prove useful in understanding the course topics. Potential publications are even more optional, and there probably won't be time to discuss them in the course unless someone volunteers to present them.

reference ~ environment ~ potential

reference



  1. RL Book
    [0] Reinforcement Learning: An Introduction
    R.S. Sutton, and A.G. Barto.
    MIT Press, Cambridge, MA. 2018.
    DISCUSSED ON: 2024-09-06 by Prof. Mark Crowley for the first few weeks.
  2. [1] Artificial Intelligence: A Modern Approach
    Stuart J Russell, and Peter Norvig.
    Pearson Education, Inc., 2010.
  3. [3] Optuna: A Next-generation Hyperparameter Optimization Framework
    Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama.
    In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019.
  4. [5] The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementations with Python
    Michael Hu.
    Apress Berkeley, CA, 2023.

environment



  1. MineRL: A Large-Scale Dataset of Minecraft Demonstrations
    William H Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, and Ruslan Salakhutdinov.
    arXiv preprint arXiv:1907.13440. 2019.
  2. The StarCraft Multi-Agent Challenge
    Mikayel Samvelyan, Tabish Rashid, Christian Schroeder De Witt, Gregory Farquhar, Nantas Nardelli, Tim GJ Rudner, Chia-Man Hung, Philip HS Torr, Jakob Foerster, and Shimon Whiteson.
    arXiv preprint arXiv:1902.04043. 2019.
  3. OpenAI Gym
    Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba.
    arXiv preprint arXiv:1606.01540. 2016.
  4. [19] Deep Hedging with Market Impact
    Andrei Neagu, Frédéric Godin, Clarence Simard, and Leila Kosseim.
    In Proceedings of the Canadian Conference on Artificial Intelligence. Canadian Artificial Intelligence Association (CAIAC), May, 2024.
  5. ChemGymRL
    [19] ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry
    Chris Beeler, Sriram Ganapathi Subramanian, Kyle Sprague, Nouha Chatti, Colin Bellinger, Mitchell Shahen, Nicholas Paquin, Mark Baula, Amanuel Dawit, Zihan Yang, and others.
    arXiv preprint arXiv:2305.14177. 2023.

potential



  1. Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation
    Maximilian Igl, Daewoo Kim, Alex Kuefler, Paul Mougin, Punit Shah, Kyriacos Shiarlis, Dragomir Anguelov, Mark Palatucci, Brandyn White, and Shimon Whiteson.
    In International Conference on Robotics and Automation (ICRA). 2022.
  2. Graph Convolutional Networks for Chemical Relation Extraction
    Darshini Mahendran, Christina Tang, and Bridget T. McInnes.
    In Companion Proceedings of the Web Conference 2022. Association for Computing Machinery, New York, NY, USA. Apr, 2022.
  3. UAV Coverage Path Planning under Varying Power Constraints Using Deep Reinforcement Learning
    Mirco Theile, Harald Bayerlein, Richard Nai, David Gesbert, and Marco Caccamo.
    In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020.
  4. Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
    Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, and Marc Najork.
    In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM, Virtual Event Ireland. Oct, 2020.
  5. Model-ensemble trust-region policy optimization
    Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, and Pieter Abbeel.
    arXiv preprint arXiv:1802.10592. 2018.
  6. A distributional perspective on reinforcement learning
    Marc G Bellemare, Will Dabney, and Rémi Munos.
    In International conference on machine learning. 2017.
  7. Constrained policy optimization
    Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel.
    In International conference on machine learning. 2017.
  8. Continuous control with deep reinforcement learning
    Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra.
    arXiv preprint arXiv:1509.02971. 2015.
  9. Knows What It Knows: A Framework For Self-Aware Learning
    Lihong Li, Michael Littman, and Thomas J Walsh.
    Proceedings of the 25th International Conference on Machine Learning. 2008.
  10. PAC Model-Free Reinforcement Learning
    Alexander L Strehl, Eric Wiewiora, John Langford, and Michael L Littman.
    Proceedings of the 23rd International Conference on Machine Learning. 2006.
  11. [9] Talking About Large Language Models
    Murray Shanahan.
    Arxiv Preprint. Dec, 2022.
  12. [99] Galactica: A Large Language Model for Science
    Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic.
    Arxiv Preprint. 2022.