ECE750T4 Reinforcement Learning - Reading List

Reading list for the Grad Topics on Reinforcement Learning (ECE 750 Topic 4) for Fall 2024

Introduction

This course will have two components, which shift focus over the term. The first component will be lectures and worked problems on the fundamentals of Reinforcement Learning (RL).

The second component will be communal presentation and discussion of research papers on advanced topics in RL. Each paper will be read by everyone, presented briefly by a student, and then discussed by the whole class, led by the presenting student, for half of the class period.

This second component is structured around the common grad school practice of Reading Groups; see more information below, or see the pages for some previous reading groups here.

Reading Group Tips

In a reading group, everyone takes turns leading the discussion of a paper each week. Leading discussion can be as simple as sharing your own annotated notes on Hypothes.is to start discussion as we go through the paper together, or it can be more involved, including making slides to present your overview of the paper’s contributions, highlights, and weak points.


The Course Reading List

The papers listed below are ones we plan to read throughout the term. An order [n] is sometimes listed, but this is just a rough guide. In general, if one paper is based on work in an older paper, the older paper should be discussed first, or in the same week.

What you need to do

If you are a student in this course you need to do the following:

  • Create a Hypothes.is account and sign up for the course group on Hypothes.is so you and everyone in the class can see our shared annotations
  • Pick the papers you will be reading, presenting, and leading discussion of, and sign up (sign-up process TBD)
    • PhD Students: choose two papers, one of them near the start to set a good example!
    • Master’s Students: choose at least one paper
  • Then read the paper in detail, using Hypothes.is to make annotations for yourself and to guide others. Use the Hypothes.is course group you were invited to for this.
  • Prepare to present the main points of the paper and to guide discussion through the parts that are surprising, challenging, interesting, or that you don’t understand.
  • The class discussion of the paper should help everyone, including you and the prof, come away with a better understanding and evaluation of the publication.

Papers We’ll Be Reading

See the links below for information about, and notes on, the papers. Readings are planned for the early, middle, or later part of the course; current links to this week’s reading, and done lists the readings from previous weeks.

(Jump to a stage and sign up to lead a paper)
current ~ early ~ middle ~ later ~ done

current



  1. RL Book
    [0] Reinforcement Learning: An Introduction
    R.S. Sutton, and A.G. Barto.
    MIT Press, Cambridge, MA. 2018.
    DISCUSSED ON: 2024-09-06 by Prof. Mark Crowley for the first few weeks.
  2. Rainbow
    [4] Rainbow: Combining improvements in deep reinforcement learning
    Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver.
    In Proceedings of the AAAI conference on artificial intelligence. 2018.
  3. PPO
    [5] Proximal policy optimization algorithms
    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov.
    arXiv preprint arXiv:1707.06347. 2017.
    DISCUSSED ON: 2024-10-11 by Majid Ghasemi

early



  1. Shallow
    [1] State of the Art Control of Atari Games Using Shallow Reinforcement Learning
    Yitao Liang, Marlos C. Machado, Erik Talvitie, and Michael Bowling.
    In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, Richland, South Carolina, USA. 2016.
    DISCUSSED ON: 2024-10-04 by Mark Crowley
  2. DQN
    [2] Playing Atari with Deep Reinforcement Learning
    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A Riedmiller.
    arXiv preprint arXiv:1312.5602. 2013.
    DISCUSSED ON: 2024-10-04 by Mark Crowley
  3. HER
    [3] Hindsight Experience Replay
    Marcin Andrychowicz, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba.
    In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 2017.
  4. PER
    [3] Prioritized Experience Replay
    Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver.
    In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. 2016.
    DISCUSSED ON: 2024-10-11 by Mark Crowley?
  5. Rainbow
    [4] Rainbow: Combining improvements in deep reinforcement learning
    Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver.
    In Proceedings of the AAAI conference on artificial intelligence. 2018.
  6. Revisit-ER
    [5] Revisiting fundamentals of experience replay
    William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, and Will Dabney.
    In International Conference on Machine Learning. 2020.
  7. PPO
    [5] Proximal policy optimization algorithms
    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov.
    arXiv preprint arXiv:1707.06347. 2017.
    DISCUSSED ON: 2024-10-11 by Majid Ghasemi
  8. A3C
    [6] Asynchronous Methods for Deep Reinforcement Learning
    Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu.
    In Proceedings of The 33rd International Conference on Machine Learning (ICML). 2016.
    DISCUSSED ON: 2024-10-11 by Mark Crowley
  9. [6] Human-level control through deep reinforcement learning
    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis.
    Nature. 518, (7540). 2015.
  10. [7] Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor
    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine.
    In International conference on machine learning. 2018.

middle



  1. Distributional Reinforcement Learning
    Marc G Bellemare, Will Dabney, and Mark Rowland.
    MIT Press, 2023.
  2. Attention option-critic
    Raviteja Chunduru, and Doina Precup.
    arXiv preprint arXiv:2201.02628. 2022.
  3. The StarCraft Multi-Agent Challenge
    Mikayel Samvelyan, Tabish Rashid, Christian Schroeder De Witt, Gregory Farquhar, Nantas Nardelli, Tim GJ Rudner, Chia-Man Hung, Philip HS Torr, Jakob Foerster, and Shimon Whiteson.
    arXiv preprint arXiv:1902.04043. 2019.
  4. [5] Offline Reinforcement Learning as One Big Sequence Modeling Problem
    Michael Janner, Qiyang Li, and Sergey Levine.
    In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2021.
  5. DecTransfrmr
    [8] Decision Transformer: Reinforcement Learning via Sequence Modeling
    Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch.
    In Advances in Neural Information Processing Systems. Jun, 2021.
  6. [10] Reinforcement Learning as a Framework for Ethical Decision Making
    David Abel, James MacGlashan, and Michael L. Littman.
    In AAAI Workshop: AI, Ethics, and Society. 2016.
  7. [13] Robust Reinforcement Learning for Linear Temporal Logic Specifications with Finite Trajectory Duration
    Soroush Mortazavi Moghaddam, Yash Vardhan Pant, and Sebastian Fischmeister.
    Proceedings of the Canadian Conference on Artificial Intelligence. Canadian Artificial Intelligence Association (CAIAC), May, 2024.
  8. [19] Deep Hedging with Market Impact
    Andrei Neagu, Frédéric Godin, Clarence Simard, and Leila Kosseim.
    Proceedings of the Canadian Conference on Artificial Intelligence. Canadian Artificial Intelligence Association (CAIAC), May, 2024.

later



  1. RLHF
    [1] Illustrating Reinforcement Learning from Human Feedback (RLHF)
    Nathan Lambert, Louis Castricato, Leandro von Werra, and Alex Havrilla.
    Hugging Face Blog. 2022.
  2. ConstitEthic
    [2] Constitutional AI: Harmlessness from AI Feedback
    Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosuite, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemi Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, and Jared Kaplan.
    2022.
  3. MORAL
    [3] MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning
    Markus Peschl, Arkady Zgonnikov, Frans A. Oliehoek, and Luciano C. Siebert.
    In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC. 2022.
  4. Moral Grid
    [5] Moral Gridworlds: A Theoretical Proposal for Modeling Artificial Moral Cognition
    Julia Haas.
    Minds and Machines. 30, (2). 2020.
  5. MoralityInterpret
    [6] Morality, Machines, and the Interpretation Problem: A Value-based, Wittgensteinian Approach to Building Moral Agents
    Cosmin Badea, and Gregory Artus.
    In Artificial Intelligence XXXIX. Springer International Publishing, Cham. 2022.
  6. Multi-Obj-Ethic
    [10] Multi-Objective Reinforcement Learning for Designing Ethical Environments
    Manel Rodriguez-Soto, Maite Lopez-Sanchez, and Juan A. Rodriguez Aguilar.
    In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21. International Joint Conferences on Artificial Intelligence Organization, Aug, 2021.
  7. ChemGymRL
    [19] ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry
    Chris Beeler, Sriram Ganapathi Subramanian, Kyle Sprague, Nouha Chatti, Colin Bellinger, Mitchell Shahen, Nicholas Paquin, Mark Baula, Amanuel Dawit, Zihan Yang, and others.
    arXiv preprint arXiv:2305.14177. 2023.
  8. SPRING
    [20] SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning
    Yue Wu, Shrimai Prabhumoye, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Tom Mitchell, and Yuanzhi Li.
    2023.

done



  1. Shallow
    [1] State of the Art Control of Atari Games Using Shallow Reinforcement Learning
    Yitao Liang, Marlos C. Machado, Erik Talvitie, and Michael Bowling.
    In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, Richland, South Carolina, USA. 2016.
    DISCUSSED ON: 2024-10-04 by Mark Crowley
  2. DQN
    [2] Playing Atari with Deep Reinforcement Learning
    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A Riedmiller.
    arXiv preprint arXiv:1312.5602. 2013.
    DISCUSSED ON: 2024-10-04 by Mark Crowley

Other Reading

The following sections list publications that no one needs to volunteer to present. They mostly include reference papers and books, simulation environment descriptions, and other resources that may prove useful in understanding the course topics. Publications under potential probably won’t be discussed if no one volunteers to present them.

(Not to sign up for, just for reference and interest)
reference ~ foundational ~ environment ~ potential

reference



  1. Artificial Intelligence: A Modern Approach
    Stuart J. Russell, and Peter Norvig.
    Pearson Education, Inc., 2010.
  2. [0] The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementations with Python
    Michael Hu.
    Apress Berkeley, CA, 2023.
  3. RL Book
    [0] Reinforcement Learning: An Introduction
    R.S. Sutton, and A.G. Barto.
    MIT Press, Cambridge, MA. 2018.
    DISCUSSED ON: 2024-09-06 by Prof. Mark Crowley for the first few weeks.

foundational



  1. Natural Actor Critic
    Jan Peters, Sethu Vijayakumar, and Stefan Schaal.
    In European Conference on Machine Learning. Springer Verlag, Berlin, 2005.
  2. Policy Gradient Methods for Reinforcement Learning with Function Approximation
    Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour.
    In Advances in Neural Information Processing Systems 12. MIT Press, 1999.
  3. Actor-Critic Algorithms
    Vijay Konda, and John Tsitsiklis.
    In Advances in Neural Information Processing Systems. MIT Press, 1999.
  4. Natural gradient works efficiently in learning.
    Shun-ichi Amari.
    Neural Computation. 10, 1998.
  5. Neuro-Dynamic Programming
    Dimitri P Bertsekas, and John N Tsitsiklis.
    Athena Scientific, Nashua, NH. 1996.
  6. Simple statistical gradient-following algorithms for connectionist reinforcement learning
    Ronald J. Williams.
    Machine Learning. 8, (2). 1992.
  7. Neurocontrol and Supervised Learning: An Overview and Evaluation
    Paul Werbos.
    1992.
  8. Learning from Delayed Rewards
    Christopher J.C.H. Watkins.
    PhD thesis, King’s College, University of Cambridge, UK. 1989.
  9. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems
    Martin L Puterman, and Moon Chirl Shin.
    Management Science. 24, (11). 1978.
  10. Dynamic Programming
    Richard Bellman.
    Princeton University Press, New Jersey. 1957.
  11. RL Book
    [0] Reinforcement Learning: An Introduction
    R.S. Sutton, and A.G. Barto.
    MIT Press, Cambridge, MA. 2018.
    DISCUSSED ON: 2024-09-06 by Prof. Mark Crowley for the first few weeks.

environment



  1. MineRL: A Large-Scale Dataset of Minecraft Demonstrations
    William H Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, and Ruslan Salakhutdinov.
    arXiv preprint arXiv:1907.13440. 2019.
  2. The StarCraft Multi-Agent Challenge
    Mikayel Samvelyan, Tabish Rashid, Christian Schroeder De Witt, Gregory Farquhar, Nantas Nardelli, Tim GJ Rudner, Chia-Man Hung, Philip HS Torr, Jakob Foerster, and Shimon Whiteson.
    arXiv preprint arXiv:1902.04043. 2019.
  3. OpenAI Gym
    Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba.
    arXiv preprint arXiv:1606.01540. 2016.
  4. [19] Deep Hedging with Market Impact
    Andrei Neagu, Frédéric Godin, Clarence Simard, and Leila Kosseim.
    Proceedings of the Canadian Conference on Artificial Intelligence. Canadian Artificial Intelligence Association (CAIAC), May, 2024.
  5. ChemGymRL
    [19] ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry
    Chris Beeler, Sriram Ganapathi Subramanian, Kyle Sprague, Nouha Chatti, Colin Bellinger, Mitchell Shahen, Nicholas Paquin, Mark Baula, Amanuel Dawit, Zihan Yang, and others.
    arXiv preprint arXiv:2305.14177. 2023.

potential



  1. Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation
    Maximilian Igl, Daewoo Kim, Alex Kuefler, Paul Mougin, Punit Shah, Kyriacos Shiarlis, Dragomir Anguelov, Mark Palatucci, Brandyn White, and Shimon Whiteson.
    In International Conference on Robotics and Automation (ICRA). 2022.
  2. Graph Convolutional Networks for Chemical Relation Extraction
    Darshini Mahendran, Christina Tang, and Bridget T. McInnes.
    In Companion Proceedings of the Web Conference 2022. Association for Computing Machinery, New York, NY, USA. Apr, 2022.
  3. UAV Coverage Path Planning under Varying Power Constraints Using Deep Reinforcement Learning
    Mirco Theile, Harald Bayerlein, Richard Nai, David Gesbert, and Marco Caccamo.
    In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020.
  4. Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
    Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, and Marc Najork.
    In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM, Virtual Event Ireland. Oct, 2020.
  5. Model-ensemble trust-region policy optimization
    Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, and Pieter Abbeel.
    arXiv preprint arXiv:1802.10592. 2018.
  6. A distributional perspective on reinforcement learning
    Marc G Bellemare, Will Dabney, and Rémi Munos.
    In International conference on machine learning. 2017.
  7. Constrained policy optimization
    Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel.
    In International conference on machine learning. 2017.
  8. Continuous control with deep reinforcement learning
    Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra.
    arXiv preprint arXiv:1509.02971. 2015.
  9. Knows What It Knows: A Framework For Self-Aware Learning
    Lihong Li, Michael Littman, and Thomas J Walsh.
    Proceedings of the 25th International Conference on Machine Learning. 2008.
  10. PAC Model-Free Reinforcement Learning
    Alexander L Strehl, Eric Wiewiora, John Langford, and Michael L Littman.
    In Proceedings of the 23rd International Conference on Machine Learning (ICML). 2006.
  11. [9] Talking About Large Language Models
    Murray Shanahan.
    arXiv preprint. Dec, 2022.
  12. [99] Galactica: A Large Language Model for Science
    Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic.
    arXiv preprint. 2022.