Spring 2023 Reading Group

Reading group on Transformers in the lab in Spring 2023

Reading Group Tips

In a reading group, everyone takes turns leading the discussion of a paper each week. Leading the discussion can be as simple as sharing your own annotated notes on Hypothes.is to seed conversation as we go through the paper together, or it can be more involved, such as preparing slides that present your overview of the paper's contributions, highlights, and weak points.


Spring 2023 - Transformers Reading Group

Motivation

AI research has been ongoing since the dawn of computer science itself, and Deep Learning has seen an uninterrupted, accelerating wave of advancing capabilities for over a decade since the public breakthrough of CNNs in 2012. Yet many people, including AI/ML researchers, have been surprised by the abilities of the generative models released since summer 2022 by OpenAI, Facebook, Google and others. These recent systems all rely in various ways on the Transformer model (Vaswani et al., 2017).

Resources

This GitHub page has quite an extensive list of papers and references on the topic, so it seems as good a place as any to start:


See the links and notes on papers we have covered in previous meetings, find the link for the next paper, or look at the planned upcoming and potential future papers. Feel free to suggest other papers or changes to the upcoming order.

Jump to stage: next ~ done ~ upcoming ~ potential

next



done

1. [8] Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
   Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, and Xia Hu.
   Arxiv Preprint. 2023.
2. [7] Are Pretrained Convolutions Better than Pretrained Transformers?
   Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, and Donald Metzler.
   In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online. Aug, 2021.
3. [6] LaMDA: Language Models for Dialog Applications
   Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, Yanqi Zhou, Chung-Ching Chang, Igor Krivokon, Will Rusch, Marc Pickett, Pranesh Srinivasan, Laichee Man, Kathleen Meier-Hellstern, Meredith Ringel Morris, Tulsee Doshi, Renelito Delos Santos, Toju Duke, Johnny Soraker, Ben Zevenbergen, Vinodkumar Prabhakaran, Mark Diaz, Ben Hutchinson, Kristen Olson, Alejandra Molina, Erin Hoffman-John, Josh Lee, Lora Aroyo, Ravi Rajakumar, Alena Butryna, Matthew Lamm, Viktoriya Kuzmina, Joe Fenton, Aaron Cohen, Rachel Bernstein, Ray Kurzweil, Blaise Aguera-Arcas, Claire Cui, Marian Croak, Ed Chi, and Quoc Le.
   Arxiv Preprint. 2022.
4. [5] Language Models are Few-Shot Learners
   Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei.
   In Advances in Neural Information Processing Systems. Virtual. 2020.
5. [5] Language Models are Unsupervised Multitask Learners
   Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.
   OpenAI Blog. 2019.
6. [5] BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
   Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer.
   Arxiv Preprint arXiv:1910.13461. 2019.
7. [4] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
   Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
   In Proceedings of NAACL-HLT. 2019.
8. [3] RoBERTa: A Robustly Optimized BERT Pretraining Approach
   Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov.
   Arxiv Preprint. Jul, 2019.
9. [3] Improving Language Understanding by Generative Pre-Training
   Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever.
   Preprint. 2018.
10. [2] Attention is All you Need
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin.
    In Advances in Neural Information Processing Systems. Long Beach, California, USA. Dec, 2017.
11. [1] Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey
    Dec, 2020.

upcoming

potential

1. [9] Talking About Large Language Models
   Murray Shanahan.
   Arxiv Preprint. Dec, 2022.
2. [99] Galactica: A Large Language Model for Science
   Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic.
   Arxiv Preprint. 2022.