Seminar: BERT and Friends - Pretrained LMs in Computational Semantics

Monday, 12:15 - 13:45

April 25 - July 18
This is the 2022 edition of the seminar. Looking for the BERT and Friends seminar in the summer semester of 2023? Please have a look here: BERT and Friends '23

Seminar Content

The advent of large-scale pretrained language models as "Swiss Army Knives" for various applications and problems in natural language understanding and computational semantics has drastically changed the natural language processing landscape.

The BERT model is only the most prominent example. Its publication had a huge impact on the NLP research community and led to a paradigm change: pretraining language models on large text collections and then adapting them to the task at hand has become the standard procedure for state-of-the-art systems, both in research and in industry applications.

Since BERT's release, research aimed at finding smaller, faster, and more accurate variants, and at adapting BERT-like transformer models to new tasks, has flourished. In this seminar, we will look at such variants and adaptations of pretrained language models. We will cover papers on diverse and effective pretraining methods for such language models, as well as papers that investigate how to use and adapt pretrained models for selected tasks in natural language understanding and computational semantics. We will look into prominent use cases such as machine reading comprehension and open question answering, but also read papers on multilinguality, natural language inference, or text classification (depending on the interests of the participants).
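To make the pretrain-then-adapt recipe described above concrete, here is a minimal sketch using the Hugging Face transformers and datasets libraries (not part of the seminar material; the checkpoint, dataset, and hyperparameters are illustrative choices, not a prescribed setup): a publicly available pretrained BERT checkpoint is loaded and fine-tuned for binary text classification.

  # Minimal sketch: adapt a pretrained BERT checkpoint to a
  # downstream text classification task (illustrative example).
  from transformers import (AutoTokenizer,
                            AutoModelForSequenceClassification,
                            Trainer, TrainingArguments)
  from datasets import load_dataset

  # 1) Start from a pretrained checkpoint and add a classification head.
  model_name = "bert-base-uncased"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

  # 2) Load a downstream task (here: SST-2 sentiment classification) and tokenize it.
  dataset = load_dataset("glue", "sst2")

  def tokenize(batch):
      return tokenizer(batch["sentence"], truncation=True, max_length=128)

  encoded = dataset.map(tokenize, batched=True)

  # 3) Fine-tune ("adapt") the pretrained model on the task data.
  training_args = TrainingArguments(
      output_dir="bert-finetuned-sst2",  # hypothetical output directory
      num_train_epochs=3,
      per_device_train_batch_size=16,
  )
  trainer = Trainer(
      model=model,
      args=training_args,
      train_dataset=encoded["train"],
      eval_dataset=encoded["validation"],
      tokenizer=tokenizer,
  )
  trainer.train()
  print(trainer.evaluate())

The same two-step pattern (load a pretrained checkpoint, then fine-tune on task data) underlies most of the papers and use cases discussed in the seminar.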


Selection of Relevant Papers

Pretraining
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding | paper
  • ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations | paper
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach | paper
  • GPT-2: Language Models are Unsupervised Multitask Learners | paper
  • GPT-3: Language Models are Few-Shot Learners | paper
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | paper
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | paper
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | paper
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans | paper
Tasks
Open Question Answering/Neural Retrieval
  • Dense Passage Retrieval for Open-Domain Question Answering | paper
  • How Much Knowledge Can You Pack Into the Parameters of a Language Model? | paper
  • RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering | paper
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | paper
  • REALM: Retrieval-Augmented Language Model Pre-Training | paper
Machine Comprehension
  • TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection | paper
  • The Cascade Transformer: an Application for Efficient Answer Sentence Selection | paper
  • Retrospective Reader for Machine Reading Comprehension | paper
Natural Language Inference, Entailment and Similarity
  • Semantics-aware BERT for Language Understanding | paper
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | paper
  • Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation | paper
  • Multi-Task Deep Neural Networks for Natural Language Understanding | paper
  • SimCSE: Simple Contrastive Learning of Sentence Embeddings | paper
Multilingual Models
  • mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer | paper
  • Unsupervised Cross-lingual Representation Learning at Scale | paper
  • Multilingual Denoising Pre-training for Neural Machine Translation | paper

Some words on grading: This seminar is meant to be as interactive as possible. Final grades will be based on students' presentations and (optional) term papers, as well as on participation and discussion in class.

Participants are expected to prepare for classes accordingly, by reading the relevant papers and, if necessary, doing additional background reading. Based on this preparation, they should be able to discuss the presented papers in depth and to follow the relevant context during the discussion.