Blog Track Session 4
David Dobre · Leo Schwinn · Claire Vernade · Charlie Gauthier · Fabian Pedregosa · Gauthier Gidel
Halle B
Schedule
Wed 7:30 a.m. - 9:30 a.m.
Masked Language Model with ALiBi and CLAP head (Poster #3)
Poster Location: Halle B #3
As a new approach to positional encoding, Attention with Linear Biases (ALiBi) uses linear biases on the attention weights to encode positional information, with the capability of context-length extrapolation. In their paper, however, Press et al. focus on the perplexity of autoregressive decoder-only language models, leaving open the question of downstream-task performance and applicability to encoder attention. In this blog post, we attempt to bridge the gap by testing masked language models (MLMs) with encoder-attention ALiBi and a prediction head similar to those of the original ALiBi models. We find that while a simplified prediction head may be beneficial, the performance of MLMs with encoder-attention ALiBi starts to deteriorate at a sequence length of 2048 at larger scales. We put our results in the context of related recent experiments and tentatively identify the circumstances that are more challenging for positional-encoding designs. Finally, we open-source our MLMs, which achieve BERT-level performance with a 2048-token context length.
Jason Chou
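As background for the poster above, here is a minimal PyTorch sketch of how ALiBi-style linear biases can be added to bidirectional (encoder) attention. The symmetric |i - j| distance, the power-of-two head count, and the function names are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric sequence of per-head slopes from the ALiBi paper,
    # shown here for power-of-two head counts: 2^(-8/n), 2^(-16/n), ...
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def encoder_alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Symmetric |i - j| distances, one plausible choice for encoder
    # (bidirectional) attention, unlike the causal decoder-only original.
    positions = torch.arange(seq_len)
    distances = (positions[None, :] - positions[:, None]).abs()   # (seq, seq)
    slopes = alibi_slopes(num_heads)                              # (heads,)
    # Negative linear bias: more distant tokens get lower attention scores.
    return -slopes[:, None, None] * distances[None, :, :]        # (heads, seq, seq)

# The bias is added to the raw attention logits before the softmax:
# scores = q @ k.transpose(-2, -1) / (head_dim ** 0.5) + encoder_alibi_bias(h, n)
```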
Wed 7:30 a.m. - 9:30 a.m.
The N Implementation Details of RLHF with PPO (Poster #2)
Poster Location: Halle B #2
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for training modern language models such as ChatGPT. In this blog post, we explore OpenAI's first RLHF paper from 2019 and its accompanying open-source codebase, available at https://github.com/openai/lm-human-preferences. Our examination surfaces implementation details of RLHF that are important yet often overlooked. Moreover, we illustrate how to replicate OpenAI's original TensorFlow 1.x implementation using the contemporary PyTorch and JAX frameworks, offering a minimal reference implementation of RLHF.
Shengyi Huang · Tianlin Liu · Leandro Von Werra
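To illustrate the kind of detail the post catalogues, below is a minimal PyTorch sketch of two implementation details commonly associated with the lm-human-preferences codebase: advantage whitening and the clipped PPO surrogate loss. Function names and the clip_ratio default are illustrative assumptions, not the authors' exact code.

```python
import torch

def whiten(values: torch.Tensor, shift_mean: bool = True) -> torch.Tensor:
    # "Whitening": normalize to zero mean and unit variance; optionally
    # add the mean back, as done for value targets in some variants.
    mean, var = values.mean(), values.var(unbiased=False)
    whitened = (values - mean) * torch.rsqrt(var + 1e-8)
    return whitened if shift_mean else whitened + mean

def ppo_policy_loss(logprobs: torch.Tensor,
                    old_logprobs: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_ratio: float = 0.2) -> torch.Tensor:
    # Standard PPO clipped surrogate objective on per-token
    # log-probabilities, written as a loss to minimize.
    advantages = whiten(advantages)
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = -advantages * ratio
    clipped = -advantages * torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio)
    return torch.max(unclipped, clipped).mean()
```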
Wed 7:30 a.m. - 9:30 a.m.
Bridging the Data Processing Inequality and Function-Space Variational Inference (Poster #1)
Poster Location: Halle B #1
This blog post explores the interplay between the Data Processing Inequality (DPI) and Function-Space Variational Inference (FSVI) within Bayesian deep learning and information theory. After examining the DPI, a cornerstone of information theory that governs how information is transformed and degraded as it flows through stochastic processes, we employ its connection to FSVI to highlight FSVI's focus on Bayesian predictive posteriors rather than posteriors over parameter space. Throughout the post, theoretical concepts are paired with intuitive explanations and mathematical rigor, offering a holistic understanding of these topics. The post culminates by synthesizing these insights into the significance of predictive priors in model training and regularization, shedding light on their practical implications in areas such as continual learning and knowledge distillation. This examination enriches theoretical understanding while highlighting practical applications in machine learning, making it a valuable read for researchers and practitioners.
Andreas Kirsch
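For reference, the DPI at the heart of the post can be stated in two standard forms; the notation below is generic and not necessarily the post's own.

```latex
% Mutual-information form: post-processing cannot create information.
% For any Markov chain X -> Y -> Z:
I(X; Z) \le I(X; Y)

% KL-divergence form: pushing two distributions p and q through the same
% stochastic map (channel) K can only bring them closer together:
D_{\mathrm{KL}}\!\left(K p \,\middle\|\, K q\right) \le D_{\mathrm{KL}}(p \,\|\, q)
```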