[6:00]
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
[6:20]
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
[6:30]
Human Motion Diffusion Model
[6:40]
NTFields: Neural Time Fields for Physics-Informed Robot Motion Planning
[6:50]
UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks
[7:00]
Mass-Editing Memory in a Transformer
[7:10]
On the Usefulness of Embeddings, Clusters and Strings for Text Generation Evaluation