

Poster in Workshop: 2nd Workshop on Navigating and Addressing Data Problems for Foundation Models (DATA-FM)

Improving Multimodal Large Language Models in Low-Resource Language Contexts

Yufei Gao · Feijiaying · Guohang Yan · Yunshi Lan


Abstract:

In recent years, open-source Multimodal Large Language Models (MLLMs) have developed rapidly, but their strengths remain concentrated in mainstream languages such as English and Chinese. Because data for non-mainstream languages are comparatively scarce, these models perform poorly in low-resource languages: they struggle not only to understand and generate them fluently but also to grasp the knowledge familiar to their speakers. Recognizing the importance of low-resource language data, this paper collects multimodal data containing low-resource-language knowledge from relevant websites. We further propose a two-stage training approach to improving MLLMs in low-resource language contexts: the first stage transfers multimodal capabilities to low-resource languages, and the second stage supplements the model with the knowledge in the collected dataset. Experimental results demonstrate that this data collection strategy and training method effectively extend MLLMs' multimodal capabilities to low-resource languages and enable them to perform better in such contexts.
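
The abstract does not specify the training framework, datasets, or which parameters are updated in each stage; the sketch below is only one plausible reading of the two-stage recipe, with a toy PyTorch model standing in for an MLLM, randomly generated tensors standing in for the two datasets, and an assumed freezing scheme in stage 1.

```python
# A minimal sketch of a two-stage training pipeline, assuming PyTorch and a toy
# stand-in model; every concrete choice here is an illustrative assumption, not
# a detail taken from the paper.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class ToyMLLM(nn.Module):
    """Stand-in for a multimodal LLM: a vision encoder plus a language head."""

    def __init__(self, vis_dim=16, txt_dim=16, vocab=100):
        super().__init__()
        self.vision_encoder = nn.Linear(vis_dim, txt_dim)  # placeholder for a ViT
        self.language_head = nn.Linear(txt_dim, vocab)     # placeholder for an LLM

    def forward(self, image_feats):
        return self.language_head(self.vision_encoder(image_feats))


def run_stage(model, loader, params, epochs=1, lr=1e-3):
    """Generic supervised fine-tuning loop reused for both stages."""
    opt = torch.optim.AdamW(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for image_feats, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(image_feats), labels)
            loss.backward()
            opt.step()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyMLLM()

    # Stage 1 data (assumed): general multimodal instruction data in the
    # low-resource language, used to transfer multimodal capability.
    stage1_data = TensorDataset(torch.randn(64, 16), torch.randint(0, 100, (64,)))
    # Stage 2 data (assumed): the web-collected multimodal data carrying
    # low-resource-language knowledge.
    stage2_data = TensorDataset(torch.randn(64, 16), torch.randint(0, 100, (64,)))

    # Stage 1: adapt only the language side (an assumed freezing scheme).
    run_stage(model, DataLoader(stage1_data, batch_size=8),
              params=model.language_head.parameters())

    # Stage 2: update all parameters to inject the collected knowledge.
    run_stage(model, DataLoader(stage2_data, batch_size=8),
              params=model.parameters())
```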
