vi-gemma-2b-RAG: Vietnamese Language Model for RAG
vi-gemma-2b-RAG is a 2.6-billion-parameter Vietnamese language model developed by hiieu, himmeow the coder, and cuctrinh. It was fine-tuned from the google/gemma-1.1-2b-it base model using LoRA (Low-Rank Adaptation) via PEFT and Unsloth, specifically targeting improved performance on Vietnamese language tasks.
Key Capabilities
- Vietnamese Language Processing: Significantly improved capabilities for understanding and generating Vietnamese text.
- Retrieval Augmented Generation (RAG): Optimized for generating answers grounded in retrieved or user-supplied context, rather than relying only on knowledge memorized during training.
- Question Answering: Proficient in answering questions based on provided Vietnamese context.
- Text Summarization: Capable of summarizing Vietnamese documents.
- Machine Translation: Supports translation tasks involving Vietnamese.
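As a sketch of how a retrieved passage and a question might be combined for the RAG use case above, the helper below assembles a single prompt in Gemma's chat-turn format. The turn markers follow the general Gemma convention; whether vi-gemma-2b-RAG expects exactly this template, and the Vietnamese instruction wording used here, are assumptions for illustration.

```python
# Minimal sketch of building a Vietnamese RAG prompt for a Gemma-style chat
# model. The <start_of_turn>/<end_of_turn> layout follows Gemma's general
# chat format; the exact template expected by vi-gemma-2b-RAG is assumed.

def build_rag_prompt(context: str, question: str) -> str:
    """Wrap a retrieved Vietnamese passage and a question in one user turn."""
    return (
        "<start_of_turn>user\n"
        "Dựa vào ngữ cảnh sau, hãy trả lời câu hỏi.\n\n"
        f"Ngữ cảnh: {context}\n\n"
        f"Câu hỏi: {question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_rag_prompt(
    "Hà Nội là thủ đô của Việt Nam.",
    "Thủ đô của Việt Nam là gì?",
)
print(prompt)
```

The resulting string would then be tokenized and passed to the model's generate step (for example via the Hugging Face transformers library); the retrieval step that produces the context is outside the model itself.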
Training Details
The model was fine-tuned on the lamhieu/mabrycodes_dialogue_vi dataset. Training with Unsloth made the fine-tuning process roughly 2x faster than standard methods. Although the model is tuned for Vietnamese, it can still generate inaccurate information or exhibit biases, and its output quality depends on the quality of the input context it is given.
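A LoRA setup of the kind described above might look roughly like the following PEFT configuration. Every hyperparameter here (rank, alpha, dropout, target modules) is an illustrative assumption, not the authors' actual training configuration.

```python
# Hypothetical LoRA configuration via the PEFT library; all values below
# are assumptions for illustration, not the authors' actual settings.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                # low-rank adapter dimension (assumed)
    lora_alpha=32,       # scaling factor for adapter updates (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
# The config would then be applied to the base model, e.g.:
# model = get_peft_model(base_model, lora_config)  # base: google/gemma-1.1-2b-it
```

Because LoRA trains only these small adapter matrices while freezing the 2.6B base weights, fine-tuning fits in far less memory than full fine-tuning, which is what Unsloth's speedups build on.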
Good For
- Applications requiring robust Vietnamese natural language understanding and generation.
- Building RAG systems for Vietnamese content.
- Developing chatbots or virtual assistants that interact in Vietnamese.
- Research and development in Vietnamese NLP.