GRPO-VI-Qwen2-7B-RAG: Specialized for Vietnamese RAG
GRPO-VI-Qwen2-7B-RAG is a 7.6-billion-parameter large language model fine-tuned by AITeamVN from the Qwen2.5-7B-Instruct base model. It targets Retrieval-Augmented Generation (RAG) tasks, with a strong emphasis on the Vietnamese language. The model was developed in a two-stage training process: Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO).
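For context, GRPO (introduced with DeepSeekMath) replaces PPO's learned value function with a group-relative baseline: for each prompt, a group of $G$ candidate responses is sampled and scored by a reward function, and each response's advantage is its reward normalized within the group. This is the standard GRPO formulation, not a detail unique to this model:

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}, \qquad i = 1, \dots, G.
$$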
Key Capabilities
- Enhanced RAG Performance: Specifically fine-tuned to excel in RAG-related tasks such as multi-hop reasoning, negative filtering, information integration, and positive/negative identification.
- Vietnamese Language Proficiency: Trained on a dedicated Vietnamese dataset to improve understanding and generation in Vietnamese.
- STEM and General QA: Retains strong capabilities in STEM tasks (mathematics and coding) and general question answering.
- Conversational Ability: Supports multi-turn conversation with a context length of up to 8192 tokens (see the usage sketch after this list).
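The snippet below is a minimal inference sketch using the Hugging Face transformers library. The repo id AITeamVN/GRPO-VI-Qwen2-7B-RAG and the prompt layout (retrieved passages placed in the user turn) are assumptions made for illustration, not documented specifics of this model.

```python
# Minimal RAG-style inference sketch. Assumes the model is published on
# Hugging Face under "AITeamVN/GRPO-VI-Qwen2-7B-RAG" (unverified repo id)
# and uses the standard Qwen2.5 chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AITeamVN/GRPO-VI-Qwen2-7B-RAG"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Place retrieved passages in the user turn so the model can ground its
# answer in them (this prompt layout is illustrative, not the authors'
# documented format).
retrieved_context = "..."  # passages returned by your retriever
question = "..."           # the user's question, e.g. in Vietnamese

messages = [
    {"role": "system",
     "content": "You are a helpful assistant. Answer using only the provided context."},
    {"role": "user",
     "content": f"Context:\n{retrieved_context}\n\nQuestion: {question}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long retrieved passages fit within the 8192-token context window.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```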
Performance Highlights
Evaluated on a custom human-annotated RAG dataset (EvalRAGData), GRPO-VI-Qwen2-7B-RAG achieved a score of 9.24, outperforming several other models, including Qwen2.5-7B-Instruct (8.06) and Llama3.1 (7.55). It also performs competitively on the VMLU leaderboard, with an average score of 57.4.
Training Methodology
Supervised Fine-Tuning used 10K RAG samples and 30K conversational samples (math and general domain); the subsequent GRPO stage used 10K RAG samples and 3K math/code samples. Reward scoring during GRPO considered factors such as formatting, reasoning length, answer length, Vietnamese language purity, and semantic quality for RAG and STEM tasks.
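The exact reward implementation is not given beyond the factors listed above. The sketch below shows one plausible way such a composite reward could be assembled; every helper, threshold, and weight here is a hypothetical placeholder, not the authors' published code.

```python
import re

def format_score(response: str) -> float:
    # Hypothetical structural check: reward an explicit <think>...</think>
    # reasoning block (the required output format is not documented).
    return 1.0 if re.search(r"<think>.*?</think>", response, re.S) else 0.0

def length_score(text: str, lo: int = 50, hi: int = 1500) -> float:
    # Prefer outputs that are neither truncated nor rambling
    # (bounds are arbitrary placeholders).
    return 1.0 if lo <= len(text) <= hi else 0.0

def vietnamese_purity_score(text: str) -> float:
    # Crude proxy for "Vietnamese language purity": fraction of words made of
    # Latin letters (incl. Vietnamese diacritics), digits, and punctuation.
    # A real implementation would use a language-identification model.
    words = text.split()
    if not words:
        return 0.0
    ok = sum(1 for w in words
             if re.fullmatch(r"[A-Za-zÀ-ỹ0-9.,;:!?()\"'%-]+", w))
    return ok / len(words)

def semantic_score(response: str, reference: str) -> float:
    # Lexical-overlap stand-in for semantic quality; the actual training
    # reward likely used embedding similarity or an LLM judge.
    a, b = set(response.lower().split()), set(reference.lower().split())
    return len(a & b) / max(1, len(a | b))

def composite_reward(response: str, reference: str) -> float:
    # Illustrative weighted sum over the listed factors; the weights and the
    # reasoning/answer split are placeholders, not the published setup.
    answer = response.split("</think>")[-1]  # text after the reasoning block
    return (0.10 * format_score(response)
            + 0.10 * length_score(response)        # reasoning length
            + 0.10 * length_score(answer, 1, 500)  # answer length
            + 0.20 * vietnamese_purity_score(answer)
            + 0.50 * semantic_score(answer, reference))
```

Per-sample scores like these become a learning signal once they are normalized within each sampled group, as in the GRPO advantage formula shown earlier.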
Potential Use Cases
This model is well suited to applications that require accurate retrieval-grounded generation in Vietnamese, particularly where robust RAG capabilities, mathematical problem-solving, or general knowledge queries are needed.