VLSP2025-LegalSML/qwen3-4b-legal-pretrain
VLSP2025-LegalSML/qwen3-4b-legal-pretrain is a 4-billion-parameter Vietnamese legal-domain base model, continually pretrained from Qwen3-4B. Developed by the VLSP 2025 LegalSLM Task Organizers, it is adapted for Vietnamese legal text understanding and legal question answering. The model was trained on a curated corpus of approximately 144,000 Vietnamese legal documents and legal news articles, with a maximum sequence length of 4096 tokens.
Vietnamese Legal Base Model - Qwen3-4B (Pretrained)
This model, developed by the VLSP 2025 LegalSLM Task Organizers, is a Vietnamese legal-domain base model. It is continually pretrained from Qwen3-4B to adapt it to Vietnamese legal text understanding and legal question answering. The maximum sequence length is 4096 tokens, which lets it process fairly long legal passages in a single input.
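Legal documents are often longer than 4096 tokens. The following is a minimal sketch of one way to split a tokenized document into overlapping windows that each fit within that length. The function name `chunk_token_ids` and the 256-token overlap are illustrative assumptions, not part of the model release, and the token ids can come from any tokenizer.

```python
from typing import Iterator, List

MAX_SEQ_LEN = 4096  # maximum sequence length stated in this model card


def chunk_token_ids(
    token_ids: List[int],
    max_len: int = MAX_SEQ_LEN,
    overlap: int = 256,  # assumed value; tune for your documents
) -> Iterator[List[int]]:
    """Yield windows of at most `max_len` ids; neighbours share `overlap` ids."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if not token_ids:
        return

    step = max_len - overlap  # how far each window advances
    start = 0
    while True:
        yield token_ids[start:start + max_len]
        # Stop once the current window reaches the end of the document.
        if start + max_len >= len(token_ids):
            break
        start += step
```

The overlap keeps a clause that straddles a window boundary visible in full in at least one window, at the cost of processing some tokens twice.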
Key Capabilities
- Domain-specific understanding: Optimized for the nuances of Vietnamese legal language.
- Legal text processing: Handles official legal documents, laws, decrees, and legal news articles.
- Continual pretraining: Domain adaptation on Vietnamese legal text, intended to improve performance on legal tasks.
Training Details
The model was continually pretrained with full-parameter updates on a corpus of approximately 144,000 Vietnamese texts. The corpus includes:
- ~96,000 official legal documents (laws, decrees, circulars).
- ~48,000 legal news articles and commentary.
Good for
- Developing applications requiring deep understanding of Vietnamese legal texts.
- Research in legal AI and natural language processing for the Vietnamese legal domain.
- Legal question answering systems in Vietnamese.
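As a starting point for the use cases above, here is a minimal sketch of loading the checkpoint with the Hugging Face `transformers` library and asking it to continue a question. It assumes the repository ID shown at the top of this page is loadable with `AutoModelForCausalLM`, and that `torch` and `transformers` are installed. Because this is a base model and not an instruction-tuned one, the sketch uses a plain completion-style prompt. The "Câu hỏi / Trả lời" template and the helper names `build_prompt` and `generate_answer` are hypothetical, not a format documented by the organizers.

```python
MODEL_ID = "VLSP2025-LegalSML/qwen3-4b-legal-pretrain"


def build_prompt(question: str) -> str:
    """Format a question as plain text for the base model to continue.

    The template is an assumption for illustration only.
    """
    return f"Câu hỏi: {question.strip()}\nTrả lời:"


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Load the model and greedily generate a continuation of the prompt."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16  # assumed dtype; adjust to your hardware
    ).to(device)
    model.eval()

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_answer("Tuổi chịu trách nhiệm hình sự là bao nhiêu?"))
```

Greedy decoding (`do_sample=False`) is chosen here only to make outputs repeatable. A base model may keep generating past the answer, so a downstream system would likely need its own stopping rule or further fine-tuning.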
Note: This model is released for research purposes only under the scope of the VLSP 2025 Evaluation Campaign.