Vietnamese Legal Base Model - Qwen3-4B (Pretrained)

This model, developed by the VLSP 2025 LegalSLM Task Organizers, is a specialized Vietnamese legal-domain base model. It is continually pretrained from Qwen3-4B to adapt it to Vietnamese legal text understanding and legal question answering. The model supports a maximum sequence length of 4096 tokens, which lets it process long passages of legal text in a single pass.
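Documents longer than the 4096-token limit have to be split before they reach the model. The following is a minimal sketch of one common approach, overlapping fixed-size windows over token IDs. The function name `chunk_token_ids` and the 256-token overlap are assumptions made for illustration and are not part of this model card; the token IDs would normally come from the model's tokenizer, which is replaced here by dummy integers.

```python
from typing import List, Sequence

MAX_SEQ_LEN = 4096  # maximum sequence length stated in this model card


def chunk_token_ids(
    token_ids: Sequence[int],
    max_len: int = MAX_SEQ_LEN,
    overlap: int = 256,  # hypothetical value, not taken from the model card
) -> List[List[int]]:
    """Split token IDs into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens, so text near a window
    boundary is seen with some surrounding context in the next window.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    stride = max_len - overlap
    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
        start += stride
    return chunks


# Stand-in for real tokenizer output: 10,000 dummy token IDs.
dummy_ids = list(range(10_000))
windows = chunk_token_ids(dummy_ids)
print(len(windows), [len(w) for w in windows])
```

A larger overlap preserves more context across boundaries at the cost of more windows; splitting on article or clause boundaries instead of fixed offsets may suit legal text better, but requires document-specific parsing.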

Key Capabilities

Domain-specific understanding: Optimized for the nuances of Vietnamese legal language.
Legal text processing: Handles official legal documents, laws, decrees, and legal news articles.
Continual pretraining: Adapted to the legal domain through continued pretraining on Vietnamese legal text.

Training Details

The model was trained with full-parameter continual pretraining (all weights updated) on a corpus of approximately 144,000 Vietnamese texts. The corpus includes:

~96,000 official legal documents (laws, decrees, circulars).
~48,000 legal news articles and commentary.

Good for

Developing applications requiring deep understanding of Vietnamese legal texts.
Research in legal AI and natural language processing for the Vietnamese legal domain.
Legal question answering systems in Vietnamese.
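Because this is a pretrained base model, not an instruction-tuned one, question answering is usually approached as text completion, for example with a few-shot prompt. The sketch below shows one way such a prompt could be assembled. The function `build_few_shot_prompt` and the "Câu hỏi" / "Trả lời" labels are hypothetical choices for illustration, not a format prescribed by the task organizers.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    question: str,
    question_label: str = "Câu hỏi",  # "Question"; hypothetical label
    answer_label: str = "Trả lời",    # "Answer"; hypothetical label
) -> str:
    """Format solved examples plus a new question as one completion prompt."""
    blocks = [
        f"{question_label}: {q.strip()}\n{answer_label}: {a.strip()}"
        for q, a in examples
    ]
    # Leave the final answer empty so the model's continuation is the answer.
    blocks.append(f"{question_label}: {question.strip()}\n{answer_label}:")
    return "\n\n".join(blocks)


# Placeholder strings; real examples would come from a vetted legal QA set.
prompt = build_few_shot_prompt(
    examples=[("<example question 1>", "<example answer 1>")],
    question="<new question>",
)
print(prompt)
```

The resulting string would be tokenized and passed to the model for generation; the prompt plus the generated answer must fit within the 4096-token limit.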

Note: This model is released for research purposes only under the scope of the VLSP 2025 Evaluation Campaign.
