VLSP2025-LegalSML/qwen3-4b-legal-pretrain

Hosted on Hugging Face · Task: text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Jul 2, 2025 · Architecture: Transformer

VLSP2025-LegalSML/qwen3-4b-legal-pretrain is a 4-billion-parameter Vietnamese legal-domain base model, continually pretrained from Qwen3-4B. Developed by the VLSP 2025 LegalSLM Task Organizers, it is adapted for Vietnamese legal text understanding and legal question answering. The model was trained on a curated corpus of approximately 144,000 Vietnamese legal documents and news articles, with a pretraining sequence length of 4096 tokens, making it well suited to legal applications.


Vietnamese Legal Base Model - Qwen3-4B (Pretrained)

This model, developed by the VLSP 2025 LegalSLM Task Organizers, is a specialized Vietnamese legal-domain base model. It is continually pretrained from Qwen3-4B, adapting it specifically for Vietnamese legal text understanding and legal question answering. Pretraining used a maximum sequence length of 4096 tokens, so the model can process substantial legal documents in a single pass.
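A minimal loading sketch with the standard `transformers` API, assuming the checkpoint is published under the repository ID above on the Hugging Face Hub. Since this is a base (non-instruct) model, prompting is plain-text completion; the Vietnamese prompt format below is an illustrative assumption, not prescribed by the card.

```python
# Sketch: loading the base model with Hugging Face transformers.
# The repo ID comes from this card; the prompt format is an illustrative
# assumption (base models do plain-text completion, no chat template).
MODEL_ID = "VLSP2025-LegalSML/qwen3-4b-legal-pretrain"


def build_prompt(question: str) -> str:
    """Plain completion-style prompt (illustrative format, not official)."""
    return f"Câu hỏi: {question}\nTrả lời:"


def generate_answer(question: str, max_new_tokens: int = 128) -> str:
    """Download the checkpoint (network access and enough RAM for a 4B
    BF16 model required) and complete the prompt."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because this is a base model, outputs are raw continuations; downstream systems would typically add their own instruction tuning or prompt scaffolding.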

Key Capabilities

  • Domain-specific understanding: Optimized for the nuances of Vietnamese legal language.
  • Legal text processing: Handles official legal documents, laws, decrees, and legal news articles.
  • Continual pretraining: Enhanced performance on legal tasks through extensive domain adaptation.

Training Details

The model underwent full-parameter continual pretraining on a corpus of approximately 144,000 Vietnamese texts. This dataset includes:

  • ~96,000 official legal documents (laws, decrees, circulars).
  • ~48,000 legal news articles and commentary.
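The card states a 4096-token maximum sequence length but not how documents are fitted into it; a common approach for continual pretraining is to concatenate tokenized documents and slice the stream into fixed-length windows. The sketch below assumes that scheme (it is not described by the organizers), with a placeholder end-of-document token ID.

```python
# Sketch of fixed-window sequence packing for continual pretraining.
# The 4096-token length comes from the card; the concatenate-and-chunk
# scheme and the EOS_ID value are assumptions for illustration.
from typing import Iterable, List

MAX_SEQ_LEN = 4096
EOS_ID = 0  # placeholder end-of-document token id (assumption)


def pack_documents(docs: Iterable[List[int]],
                   max_len: int = MAX_SEQ_LEN) -> List[List[int]]:
    """Concatenate tokenized documents (EOS-separated) into one stream,
    then slice it into fixed-length training sequences; a short trailing
    remainder is dropped."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(EOS_ID)  # mark document boundary
    return [stream[i:i + max_len]
            for i in range(0, len(stream) - max_len + 1, max_len)]
```

Packing keeps every training sequence exactly 4096 tokens long, which avoids padding waste when many legal documents are shorter or longer than the window.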

Good for

  • Developing applications requiring deep understanding of Vietnamese legal texts.
  • Research in legal AI and natural language processing for the Vietnamese legal domain.
  • Legal question answering systems in Vietnamese.

Note: This model is released for research purposes only under the scope of the VLSP 2025 Evaluation Campaign.