hez2024/LLM4Cov-Qwen3-4B-SFT-Stage1

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Jan 22, 2026 · License: other · Architecture: Transformer

hez2024/LLM4Cov-Qwen3-4B-SFT-Stage1 is a 4-billion-parameter language model fine-tuned from hez2024/LLM4Cov-Qwen3-4B-SFT-Stage0. It specializes in tasks drawn from the cvdp_ecov_train_stage1 dataset, and its 32768-token context window allows it to process long inputs within that domain.


Model Overview

hez2024/LLM4Cov-Qwen3-4B-SFT-Stage1 is a 4 billion parameter language model, representing a further fine-tuned iteration of the hez2024/LLM4Cov-Qwen3-4B-SFT-Stage0 base model. This model has undergone supervised fine-tuning (SFT) specifically on the cvdp_ecov_train_stage1 dataset, suggesting a specialization in tasks pertinent to this dataset's content.

Training Details

The fine-tuning process involved specific hyperparameters aimed at optimizing performance on the target dataset:

  • Learning Rate: 1e-05
  • Batch Sizes: train_batch_size of 1, eval_batch_size of 8, with a total_train_batch_size of 24 (achieved with 4 devices and 6 gradient accumulation steps).
  • Optimizer: ADAMW_TORCH with the default betas (0.9, 0.999) and epsilon (1e-08).
  • Scheduler: Cosine learning rate scheduler with a 0.03 warmup ratio.
  • Epochs: Trained for 1.0 epoch.
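The effective batch size and the warmup/decay behavior follow directly from the hyperparameters listed above. The short sketch below (plain Python, no training dependencies; the total step count is an arbitrary placeholder, not a figure from the card) reproduces both calculations.

```python
import math

# Effective (total) train batch size from the card's hyperparameters:
# per-device batch size x number of devices x gradient accumulation steps.
per_device_train_batch_size = 1
num_devices = 4
gradient_accumulation_steps = 6
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
print(total_train_batch_size)  # 24

# Cosine schedule with a 0.03 warmup ratio: the learning rate ramps
# linearly to its peak over the first 3% of steps, then decays toward 0
# along a half-cosine over the remaining steps.
def lr_at(step, total_steps, peak_lr=1e-5, warmup_ratio=0.03):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1000  # placeholder; the card does not state the step count
print(lr_at(0, total_steps))            # 0.0 (start of warmup)
print(lr_at(30, total_steps))           # 1e-05 (peak, end of warmup)
print(lr_at(total_steps, total_steps))  # ~0.0 (end of decay)
```

This mirrors how the Hugging Face Trainer derives `total_train_batch_size` and how its cosine scheduler with `warmup_ratio` shapes the learning rate over a run.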

Key Characteristics

  • Parameter Count: 4 billion parameters.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • Specialization: Fine-tuned on cvdp_ecov_train_stage1, indicating potential strengths in tasks related to this specific data domain.

When to Use This Model

This model is particularly suited for use cases that align with the data distribution and tasks present in the cvdp_ecov_train_stage1 dataset. Its fine-tuned nature suggests improved performance for domain-specific applications compared to its base model or more general-purpose LLMs of similar size, especially when long context understanding is required.
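For such use cases, the model can be loaded with the Hugging Face transformers library in the usual way. The snippet below is a minimal sketch, not an official usage example: the prompt is a placeholder, and BF16 precision plus a chat template are assumptions based on the card's quantization field and the model's Qwen3 lineage.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hez2024/LLM4Cov-Qwen3-4B-SFT-Stage1"

# Load in BF16, matching the precision listed on the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# Hypothetical prompt; replace with a task from the model's domain.
messages = [{"role": "user", "content": "Hello"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The 32768-token context window leaves ample room for long inputs.
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the accelerate package, and downloading the 4B weights requires network access to the Hub.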