Model Overview
hez2024/LLM4Cov-Qwen3-4B-SFT-Stage1 is a 4-billion-parameter language model, further fine-tuned from the hez2024/LLM4Cov-Qwen3-4B-SFT-Stage0 base model. It has undergone supervised fine-tuning (SFT) on the cvdp_ecov_train_stage1 dataset, suggesting a specialization in tasks pertinent to that dataset's content.
Training Details
The fine-tuning process involved specific hyperparameters aimed at optimizing performance on the target dataset:
- Learning Rate: 1e-05
- Batch Sizes: train_batch_size of 1 per device, eval_batch_size of 8, and a total_train_batch_size of 24 (4 devices × 6 gradient accumulation steps).
- Optimizer: ADAMW_TORCH with standard betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.03 warmup ratio.
- Epochs: Trained for 1.0 epoch.
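The interaction between these hyperparameters can be sketched in plain Python. This is a minimal illustration, not the training code: the effective batch size follows directly from the values above, while the learning-rate curve assumes the common linear-warmup-then-cosine-decay shape (and a hypothetical total step count, since the card does not state one).

```python
import math

# Effective batch size: per-device batch x devices x gradient accumulation steps.
per_device_batch = 1
num_devices = 4
grad_accum_steps = 6
total_train_batch_size = per_device_batch * num_devices * grad_accum_steps  # 24

# Cosine schedule with linear warmup (warmup_ratio = 0.03 of total steps).
peak_lr = 1e-05
total_steps = 1000                      # hypothetical, for illustration only
warmup_steps = int(0.03 * total_steps)  # 30

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup from 0 to peak
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))  # decay to ~0

print(total_train_batch_size)  # 24
print(lr_at(warmup_steps))     # 1e-05 (peak reached at end of warmup)
```

The learning rate rises linearly over the first 3% of steps, peaks at 1e-05, then decays to approximately zero by the final step.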
Key Characteristics
- Parameter Count: 4 billion parameters.
- Context Length: Supports a substantial context window of 32768 tokens.
- Specialization: Fine-tuned on cvdp_ecov_train_stage1, indicating potential strengths in tasks related to this specific data domain.
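As a rough guide, the 4-billion-parameter count translates into the following weight-memory footprints at common precisions. This is a back-of-the-envelope sketch only: actual usage also depends on activations, the KV cache (which grows with the 32768-token context), and framework overhead.

```python
# Approximate memory needed just to hold 4B weights, by bytes per parameter.
NUM_PARAMS = 4_000_000_000

bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "int4": 0.5}

for precision, nbytes in bytes_per_param.items():
    gib = NUM_PARAMS * nbytes / 1024**3  # convert bytes to GiB
    print(f"{precision}: {gib:.1f} GiB")
```

At fp16/bf16 the weights alone occupy roughly 7.5 GiB, so filling the full 32768-token context typically calls for a GPU with comfortably more memory than that.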
When to Use This Model
This model is particularly suited to use cases that align with the data distribution and tasks in the cvdp_ecov_train_stage1 dataset. Its fine-tuning suggests improved performance on domain-specific applications compared to its base model or general-purpose LLMs of similar size, especially when long-context understanding is required.