`takami2022/qwen3-4b-sft-merged-v2v5ver1` is a 4-billion-parameter, Qwen3-based, instruction-tuned causal language model. Developed by takami2022, it was fine-tuned with QLoRA (4-bit) and subsequently merged into a 16-bit model, making it fully self-contained with no adapter loading required. It is intended as a strong base for further DPO (Direct Preference Optimization) training, and was trained on a merged structured-data dataset.
## Model Overview
This model, `takami2022/qwen3-4b-sft-merged-v2v5ver1`, is a 4-billion-parameter instruction-tuned language model based on the Qwen3 architecture. It was developed by takami2022 through QLoRA (4-bit) fine-tuning with Unsloth, and the resulting LoRA adapter was then merged into the base model weights to produce a fully self-contained 16-bit model.
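To illustrate what "merged" means here, the sketch below shows the arithmetic of folding a LoRA adapter back into the base weights. This is a toy, pure-Python illustration of the standard LoRA merge formula, not the actual merge script used for this model; the function name and matrices are hypothetical.

```python
# Illustrative sketch of a LoRA merge: the low-rank update (alpha/r) * B @ A
# is added into the base weight matrix W, after which no adapter is needed.
def merge_lora(W, A, B, r, alpha):
    """Return W' = W + (alpha / r) * B @ A for plain nested lists."""
    scale = alpha / r
    rows, cols = len(W), len(W[0])
    # Low-rank update: B is (rows x r), A is (r x cols).
    delta = [[scale * sum(B[i][k] * A[k][j] for k in range(r))
              for j in range(cols)] for i in range(rows)]
    return [[W[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]

# Tiny example: 2x2 base weight, rank-1 adapter, alpha=2 (scale = 2.0).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]             # r x cols
B = [[0.5], [0.0]]           # rows x r
merged = merge_lora(W, A, B, r=1, alpha=2)
# merged == [[2.0, 1.0], [0.0, 1.0]]
```

Because the update is baked into `W`, the merged checkpoint loads like any ordinary 16-bit model.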
## Key Characteristics
- Base Model: Built on `Qwen/Qwen3-4B-Instruct-2507`.
- Training Method: Fine-tuned with QLoRA (4-bit) and subsequently merged to 16-bit, eliminating the need for external adapter loading.
- Dataset: Trained on `takami2022/structured_data_merged_v2v5_0222`.
- Training Configuration: Maximum sequence length of 1024, 3 epochs, learning rate of 1e-06, and LoRA parameters r=64 and alpha=128, with CoT (Chain-of-Thought) masking enabled.
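The hyperparameters above can be collected into a single configuration sketch. The field names below follow common TRL/PEFT conventions and are assumptions, not the author's actual training script; only the numeric values come from the card.

```python
# Hypothetical reconstruction of the SFT setup described above.
# Field names are assumed (TRL/PEFT-style); values are from the model card.
sft_config = {
    "max_seq_length": 1024,
    "num_train_epochs": 3,
    "learning_rate": 1e-6,
    "lora_r": 64,
    "lora_alpha": 128,
    "load_in_4bit": True,              # QLoRA: 4-bit quantized base weights
    "mask_cot": True,                  # CoT masking: loss computed only on
                                       # non-reasoning target tokens
}

# With alpha=128 and r=64, the effective LoRA scaling factor alpha/r is 2.0.
lora_scaling = sft_config["lora_alpha"] / sft_config["lora_r"]
```

Note the unusually low learning rate (1e-06) and alpha = 2r, a common choice that doubles the adapter's effective contribution.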
## Intended Use
This model is primarily intended as a strong starting point for subsequent DPO (Direct Preference Optimization) training. Its self-contained nature simplifies deployment for further fine-tuning stages.
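For context on the intended next stage, the sketch below shows the DPO objective this SFT checkpoint would feed into, where the SFT model serves as the frozen reference policy. This is a minimal illustration of the standard DPO loss with made-up log-probabilities, not code from this model's training pipeline.

```python
import math

# Minimal DPO loss sketch: pi_* are summed log-probs of the chosen/rejected
# responses under the trained policy; ref_* are the same under the frozen
# reference (e.g. this SFT model); beta is the DPO temperature.
def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)): small when the policy widens the preference gap.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative values: the policy favors the chosen response more than the
# reference does, so the margin is positive and the loss drops below log(2).
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-12.0,
                ref_chosen=-11.0, ref_rejected=-11.0, beta=0.1)
```

A merged 16-bit checkpoint simplifies this stage: DPO trainers can load it directly as both the policy initialization and the reference model, with no adapter bookkeeping.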