yusei926/qwen3-4b-sft-merged-v2-20260207-1148
yusei926/qwen3-4b-sft-merged-v2-20260207-1148 is a 4-billion-parameter language model based on the Qwen3 architecture and fine-tuned for instruction following. It is a 16-bit merged model derived from unsloth/Qwen3-4B-Instruct-2507 via Supervised Fine-Tuning (SFT) with a LoRA rank of 64. It targets general instruction-based tasks and supports a 40,960-token context length for long-form understanding and generation.
Model Overview
yusei926/qwen3-4b-sft-merged-v2-20260207-1148 is built on the Qwen3 architecture with 4 billion parameters. This release is a 16-bit merged model, originating from unsloth/Qwen3-4B-Instruct-2507, that has undergone Supervised Fine-Tuning (SFT).
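Because this is a fully merged 16-bit checkpoint (no separate adapter weights), it can in principle be loaded directly with the `transformers` library. The sketch below assumes the checkpoint is available on the Hugging Face Hub under the repository name above and that enough GPU or CPU memory is available for a 4B model in 16-bit precision:

```python
def load_merged_model(model_id: str = "yusei926/qwen3-4b-sft-merged-v2-20260207-1148"):
    """Sketch: load the merged checkpoint with transformers.

    Requires the `transformers` package (and `torch`); the model weights
    are downloaded from the Hugging Face Hub on first use.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # pick up the checkpoint's native 16-bit dtype
        device_map="auto",    # place weights on available GPU(s) or CPU
    )
    return tokenizer, model
```

Since the weights are already merged, no PEFT/LoRA loading step is needed at inference time.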
Key Capabilities
- Instruction Following: The model is primarily fine-tuned for instruction-based tasks, making it suitable for a wide range of conversational and command-driven applications.
- Context Handling: With a substantial context length of 40,960 tokens, it can process and generate longer sequences of text, aiding in complex interactions and detailed content creation.
- Efficiency: As a 4B parameter model, it offers a balance between performance and computational efficiency, potentially allowing for more accessible deployment compared to larger models.
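The context-handling point above has a practical consequence: applications must keep the prompt within the 40,960-token window, typically by dropping the oldest turns of a conversation. A minimal sketch, using hypothetical per-message token counts (a real application would measure them with the model's tokenizer):

```python
# Sketch: keep a chat history within the 40,960-token context window.
# Token counts per message are hypothetical placeholders.
CONTEXT_LENGTH = 40_960
RESERVED_FOR_OUTPUT = 2_048  # hypothetical budget left for the model's reply

def trim_history(messages, budget=CONTEXT_LENGTH - RESERVED_FOR_OUTPUT):
    """Drop the oldest messages until the estimated token total fits."""
    kept = list(messages)
    while kept and sum(m["tokens"] for m in kept) > budget:
        kept.pop(0)  # discard the oldest message first
    return kept

history = [
    {"role": "user", "tokens": 30_000},
    {"role": "assistant", "tokens": 5_000},
    {"role": "user", "tokens": 9_000},
]
trimmed = trim_history(history)  # the 30,000-token message is dropped
```

More sophisticated strategies (summarizing old turns, pinning the system prompt) follow the same budget arithmetic.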
Training Details
The model was trained with Supervised Fine-Tuning (SFT) using the following hyperparameters:
- Learning Rate (LR): 5e-05
- Epochs: 2
- LoRA Configuration: A LoRA rank of 64 was used during fine-tuning. LoRA is a parameter-efficient adaptation method that trains small low-rank update matrices instead of the full model weights.
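The efficiency claim behind LoRA rank 64 can be checked with back-of-the-envelope arithmetic. For a weight matrix of shape (d_out, d_in), LoRA trains two factors B (d_out × r) and A (r × d_in), so the trainable parameters per matrix drop from d_out · d_in to r · (d_out + d_in). The hidden size below is illustrative, not taken from this model's config:

```python
# Sketch: LoRA trainable-parameter count for one weight matrix.
# d_in/d_out are hypothetical; the rank matches this model's reported r=64.
r = 64
d_in = d_out = 2560  # illustrative hidden size

full_params = d_out * d_in            # parameters in the full matrix
lora_params = r * (d_out + d_in)      # parameters in the low-rank factors
fraction = lora_params / full_params  # share of the matrix LoRA trains
```

With these illustrative dimensions, LoRA trains roughly 5% of the matrix's parameters, which is why rank-64 fine-tuning fits in far less memory than full fine-tuning.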
Notably, Direct Preference Optimization (DPO) was disabled during training; the model relies solely on the SFT phase for instruction alignment.