Overview
This model, deepkick/qwen3-4b-advanced-sft-v13-merged, is a 4-billion-parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It was fine-tuned with LoRA SFT (supervised fine-tuning), and the adapter has been merged directly into the base model for seamless deployment.
Key Capabilities
- Advanced Agentic Task Performance: Fine-tuned on u-10bei/sft_alfworld_trajectory_dataset_v5, a dataset of complex, trajectory-based tasks, making it well suited to agentic applications.
- Optimized for AgentBench: Designed with a particular focus on AgentBench Advanced evaluations, indicating strength in environments that require sophisticated decision-making and planning.
- vLLM Compatibility: Because the LoRA adapter is already merged and the tokenizer vocabulary is unmodified, the model works with vLLM out of the box for efficient inference.
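As a sketch of that deployment path (the exact flags and port are illustrative assumptions, not stated on this card), the merged checkpoint can be served with vLLM's OpenAI-compatible server and queried over HTTP:

```shell
# Serve the merged model with vLLM (flags shown are illustrative assumptions).
# --max-model-len matches the 4096-token training sequence length.
vllm serve deepkick/qwen3-4b-advanced-sft-v13-merged --max-model-len 4096

# Query the OpenAI-compatible endpoint (vLLM defaults to port 8000).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepkick/qwen3-4b-advanced-sft-v13-merged",
        "messages": [{"role": "user", "content": "Plan the steps to heat an apple and put it on the countertop."}]
      }'
```

No `--enable-lora` flag or adapter path is needed, since the adapter weights are already folded into the checkpoint.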
Training Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Dataset: u-10bei/sft_alfworld_trajectory_dataset_v5
- Method: LoRA SFT (adapter merged into the base model)
- Max Sequence Length: 4096 tokens
- Epochs: 1
- Learning Rate: 1e-06
- LoRA Configuration: r=32, alpha=128
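The hyperparameters above can be gathered into a single config sketch. Field names mirror peft's `LoraConfig` convention, which is an assumption on our part; the card does not state which training framework was used.

```python
# Hyperparameters reported on this card, collected into a plain dict.
# Key names follow peft's LoraConfig convention (an assumption, not confirmed by the card).
lora_sft_config = {
    "base_model": "Qwen/Qwen3-4B-Instruct-2507",
    "dataset": "u-10bei/sft_alfworld_trajectory_dataset_v5",
    "max_seq_length": 4096,
    "num_train_epochs": 1,
    "learning_rate": 1e-6,
    "r": 32,            # LoRA rank
    "lora_alpha": 128,  # LoRA scaling numerator
}

# Effective scaling applied to the LoRA update in standard LoRA: alpha / r.
scaling = lora_sft_config["lora_alpha"] / lora_sft_config["r"]
print(scaling)  # → 4.0
```

An alpha/r ratio of 4 means the adapter update is amplified fourfold relative to an unscaled rank-32 adapter, a fairly aggressive setting that pairs with the conservative 1e-6 learning rate.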
Good For
- Developers working on AI agents requiring robust performance in complex, trajectory-based environments.
- Researchers and practitioners involved in AgentBench Advanced evaluations.
- Applications demanding a Qwen3-4B variant with enhanced capabilities for structured, sequential reasoning.