deepkick/qwen3-4b-advanced-sft-v13-merged
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 24, 2026License:apache-2.0Architecture:Transformer Open Weights Warm
deepkick/qwen3-4b-advanced-sft-v13-merged is a 4 billion parameter language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507. This merged model is specifically optimized for advanced agentic tasks, leveraging a LoRA SFT method on the u-10bei/sft_alfworld_trajectory_dataset_v5. It is intended for use in AgentBench Advanced evaluations, offering enhanced performance for complex trajectory-based scenarios.
Loading preview...
Overview
This model, deepkick/qwen3-4b-advanced-sft-v13-merged, is a 4 billion parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It has undergone a LoRA SFT (Supervised Fine-Tuning) process, with the adapter merged directly into the base model for seamless deployment.
Key Capabilities
- Advanced Agentic Task Performance: Specifically fine-tuned using the
u-10bei/sft_alfworld_trajectory_dataset_v5, which focuses on complex trajectory-based tasks, making it suitable for agentic applications. - Optimized for AgentBench: Designed with a particular focus on performance within AgentBench Advanced evaluations, indicating its strength in environments requiring sophisticated decision-making and planning.
- vLLM Compatibility: The LoRA adapter has been merged, and there are no tokenizer vocabulary modifications, ensuring compatibility with vLLM for efficient inference.
Training Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Dataset:
u-10bei/sft_alfworld_trajectory_dataset_v5 - Method: LoRA SFT (merged)
- Max Sequence Length: 4096 tokens
- Epochs: 1
- Learning Rate: 1e-06
- LoRA Configuration: r=32, alpha=128
Good For
- Developers working on AI agents requiring robust performance in complex, trajectory-based environments.
- Researchers and practitioners involved in AgentBench Advanced evaluations.
- Applications demanding a Qwen3-4B variant with enhanced capabilities for structured, sequential reasoning.