deepkick/qwen3-4b-advanced-sft-v13-merged

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 24, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

deepkick/qwen3-4b-advanced-sft-v13-merged is a 4 billion parameter language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507. This merged model is specifically optimized for advanced agentic tasks, leveraging a LoRA SFT method on the u-10bei/sft_alfworld_trajectory_dataset_v5. It is intended for use in AgentBench Advanced evaluations, offering enhanced performance for complex trajectory-based scenarios.

Loading preview...

Overview

This model, deepkick/qwen3-4b-advanced-sft-v13-merged, is a 4 billion parameter language model derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It has undergone a LoRA SFT (Supervised Fine-Tuning) process, with the adapter merged directly into the base model for seamless deployment.

Key Capabilities

  • Advanced Agentic Task Performance: Specifically fine-tuned using the u-10bei/sft_alfworld_trajectory_dataset_v5, which focuses on complex trajectory-based tasks, making it suitable for agentic applications.
  • Optimized for AgentBench: Designed with a particular focus on performance within AgentBench Advanced evaluations, indicating its strength in environments requiring sophisticated decision-making and planning.
  • vLLM Compatibility: The LoRA adapter has been merged, and there are no tokenizer vocabulary modifications, ensuring compatibility with vLLM for efficient inference.

Training Details

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Dataset: u-10bei/sft_alfworld_trajectory_dataset_v5
  • Method: LoRA SFT (merged)
  • Max Sequence Length: 4096 tokens
  • Epochs: 1
  • Learning Rate: 1e-06
  • LoRA Configuration: r=32, alpha=128

Good For

  • Developers working on AI agents requiring robust performance in complex, trajectory-based environments.
  • Researchers and practitioners involved in AgentBench Advanced evaluations.
  • Applications demanding a Qwen3-4B variant with enhanced capabilities for structured, sequential reasoning.