SumiYama/dpo-qwen-cot-merged
SumiYama/dpo-qwen-cot-merged is a language model developed by SumiYama, based on Qwen3-4B-Instruct-2507 and fine-tuned with LoRA for agent tasks. It targets two task formats, DB_Bench (SQL) and ALFWorld (household tasks), which makes it suitable for applications that need structured interaction and step-by-step task execution. The LoRA adapters from supervised fine-tuning (SFT) are merged into the base weights.
Model Overview
SumiYama/dpo-qwen-cot-merged is a specialized language model built on the Qwen3-4B-Instruct-2507 base model. It was fine-tuned with LoRA-based supervised fine-tuning (SFT) using rank r=16 and alpha=32, and the adapters were then merged into the base weights.
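To make the r=16 and alpha=32 settings concrete, the sketch below shows what merging a LoRA adapter does to a single weight matrix. It assumes the standard LoRA scaling factor alpha / r (the source does not state which variant was used), and the function names and toy matrices are hypothetical illustrations, not the author's training or merge code.

```python
# Minimal pure-Python sketch of a LoRA merge: W' = W + (alpha / r) * B @ A.
# Assumption: standard LoRA scaling (alpha / r); the source does not confirm this.

LORA_R = 16       # rank, from the model card
LORA_ALPHA = 32   # alpha, from the model card


def lora_scale(r, alpha):
    """Scaling applied to the low-rank update before it is added to W."""
    return alpha / r


def matmul(a, b):
    """Plain nested-list matrix product of a (m x n) and b (n x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, lora_b, lora_a, r, alpha):
    """Fold the adapter into the base weight so no adapter is needed at inference."""
    scale = lora_scale(r, alpha)
    delta = matmul(lora_b, lora_a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy 2x2 base weight with hypothetical adapter factors B (2 x 16) and A (16 x 2).
base_w = [[1.0, 0.0], [0.0, 1.0]]
toy_b = [[1 if k == i else 0 for k in range(LORA_R)] for i in range(2)]
toy_a = [[1 if k == j else 0 for j in range(2)] for k in range(LORA_R)]

merged_w = merge_lora(base_w, toy_b, toy_a, LORA_R, LORA_ALPHA)
print(lora_scale(LORA_R, LORA_ALPHA))  # 2.0
print(merged_w)
```

With these settings the low-rank update is scaled by 2 before being added to the base weights. After merging, the model loads like any ordinary checkpoint, with no separate adapter files.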
Key Capabilities
- Agent Task Specialization: Optimized for agent-based interactions.
- SQL Task Handling: Proficient in processing and generating responses for DB_Bench-formatted SQL tasks.
- Household Task Execution: Capable of understanding and responding to ALFWorld-formatted household tasks.
Training Details
The model was trained on synthetic dialogue data specifically designed for its target tasks:
- Synthetic SQL agent dialogues in the DB_Bench format.
- Synthetic household task dialogues in the ALFWorld format.
The model was not trained on AgentBench data. Training relied only on the custom synthetic datasets listed above, which target SQL and ALFWorld tasks.
Deployment
The model can be served with vLLM. The provided Docker example sets a maximum model length of 8192 tokens and a GPU memory utilization of 95%.
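The original Docker example is not reproduced on this page, so the following is only a sketch of how such a launch command might be assembled. The `--max-model-len` and `--gpu-memory-utilization` flags are vLLM server options; the `vllm/vllm-openai:latest` image, the `--gpus all` option, and the port mapping are assumptions and may differ from the author's setup.

```python
import shlex

MODEL_ID = "SumiYama/dpo-qwen-cot-merged"


def build_vllm_docker_command(
    model_id,
    max_model_len=8192,              # from the model card
    gpu_memory_utilization=0.95,     # 95%, from the model card
    host_port=8000,                  # assumption: host port to expose
    image="vllm/vllm-openai:latest", # assumption: official vLLM OpenAI-server image
):
    """Return the docker invocation as an argument list (nothing is executed)."""
    return [
        "docker", "run", "--gpus", "all",
        "-p", f"{host_port}:8000",
        image,
        "--model", model_id,
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


command = build_vllm_docker_command(MODEL_ID)
# Print a shell-safe string for copy and paste; pass `command` to subprocess.run to execute.
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, which avoids quoting mistakes if the model ID or image name is later swapped for another value.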