Name: SumiYama/dpo-qwen-cot-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: SumiYama

Model Overview

SumiYama/dpo-qwen-cot-merged is a specialized language model built upon the Qwen3-4B-Instruct-2507 base model. It has been fine-tuned using a LoRA SFT (Supervised Fine-Tuning) approach, with specific LoRA settings of r=16 and alpha=32, before merging the adapters.

Key Capabilities

Agent Task Specialization: Optimized for agent-based interactions.
SQL Task Handling: Proficient in processing and generating responses for DB_Bench-formatted SQL tasks.
Household Task Execution: Capable of understanding and responding to ALFWorld-formatted household tasks.

Training Details

The model was trained on synthetic dialogue data specifically designed for its target tasks:

Synthetic SQL agent dialogues in the DB_Bench format.
Synthetic household task dialogues in the ALFWorld format.

Notably, this model does not utilize AgentBench data for its training, focusing instead on its custom synthetic datasets for specialized performance in SQL and ALFWorld contexts.

Deployment

It can be deployed using vLLM, with an example provided for Docker, specifying a maximum model length of 8192 and 95% GPU memory utilization.

Overview

Model Overview

Key Capabilities

Training Details

Deployment

Full Model Card (README)