flavianv/deepoutfit-qwen17b-sft-dpo

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:May 29, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The flavianv/deepoutfit-qwen17b-sft-dpo is an experimental 2 billion parameter Qwen3-1.7B model, fine-tuned for JSON-action outfit recommendation traces. It specializes in generating structured outfit reports by selecting five products from a catalog. This model is designed for research into catalog-grounded fashion/outfit agents, requiring an external tool loop for product search and validation.

Loading preview...

DeepOutfit Qwen1.7B SFT+DPO Overview

This model, flavianv/deepoutfit-qwen17b-sft-dpo, is an experimental 2 billion parameter full-weight fine-tune of Qwen/Qwen3-1.7B. Its primary purpose is to generate structured JSON-action outfit recommendation traces, specifically for fashion/outfit agents that interact with a product catalog.

Key Capabilities & Training

  • Specialized Fine-tuning: The model underwent Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) stages using filtered JSON-action outfit rollouts.
  • Outfit Report Generation: It is designed to search a product catalog, select five products, and produce a structured final outfit report.
  • Context Length: Trained with a maximum length of 16,384 tokens during SFT and 8,192 tokens during DPO.
  • Performance Improvement: Evaluation against the zero-shot Qwen3-1.7B showed an improvement in GPT-4.1 judged outfit quality, with a mean judge score of 41.58 compared to 29.93 for the base model.

Intended Use Cases & Limitations

  • Research on Fashion Agents: Ideal for research and development of catalog-grounded fashion agents that require structured output for outfit recommendations.
  • External Tooling Required: It is not a general-purpose shopping assistant; it requires an external tool loop for product search results and a validator/scorer for the final JSON report.
  • Experimental Status: This is an experimental research checkpoint, not validated for production. It may produce incomplete, impractical, or unsupported product combinations.
  • Specific Optimization: Optimized for outfit/product-report behavior, not broad assistant quality. The dominant remaining failure mode identified is outfit practicality.