flavianv/qwen4b-apparel23-bundle-sft

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 16, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

flavianv/qwen4b-apparel23-bundle-sft is a 4 billion parameter Qwen3-based model fine-tuned by flavianv. It is fine-tuned to generate matching apparel outfit bundles for natural language requests and is used as a reward model in the DeepShopper project, where it acts as a likelihood judge for candidate outfits. Its signal is relative and in-distribution: it is meant for comparing candidates drawn from the Amazon Apparel'23 gold/kept bundles it was trained on.


Model Overview

This model, flavianv/qwen4b-apparel23-bundle-sft, is a 4 billion parameter Qwen3-based language model developed by flavianv. It serves as Reward V0 for the DeepShopper project and is fine-tuned to generate appropriate apparel outfit bundles from natural language descriptions. Its primary role is to act as a likelihood judge for candidate outfits: it computes P(outfit | need) as a teacher-forced likelihood, with hard gates that reject candidates containing invalid entries, duplicates, or role-category mismatches.
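The hard gates mentioned above can be illustrated with a minimal sketch. The model card does not describe how the gates are implemented, so everything here is an assumption for illustration: the `Item` structure, the `ROLE_CATEGORIES` table, the function names, and the choice to return a reward of 0 when a gate fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical outfit entry; field names are illustrative only."""
    product_id: str
    role: str       # slot the item fills in the outfit, e.g. "top"
    category: str   # catalog category of the product, e.g. "t-shirt"


# Hypothetical role -> allowed categories table (not taken from the model card).
ROLE_CATEGORIES = {
    "top": {"t-shirt", "shirt", "sweater"},
    "bottom": {"jeans", "shorts", "skirt"},
    "shoes": {"sneakers", "boots"},
}


def passes_hard_gates(outfit, catalog_ids):
    """Return False if any item is invalid, duplicated, or role-mismatched."""
    seen = set()
    for item in outfit:
        if item.product_id not in catalog_ids:   # invalid: unknown product
            return False
        if item.product_id in seen:              # duplicate product
            return False
        seen.add(item.product_id)
        allowed = ROLE_CATEGORIES.get(item.role, set())
        if item.category not in allowed:         # role-category mismatch
            return False
    return True


def gated_reward(outfit, catalog_ids, likelihood_score):
    """Assumed behavior: a failed gate zeroes the reward, otherwise pass it through."""
    return likelihood_score if passes_hard_gates(outfit, catalog_ids) else 0.0
```

Treating the gates as a separate step before the likelihood score keeps structurally broken candidates from earning any reward, however fluent they look to the language model.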

Key Capabilities

  • Outfit Bundle Generation: Generates matching apparel outfit bundles given a natural language need.
  • Reward Signal Calculation: Provides a reward signal for DeepShopper by scoring the likelihood of a candidate outfit as 100·exp(−mean NLL), where NLL is the negative log-likelihood of the outfit's tokens.
  • In-Distribution Signal: Optimized for providing relative signal within the Amazon Apparel'23 dataset, where it was trained on gold/kept bundles from flavianv/apparel23-qwen32b-kept-outfits-with-products.
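As a worked example of the reward formula 100·exp(−mean NLL), the sketch below computes the score from a list of per-token log-probabilities. It assumes the log-probabilities were already obtained by teacher forcing the outfit tokens given the need, and that the mean is taken over those outfit tokens only; the function name `likelihood_reward` is hypothetical.

```python
import math


def likelihood_reward(token_logprobs):
    """Compute 100 * exp(-mean NLL) from per-token natural-log probabilities.

    token_logprobs: log p(token_t | need, earlier outfit tokens), assumed to be
    produced by a teacher-forced forward pass over the candidate outfit.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # NLL per token is -log p; average it over the outfit tokens.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return 100.0 * math.exp(-mean_nll)


# Every token predicted with probability 0.5 gives a score of about 50.
score = likelihood_reward([math.log(0.5)] * 4)
```

Because exp(−mean NLL) is the geometric mean of the per-token probabilities, the score lies between 0 and 100 and does not shrink merely because an outfit is longer.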

Limitations and Use Cases

This model is designed for in-distribution tasks on Amazon Apparel'23 bundles. It is gender-blind and reasoning-blind: its scores and suggestions do not account for gender or for complex reasoning about the request. It works as an SFT mapper that provides relative signal between candidates, not as a zero-shot model. A known limitation is that high likelihood does not equate to overall outfit quality. For discrimination tasks it is superseded by the pairwise V1 model (flavianv/qwen4b-reward-pairwise-v1).

Good for

  • Evaluating the likelihood of apparel outfit bundles within the Amazon Apparel'23 context.
  • Providing a reward signal for reinforcement learning systems focused on apparel bundling.
  • Applications requiring a specialized model for generating and judging fashion outfits based on specific product data.