oro-ai/qwen3-4b-shoppingbench-kto

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 8, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The oro-ai/qwen3-4b-shoppingbench-kto model is a 4 billion parameter Qwen3-based language model developed by ORO-AI, fine-tuned with KTO preference refinement. It is specifically optimized for agentic tasks within the ShoppingBench environment, achieving a 42.7% ASR on a production-strict held-out partition. This model is designed for distilling shopping agent behaviors from trajectory primitives, making it suitable for automated shopping and agent simulation tasks.

Loading preview...

Overview

oro-ai/qwen3-4b-shoppingbench-kto is a 4 billion parameter language model built upon the Qwen3 architecture, developed by ORO-AI. This model has undergone KTO (Kahneman-Tversky Optimization) preference refinement, specifically tailored for agentic tasks within the ShoppingBench environment. It serves as a companion artifact for the paper "Bittensor Agent Arenas as a Trajectory Primitive: Distilling a Shopping Agent from ShoppingBench Subnet Traces".

Key Capabilities

  • Specialized for ShoppingBench: Achieves a 42.7% ASR (Agent Success Rate) on a leak-cluster-guarded, production-strict held-out partition of ShoppingBench, significantly improving upon the base Qwen3-4B's 18.0% ASR.
  • KTO Refinement: Utilizes KTO preference refinement (v3) on top of a merged SFT champion model, enhancing its performance in specific agentic scenarios.
  • Trajectory Primitive Distillation: Designed to distill shopping agent behaviors from ShoppingBench subnet traces, making it adept at understanding and executing complex shopping-related actions.
  • Ready-to-Use: Provided as a merged full model, allowing direct loading with transformers or serving with vLLM without requiring adapter stacking.

Training Data

The model was trained using a filtered corpus from oro-ai/sn15-shoppingbench-sft-15k and raw traces from oro-ai/sn15-shoppingbench-traces-18k.

Good For

  • Developing and evaluating automated shopping agents.
  • Research into agentic AI and trajectory primitive distillation.
  • Applications requiring specialized language understanding and generation for e-commerce and online shopping interactions.