oro-ai/qwen3-4b-shoppingbench-rejection

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 8, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The oro-ai/qwen3-4b-shoppingbench-rejection model is a 4 billion parameter Qwen3-based language model, fine-tuned using reward-weighted rejection sampling. Developed by ORO-AI, it is the second stage in the ShoppingBench distillation pipeline, specifically optimized for agentic shopping tasks. This model significantly improves the Agent Success Rate (ASR) on ShoppingBench, making it suitable for applications requiring robust automated shopping agent performance.

Loading preview...

Overview

This model, oro-ai/qwen3-4b-shoppingbench-rejection, is a 4 billion parameter variant of the Qwen3 architecture, developed by ORO-AI. It represents the second stage of fine-tuning within the ShoppingBench distillation pipeline, utilizing reward-weighted rejection sampling. The primary goal of this fine-tuning is to enhance the model's performance in agentic shopping scenarios.

Key Capabilities

  • Enhanced Agent Success Rate (ASR): The model achieves a 42.7% ASR on ShoppingBench, a substantial improvement over the base Qwen3-4B's 18.0% ASR. This metric is evaluated on a leak-cluster-guarded, held-out partition with production-strict scoring.
  • Rejection-Sampled Fine-tuning: It leverages reward-weighted rejection sampling, a technique designed to distill high-performing agent trajectories.
  • Ready-to-Use: This is a merged full model, meaning the Qwen3-4B base weights are integrated with the trained delta, allowing direct loading with transformers or serving with vLLM without requiring adapter stacking.

Training Data

Good For

  • Developing and deploying automated shopping agents.
  • Research into trajectory primitive distillation and reward-weighted fine-tuning for agentic tasks.
  • Applications requiring a specialized language model for e-commerce interactions and decision-making.