oro-ai/qwen3-4b-shoppingbench-rejection
The oro-ai/qwen3-4b-shoppingbench-rejection model is a 4 billion parameter Qwen3-based language model, fine-tuned using reward-weighted rejection sampling. Developed by ORO-AI, it is the second stage in the ShoppingBench distillation pipeline, specifically optimized for agentic shopping tasks. This model significantly improves the Agent Success Rate (ASR) on ShoppingBench, making it suitable for applications requiring robust automated shopping agent performance.
Loading preview...
Overview
This model, oro-ai/qwen3-4b-shoppingbench-rejection, is a 4 billion parameter variant of the Qwen3 architecture, developed by ORO-AI. It represents the second stage of fine-tuning within the ShoppingBench distillation pipeline, utilizing reward-weighted rejection sampling. The primary goal of this fine-tuning is to enhance the model's performance in agentic shopping scenarios.
Key Capabilities
- Enhanced Agent Success Rate (ASR): The model achieves a 42.7% ASR on ShoppingBench, a substantial improvement over the base Qwen3-4B's 18.0% ASR. This metric is evaluated on a leak-cluster-guarded, held-out partition with production-strict scoring.
- Rejection-Sampled Fine-tuning: It leverages reward-weighted rejection sampling, a technique designed to distill high-performing agent trajectories.
- Ready-to-Use: This is a merged full model, meaning the Qwen3-4B base weights are integrated with the trained delta, allowing direct loading with
transformersor serving withvLLMwithout requiring adapter stacking.
Training Data
- Fine-tuned on a filtered corpus:
oro-ai/sn15-shoppingbench-sft-15k - Utilizes raw traces from:
oro-ai/sn15-shoppingbench-traces-18k
Good For
- Developing and deploying automated shopping agents.
- Research into trajectory primitive distillation and reward-weighted fine-tuning for agentic tasks.
- Applications requiring a specialized language model for e-commerce interactions and decision-making.