Jianwen/Webshop-7B-SFT

Hosted on: Hugging Face

  • Task: Text Generation
  • Model Size: 7.6B parameters
  • Quantization: FP8
  • Context Length: 32k
  • Concurrency Cost: 1
  • Published: Jan 25, 2026
  • License: MIT
  • Architecture: Transformer
  • Weights: Open

Jianwen/Webshop-7B-SFT is a 7.6 billion parameter model developed by Jianwen, specifically designed as a cold-start checkpoint for the webshop RL environment. It utilizes Experience-based Skill Distillation and a Hierarchical SKILLBANK to organize knowledge, focusing on strategic patterns and lessons from failures. This model excels in context efficiency, achieving 10-20% token compression while enhancing reasoning utility for webshop tasks.


Overview

Jianwen/Webshop-7B-SFT is a 7.6 billion parameter model that serves as a cold-start checkpoint for the WebShop Reinforcement Learning (RL) environment. Developed by Jianwen, it is supervised fine-tuned (SFT) for the WebShop task, with the aim of improving agent performance through structured skill acquisition.

Key Capabilities

  • Experience-based Skill Distillation: The model learns by transforming successful trajectories into strategic patterns and failed attempts into concise lessons, enhancing decision-making.
  • Hierarchical SKILLBANK: It organizes knowledge into two tiers: General Skills for broad strategic guidance and Task-Specific Skills for category-level heuristics, providing a structured approach to problem-solving.
  • Recursive Skill Evolution: Features a dynamic mechanism where the skill library continuously co-evolves with the agent's policy during RL, adapting and improving based on validation failures.
  • Context Efficiency: Achieves significant token compression (10-20%) compared to raw trajectory storage, which not only saves computational resources but also enhances the utility of reasoning within the given context.
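The capabilities above suggest a simple shape for the skill store: distilled units of experience (patterns from successes, lessons from failures) organized into a general tier and a per-category tier. The sketch below is purely illustrative and not the model's actual implementation; the class names, the first-sentence "distillation" rule, and the whitespace token proxy are all assumptions for demonstration, and the compression it achieves has no relation to the 10-20% figure quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Skill:
    """A distilled unit of experience: a pattern from success or a lesson from failure."""
    kind: str    # "pattern" (from success) or "lesson" (from failure)
    text: str    # compressed summary of the trajectory

@dataclass
class SkillBank:
    """Hypothetical two-tier store: general strategies plus category-level heuristics."""
    general: List[Skill] = field(default_factory=list)
    task_specific: Dict[str, List[Skill]] = field(default_factory=dict)

    def distill(self, trajectory: str, succeeded: bool,
                category: Optional[str] = None) -> Skill:
        # Toy "distillation": keep only the first sentence as the stored skill.
        summary = trajectory.split(". ")[0].rstrip(".") + "."
        skill = Skill(kind="pattern" if succeeded else "lesson", text=summary)
        if category is None:
            self.general.append(skill)              # tier 1: general guidance
        else:                                       # tier 2: category heuristic
            self.task_specific.setdefault(category, []).append(skill)
        return skill

    def compression_ratio(self, trajectory: str, skill: Skill) -> float:
        # Crude token proxy: whitespace-separated word count.
        return 1.0 - len(skill.text.split()) / len(trajectory.split())

# Usage: a failed shopping trajectory becomes a category-level lesson.
bank = SkillBank()
traj = "Clicked the first result without checking size. Wrong size arrived. Order failed."
lesson = bank.distill(traj, succeeded=False, category="footwear")
```

During RL, a recursive-evolution loop like the one described above would re-run `distill` on new validation failures so the bank co-evolves with the policy; that loop is omitted here.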

Good For

  • Webshop RL Environments: Ideal for researchers and developers working on agents that navigate and interact with web-based shopping platforms.
  • Skill-Augmented Reinforcement Learning: Provides a strong foundation for developing agents that learn and evolve skills recursively.
  • Efficient Trajectory Learning: Useful for scenarios where efficient storage and utilization of past experiences are crucial for agent performance.