Overview
Jianwen/Search-7B-SFT is a 7.6-billion-parameter model intended as a cold-start checkpoint for reinforcement learning agents operating in search environments. It is fine-tuned on search tasks (the SFT stage) and incorporates several features aimed at improving learning efficiency and context use.
Key Capabilities
- Experience-based Skill Distillation: The model processes successful trajectories to extract strategic patterns and analyzes failures to derive concise lessons.
- Hierarchical SKILLBANK: It organizes learned knowledge into General Skills for broad strategic guidance and Task-Specific Skills for category-level heuristics, providing a structured approach to skill management.
- Recursive Skill Evolution: A dynamic mechanism allows the skill library to co-evolve with the agent's policy during reinforcement learning, continuously improving by analyzing validation failures.
- Context Efficiency: Achieves significant token compression (10-20%) relative to raw trajectory storage, which reduces compute cost and leaves more of the context window available for the agent's reasoning.
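The SKILLBANK hierarchy, distillation, and evolution steps listed above can be sketched as a small data structure. This is a minimal illustration under assumed semantics, not the repository's actual implementation; the `SkillBank` class and its `distill`, `evolve`, and `as_prompt` methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SkillBank:
    """Hypothetical sketch of a hierarchical skill library:
    general skills hold broad strategic guidance, task-specific
    skills hold category-level heuristics keyed by task type."""
    general: list = field(default_factory=list)
    task_specific: dict = field(default_factory=dict)

    def distill(self, trajectory):
        """Experience-based skill distillation: a successful
        trajectory yields a strategic pattern, a failure yields
        a concise lesson."""
        if trajectory["success"]:
            entry = f"Pattern: {trajectory['summary']}"
        else:
            entry = f"Lesson: avoid {trajectory['summary']}"
        self.task_specific.setdefault(trajectory["task"], []).append(entry)

    def evolve(self, validation_failures):
        """Recursive skill evolution: fold validation failures back
        into the library so it co-evolves with the agent's policy."""
        for failure in validation_failures:
            self.distill({**failure, "success": False})

    def as_prompt(self, task):
        """Render the relevant skills as a compact context block."""
        lines = ["General skills:"] + self.general
        lines += [f"Skills for {task}:"] + self.task_specific.get(task, [])
        return "\n".join(lines)
```

A usage sketch: seed the bank with a general skill, distill one success, then feed a validation failure back through `evolve` before rendering the prompt block for the next rollout.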
Good For
- Developing RL agents for search tasks: Provides a strong foundation for agents that need to navigate and solve problems within search environments.
- Research in Skill-Augmented Reinforcement Learning: Ideal for exploring and implementing advanced techniques like experience-based skill distillation and recursive skill evolution.
- Optimizing context usage in RL: Useful for scenarios where efficient token management and enhanced reasoning from compressed trajectories are critical.
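The context-efficiency point can be made concrete with a toy comparison between a raw multi-step trajectory and the distilled lesson an agent would store instead. Both strings are invented for illustration, and whitespace splitting is only a crude stand-in for the model's real tokenizer.

```python
# Toy comparison of trajectory-vs-skill token cost.
# Whitespace splitting is a crude stand-in for a real tokenizer.
raw_trajectory = (
    "step 1: search 'capital of France' -> results ... "
    "step 2: open page 3 -> irrelevant ... "
    "step 3: refine query 'France capital city' -> Paris ... "
    "step 4: verify with second source -> confirmed Paris"
)
distilled_skill = "Refine ambiguous queries with a location keyword before opening results."

raw_tokens = len(raw_trajectory.split())
skill_tokens = len(distilled_skill.split())
ratio = skill_tokens / raw_tokens
print(f"raw={raw_tokens} tokens, skill={skill_tokens} tokens, ratio={ratio:.0%}")
```

Storing the distilled skill in place of the full trajectory is what frees context budget for the agent's own reasoning at rollout time.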
For setup details and training scripts, see the SkillRL GitHub repository. The underlying research is described in the model paper.