THU-KEG/DeepDive-4B-SFT Overview
THU-KEG/DeepDive-4B-SFT is a 4-billion-parameter instruction-tuned model developed by THU-KEG, designed to power advanced deep search agents. It is a key component of the research presented in the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards." The core innovation is training for robust reinforcement learning using citation-aware rubric rewards, which improves agent performance and reliability.
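To make the reward idea concrete, here is a minimal sketch of what a citation-aware rubric reward could look like. This is an illustration only, not the paper's actual reward function: the rubric schema, field names, and scoring scheme below are all assumptions.

```python
# Illustrative sketch only: the rubric schema ({"keyword", "source"}) and the
# scoring scheme are hypothetical, not the reward used in the paper.

def rubric_reward(answer: str, rubric: list[dict], cited_sources: set[str]) -> float:
    """Score an answer against rubric items, crediting a criterion only
    when the answer both covers it and cites its required source."""
    if not rubric:
        return 0.0
    satisfied = 0
    for item in rubric:
        covers_content = item["keyword"].lower() in answer.lower()
        has_citation = item["source"] in cited_sources
        # Citation-aware: correct content without a supporting citation earns nothing.
        if covers_content and has_citation:
            satisfied += 1
    return satisfied / len(rubric)


reward = rubric_reward(
    answer="The bridge opened in 1937 [golden-gate.org].",
    rubric=[{"keyword": "1937", "source": "golden-gate.org"}],
    cited_sources={"golden-gate.org"},
)
```

The key design point is that the reward couples content coverage with evidence: an answer that states the right fact but omits the citation scores zero on that criterion, which pushes the agent toward grounded, verifiable outputs.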
Key Capabilities
- Enhanced Deep Search: Optimized for tasks requiring agents to perform in-depth information retrieval and evidence chaining.
- Citation-Aware Rewards: Integrates a novel reward mechanism that considers citation quality and relevance, leading to more robust learning.
- Reinforcement Learning Integration: Designed to be a foundational component for developing sophisticated RL-based search agents.
- Large Context Window: Features a 32768-token context length, enabling the processing of extensive search results and complex queries.
Good For
- Researchers and developers working on advanced search agents and information retrieval systems.
- Applications requiring robust evidence-based reasoning and citation analysis.
- Experiments in reinforcement learning for complex, knowledge-intensive tasks.
- Projects that benefit from a model specifically trained to understand and utilize contextual evidence from diverse sources.