Name: THU-KEG/DeepDive-4B-SFT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: THU-KEG

THU-KEG/DeepDive-4B-SFT Overview

THU-KEG/DeepDive-4B-SFT is a 4 billion parameter instruction-tuned model developed by THU-KEG, primarily designed to support advanced deep search agents. This model is a key component of the research presented in the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards." Its core innovation lies in its fine-tuning for robust reinforcement learning, specifically by leveraging citation-aware rubric rewards to improve agent performance and reliability.

Key Capabilities

Enhanced Deep Search: Optimized for tasks requiring agents to perform in-depth information retrieval and evidence chaining.
Citation-Aware Rewards: Integrates a novel reward mechanism that considers citation quality and relevance, leading to more robust learning.
Reinforcement Learning Integration: Designed to be a foundational component for developing sophisticated RL-based search agents.
Large Context Window: Features a 32768-token context length, enabling the processing of extensive search results and complex queries.

Good For

Researchers and developers working on advanced search agents and information retrieval systems.
Applications requiring robust evidence-based reasoning and citation analysis.
Experiments in reinforcement learning for complex, knowledge-intensive tasks.
Projects that benefit from a model specifically trained to understand and utilize contextual evidence from diverse sources.

Overview

THU-KEG/DeepDive-4B-SFT Overview

Key Capabilities

Good For

Full Model Card (README)