Cogito v2-preview-llama-70B: Hybrid Reasoning LLM
The deepcogito/cogito-v2-preview-llama-70B is a 70 billion parameter instruction-tuned generative language model from DeepCogito, designed for advanced reasoning and general-purpose applications. This model stands out as a hybrid reasoning model, capable of operating in a standard LLM mode or an enhanced self-reflection mode, where it 'thinks' before generating a response. This capability is enabled by its training using Iterated Distillation and Amplification (IDA), an alignment strategy focused on iterative self-improvement.
Key Capabilities & Optimizations
- Hybrid Reasoning: Seamlessly switches between direct response and a self-reflective 'thinking' mode for improved accuracy and coherence.
- Enhanced Performance: Outperforms size-equivalent models on common industry benchmarks in both standard and reasoning modes.
- Multilingual Support: Trained in over 30 languages, offering strong multilingual capabilities.
- Specialized Strengths: Optimized for:
  - Coding tasks
  - STEM (Science, Technology, Engineering, Mathematics) problems
  - Complex instruction following
  - General helpfulness
- Tool Calling: Supports single, parallel, and multiple tool calls in both standard and extended thinking modes, facilitating integration with external functions.
- Extended Context: Features a substantial context length of 128k tokens.
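As a concrete illustration of the tool-calling capability above, the sketch below shows the general shape of a tool definition and a parallel tool-call response. The JSON-schema function format and the `get_weather` tool are assumptions for illustration; the model card does not specify the exact schema, and with `transformers` the tool list would typically be passed via `tokenizer.apply_chat_template(messages, tools=tools, ...)`.

```python
import json

# Hypothetical tool definition in the JSON-schema style used by most
# chat templates (the exact format expected by this model is not
# specified in the card).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# A parallel tool call means the assistant turn carries several calls
# at once, which the caller executes before returning results.
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {"function": {"name": "get_weather",
                      "arguments": json.dumps({"city": "Paris"})}},
        {"function": {"name": "get_weather",
                      "arguments": json.dumps({"city": "Tokyo"})}},
    ],
}

# Decode the arguments of each call so the tools can be dispatched.
cities = [json.loads(call["function"]["arguments"])["city"]
          for call in assistant_turn["tool_calls"]]
print(cities)
```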
Usage & Differentiation
Developers can enable the model's extended thinking mode by setting `enable_thinking=True` in the tokenizer's chat template, or by including a specific system prompt and prefilling the response with `<think>\n`. This hybrid approach allows for more robust and reliable outputs, particularly for tasks requiring deeper analysis or problem-solving. The model's strong performance across coding, STEM, and multilingual benchmarks, combined with its advanced reasoning and tool-calling features, positions it as a versatile choice for a wide range of demanding applications.