deepcogito/cogito-v2-preview-llama-70B

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 32k · Published: Jul 27, 2025 · License: llama3.1 · Architecture: Transformer

The deepcogito/cogito-v2-preview-llama-70B is a 70-billion-parameter instruction-tuned generative language model developed by DeepCogito. It is a hybrid reasoning model, capable of both direct answers and self-reflection, trained with Iterated Distillation and Amplification (IDA) for iterative self-improvement. The model is optimized for coding, STEM, instruction following, and general helpfulness, with significantly stronger multilingual, coding, and tool-calling capabilities than similarly sized models. It natively supports a 128k-token context length (served here with a 32k window, per the listing above) and is trained in over 30 languages.


Cogito v2-preview-llama-70B: Hybrid Reasoning LLM

The deepcogito/cogito-v2-preview-llama-70B is a 70-billion-parameter instruction-tuned generative language model from DeepCogito, designed for advanced reasoning and general-purpose applications. It is a hybrid reasoning model that can operate in a standard LLM mode or in an enhanced self-reflection mode, where it 'thinks' before generating a response. This capability comes from training with Iterated Distillation and Amplification (IDA), an alignment strategy focused on iterative self-improvement.

Key Capabilities & Optimizations

  • Hybrid Reasoning: Seamlessly switches between direct response and a self-reflective 'thinking' mode for improved accuracy and coherence.
  • Enhanced Performance: Outperforms size-equivalent models on common industry benchmarks in both standard and reasoning modes.
  • Multilingual Support: Trained in over 30 languages, offering strong multilingual capabilities.
  • Specialized Strengths: Optimized for:
    • Coding tasks
    • STEM (Science, Technology, Engineering, Mathematics) problems
    • Complex instruction following
    • General helpfulness
  • Tool Calling: Supports single, parallel, and multiple tool calls in both standard and extended thinking modes, facilitating integration with external functions (see the sketch after this list).
  • Extended Context: Supports a native context length of 128k tokens (this deployment serves a 32k window).
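
A minimal tool-calling sketch using the standard Hugging Face transformers chat-template API. The `get_current_weather` function, the user prompt, and the generation settings are illustrative assumptions, not part of the model card; verify the exact tool-call output format against the model's published chat template.

```python
# Sketch: single tool call via the transformers chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepcogito/cogito-v2-preview-llama-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

def get_current_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: The city to look up.
    """
    return "sunny, 22°C"  # stub for illustration only

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# The chat template serializes the tool's signature into the prompt;
# the model may answer directly or emit a structured tool call.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```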

Usage & Differentiation

Developers can enable the model's extended thinking mode by setting enable_thinking=True in the tokenizer's chat template or by including a specific system prompt and prefilling the response with <think>\n. This unique hybrid approach allows for more robust and reliable outputs, particularly for tasks requiring deeper analysis or problem-solving. The model's strong performance across coding, STEM, and multilingual benchmarks, combined with its advanced reasoning and tool-calling features, positions it as a versatile choice for a wide range of demanding applications.
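
A minimal sketch of the `enable_thinking` toggle described above, using the Hugging Face transformers library. The flag name comes from the description; the prompt, dtype, and generation settings are illustrative assumptions.

```python
# Sketch: enabling extended thinking mode via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepcogito/cogito-v2-preview-llama-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]

# enable_thinking=True makes the chat template set up the reasoning
# preamble, so the model emits a <think>...</think> block before its
# final answer; omit the flag (or set False) for a direct response.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```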

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model, covering the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p. (The specific values are shown per-config on the interactive page.)
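
A sketch of how these sampler parameters would be passed through an OpenAI-compatible chat completions endpoint, which Featherless provides. The base URL is the assumed Featherless endpoint, and every sampler value below is a placeholder, not one of the actual top user configs.

```python
# Sketch: passing sampler settings via an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepcogito/cogito-v2-preview-llama-70B",
    messages=[{"role": "user", "content": "Write a haiku about recursion."}],
    temperature=0.7,         # placeholder values; tune per task
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Samplers outside the OpenAI schema (top_k, min_p, repetition_penalty)
    # are typically forwarded via extra_body on OpenAI-compatible servers.
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```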