deepcogito/cogito-v1-preview-llama-8B

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32K · Published: Mar 31, 2025 · License: Llama 3.1 · Architecture: Transformer

The deepcogito/cogito-v1-preview-llama-8B is an 8 billion parameter instruction-tuned generative language model developed by deepcogito, built on the Llama architecture. It is a hybrid reasoning model capable of both direct answers and self-reflection, trained using Iterated Distillation and Amplification (IDA). Optimized for coding, STEM, instruction following, and general helpfulness, it offers significantly higher multilingual, coding, and tool-calling capabilities than similarly sized counterparts, supporting a 32768-token context length.


Cogito v1 Preview - 8B Overview

The deepcogito/cogito-v1-preview-llama-8B is an 8 billion parameter instruction-tuned generative language model. It is designed as a hybrid reasoning model, capable of providing direct answers or engaging in self-reflection before responding, a feature that can be activated via a specific system prompt or tokenizer setting. The model is trained using Iterated Distillation and Amplification (IDA), an iterative self-improvement strategy aimed at scalable and efficient alignment.
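As a minimal sketch of how the two modes might be toggled, the helper below builds a chat message list that prepends a deep-thinking trigger phrase to the system prompt. The exact trigger string (`"Enable deep thinking subroutine."`) is taken from the model's published documentation, but verify it against the current model card before relying on it; the helper name and structure are this sketch's own.

```python
# Sketch: build a messages list (suitable for tokenizer.apply_chat_template)
# that optionally enables Cogito's "deep thinking" mode via the system prompt.
# The trigger phrase is assumed from the model documentation; verify it.
DEEP_THINKING_PROMPT = "Enable deep thinking subroutine."

def build_messages(user_prompt, deep_thinking=False, system_prompt=""):
    """Return a chat messages list, with the deep-thinking trigger prepended
    to the system prompt when deep_thinking=True."""
    if deep_thinking:
        # Prepend the trigger so the model self-reflects before answering.
        system = (DEEP_THINKING_PROMPT + "\n\n" + system_prompt).strip()
    else:
        system = system_prompt
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

In standard mode (`deep_thinking=False` with no system prompt) the list contains only the user turn, so the model answers directly.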

Key Capabilities & Optimizations

  • Hybrid Reasoning: Functions in both standard and 'deep thinking' modes, outperforming size-equivalent models in both direct and reasoning benchmarks.
  • Optimized Domains: Specifically optimized for coding, STEM tasks, instruction following, and general helpfulness.
  • Multilingual Support: Trained in over 30 languages, enhancing its global applicability.
  • Advanced Tool Calling: Supports single, parallel, multiple, and parallel-multiple tool calls in both standard and extended thinking modes.
  • Extended Context: Features a context length of 32768 tokens.
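To illustrate the tool-calling capability, the sketch below shows an OpenAI-style tool definition of the kind commonly passed to `tokenizer.apply_chat_template(..., tools=[...])` for tool-capable chat models, plus a small parser for a JSON list of tool calls. The `get_weather` function and the assumption that the model emits tool calls as a JSON array are illustrative, not taken from this model's documentation.

```python
import json

# Sketch: a hypothetical OpenAI-style tool schema, as typically supplied to
# tokenizer.apply_chat_template(..., tools=[...]) for tool-calling models.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def parse_tool_calls(raw: str):
    """Parse model output assumed to be a JSON tool-call list.
    A single JSON object is wrapped in a list; invalid JSON yields []."""
    try:
        calls = json.loads(raw)
    except json.JSONDecodeError:
        return []
    return calls if isinstance(calls, list) else [calls]
```

Returning a list in all cases lets the same dispatch loop handle single, parallel, and multiple tool calls uniformly.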

Performance Highlights

Evaluations show that Cogito v1-preview models surpass size-equivalent counterparts (e.g., Llama/Qwen instruct models, DeepSeek R1, Qwen QwQ) on common industry benchmarks in both direct and reasoning modes. The 8B model also demonstrates strong tool-calling performance across single, parallel, multiple, and parallel-multiple call patterns.

When to Use This Model

This model is particularly well-suited for applications requiring robust instruction following, complex reasoning, code generation, and multilingual support. Its hybrid reasoning capability makes it valuable for tasks where enhanced problem-solving and self-correction are beneficial, especially in STEM and development contexts.

Popular Sampler Settings

The most common sampling configurations used by Featherless users for this model adjust the following parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.