codelion/Llama-3.3-70B-o1
TEXT GENERATION · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Context Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

codelion/Llama-3.3-70B-o1 is a 70 billion parameter Llama-3.3 model fine-tuned by codelion for enhanced reasoning capabilities. The model specializes in generating Chain-of-Thought (CoT) style reasoning traces, emitting an explicit 'thinking' process before the final solution. It is optimized for tasks requiring step-by-step problem-solving, making it suitable for complex analytical queries. The model has a 32,768-token context length and was fine-tuned using QLoRA.


Llama-3.3-70B-o1 Thinker Model Overview

codelion/Llama-3.3-70B-o1 is a 70 billion parameter language model developed by codelion, fine-tuned from unsloth/llama-3.3-70b-instruct-bnb-4bit. Its primary distinction lies in its specialization for Chain-of-Thought (CoT) style reasoning, designed to explicitly show its thought process.

Key Capabilities & Features

  • CoT Reasoning: Generates detailed 'thinking' traces enclosed within <|begin_of_thought|> and <|end_of_thought|> tags, followed by the final answer in <|begin_of_solution|> and <|end_of_solution|> tags.
  • Enhanced Problem Solving: This explicit reasoning process makes it particularly effective for tasks requiring step-by-step analysis and complex problem-solving.
  • Performance: Achieves a score of 46.7 on the AIME 2024 pass@1 benchmark, outperforming the base Llama-3.3-70B model (30.0) and Sky-T1-32B-Preview (43.3).
  • Training Efficiency: Fine-tuned using QLoRA with Unsloth and Hugging Face's TRL library, enabling 2x faster training.
  • Context Length: Supports a substantial context length of 32768 tokens, though users should be prepared for potentially large token generation due to the detailed thought process.
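Because the model wraps its reasoning in the delimiter tags listed above, downstream code usually wants to separate the trace from the answer. A minimal sketch (the tag names are from the model card; the sample output text and the `split_cot` helper are illustrative):

```python
import re

# Hypothetical raw completion from Llama-3.3-70B-o1; only the tag format
# (<|begin_of_thought|> etc.) comes from the model card.
raw = (
    "<|begin_of_thought|>\n"
    "List the primes up to 30 and count them...\n"
    "<|end_of_thought|>\n"
    "<|begin_of_solution|>\n"
    "There are 10 primes between 1 and 30.\n"
    "<|end_of_solution|>"
)

def split_cot(text: str) -> tuple[str, str]:
    """Return (thinking_trace, final_solution) from a tagged completion."""
    thought = re.search(
        r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", text, re.DOTALL
    )
    solution = re.search(
        r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", text, re.DOTALL
    )
    # Fall back to the whole text if the model omitted the solution tags.
    return (
        thought.group(1).strip() if thought else "",
        solution.group(1).strip() if solution else text.strip(),
    )

thinking, answer = split_cot(raw)
print(answer)  # -> There are 10 primes between 1 and 30.
```

The fallback branch matters in practice: if generation is cut off by a low `max_tokens` limit, the closing tags may never appear.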

When to Use This Model

  • Complex Analytical Tasks: Ideal for applications where understanding the reasoning steps is as crucial as the final answer.
  • Debugging & Transparency: Useful for scenarios where model transparency and explainability are important, allowing developers to trace how the model arrived at a solution.
  • Benchmarking: When evaluating, ensure max_tokens is set sufficiently high (e.g., 8192) to capture the full output, including the thought trace and solution.
Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model.

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
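These sampler parameters are typically passed alongside the prompt in the request body. The values below are purely illustrative placeholders (the page lists only the parameter names, not the actual top configurations), so tune them for your workload:

```python
# Illustrative sampler settings only -- the parameter names come from the
# page above, but these values are NOT the actual top Featherless configs.
sampler_settings = {
    "temperature": 0.7,          # randomness of token selection
    "top_p": 0.9,                # nucleus sampling cutoff
    "top_k": 40,                 # restrict sampling to the k most likely tokens
    "frequency_penalty": 0.0,    # penalize tokens by how often they appeared
    "presence_penalty": 0.0,     # penalize tokens that appeared at all
    "repetition_penalty": 1.1,   # multiplicative penalty on repeated tokens
    "min_p": 0.05,               # drop tokens below this fraction of the top prob
}

# Merged into a request body the same way as max_tokens:
request_body = {"model": "codelion/Llama-3.3-70B-o1", **sampler_settings}
```

For a reasoning model like this one, lower temperatures tend to keep the thought trace on track; penalties that are too aggressive can truncate long traces by punishing the repeated delimiter tokens.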