Lyte/Llama-3.1-8B-Instruct-Reasoner-1o1_v0.3

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kLicense:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Lyte/Llama-3.1-8B-Instruct-Reasoner-1o1_v0.3 is an 8 billion parameter instruction-tuned causal language model developed by Lyte, based on the Llama-3.1-8B-Instruct architecture. This experimental model is specifically fine-tuned to enhance reasoning capabilities by encouraging more token generation for internal thought processes before providing an answer, including self-correction mechanisms. It features a 32,768 token context length and aims to explore reasoning improvements rather than solely optimizing for benchmark performance.

Loading preview...

Overview

Lyte/Llama-3.1-8B-Instruct-Reasoner-1o1_v0.3 is an experimental 8 billion parameter instruction-tuned model developed by Lyte, built upon the Llama-3.1-8B-Instruct architecture. Its primary goal is to explore and improve the model's reasoning process by generating more internal tokens for reflection and self-correction before producing a final response. This approach prioritizes the development of reasoning over achieving state-of-the-art benchmark scores, acknowledging that current benchmarks may not fully capture reasoning abilities.

Key Characteristics

  • Enhanced Reasoning Focus: Designed to generate additional tokens for internal reasoning, verification, and self-correction.
  • Base Model: Fine-tuned from unsloth/meta-llama-3.1-8b-instruct-bnb-4bit.
  • Context Length: Supports a substantial context window of 32,768 tokens.
  • Performance Impact: While experimental, benchmarks show improvements in arc_challenge (+7.60% acc), arc_easy (+8.88% acc), and commonsense_qa (+3.27% acc) when using the finetuning system prompt. However, some scores like MMLU and GSM-8K show a decrease compared to the original Llama-3.1-8B-Instruct, indicating the experimental nature and different optimization target.
  • Training Efficiency: Trained 2x faster using Unsloth and Huggingface's TRL library.

When to Use This Model

This model is particularly suited for:

  • Research into Reasoning: Ideal for developers and researchers interested in exploring and improving AI's reasoning capabilities and self-correction mechanisms.
  • Applications Requiring Deliberation: Use cases where the model's ability to "think through" a problem and potentially correct itself is more valuable than raw speed or benchmark-optimized performance.
  • System Prompt Integration: Best utilized with its specific system prompt to leverage its reasoning-focused finetuning.