leafspark/Llama-3.1-8B-MultiReflection-Instruct

Warm
Public
8B
FP8
32768
License: llama3.1
Hugging Face
Overview

Model Overview

leafspark/Llama-3.1-8B-MultiReflection-Instruct is an 8 billion parameter model built on the Llama-3.1 architecture, developed by leafspark. This model is specifically designed for advanced agentic reasoning, drawing inspiration from OpenAI's o1 reasoning model. It was fine-tuned using a synthetically generated dataset from Claude 3.5 Sonnet.

Key Capabilities & Features

  • Multi-step Reasoning: Generates detailed, verbose reasoning processes, including thinking, drafting, and reflection steps, formatted in XML.
  • Agentic Assistant: Designed to function as an advanced agentic assistant, providing general insights and task completion.
  • XML Output Format: Structures responses with <thinking>, <reflection>, <draft>, and <output> XML tags for clear process visualization.
  • Optimized for Long Context: Recommended for use with at least 16k context, supporting a maximum sequence length of 32768 tokens, as responses are typically 2000-3000 tokens long.
  • Coherent Responses: Aims for high coherency, especially when using the recommended sampling parameters.

Training Details

The model was trained on Google Colab's free T4 GPUs using unsloth, completing in approximately 52 minutes. The training involved 3 epochs over 30 steps, with a batch size of 2 and gradient accumulation steps of 4. The dataset consisted of 81 examples, each around 3000 tokens.

Recommended Usage

To leverage its full reasoning capabilities, users should employ a specific system prompt that guides the model to produce XML-formatted thought processes. Recommended sampling parameters include a temperature of 0.15, min-p of 0.2, top-k of 50, and frequency penalty of 0.5.