OctoThinker/OctoThinker-8B-Long-Base

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Apr 24, 2025 | License: llama3.2 | Architecture: Transformer | Cold

OctoThinker/OctoThinker-8B-Long-Base is an 8-billion-parameter base language model developed by Wang, Zhou, Li, and Liu and built on the Llama-3 family architecture. It has a 32,768-token context length, and its mid-training recipe is designed to make the model respond well to reinforcement learning (RL). It is intended as a base for further RL-based fine-tuning and shows competitive performance in few-shot prompting evaluations.

OctoThinker-8B-Long-Base Overview

OctoThinker-8B-Long-Base is an 8-billion-parameter base language model in the OctoThinker family, developed by Wang, Zhou, Li, and Liu. It is built on the Llama-3 architecture, and its training applies mid-training insights intended to make the model more amenable to reinforcement learning (RL). It supports a 32,768-token context length, which makes it suitable for longer text sequences.

Key Capabilities & Features

  • Reinforcement Learning Friendly: Trained with a recipe intended to help reinforcement learning scale, which makes it a suitable base for RL-based fine-tuning.
  • Llama-3 Family Architecture: Uses the Llama-3 family architecture.
  • Extended Context Window: Offers a 32,768-token context length for long inputs.
  • Few-Shot Evaluation Performance: Shows competitive performance in few-shot prompting evaluations, which suggests the base model generalizes from a handful of in-context examples.
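
Because this is a base model, few-shot prompts are usually written as plain text that the model continues. The following is a minimal sketch of such a prompt builder. The function name `build_few_shot_prompt` and the `Question:` / `Answer:` labels are illustrative assumptions, not a format taken from the OctoThinker evaluations.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    question_label: str = "Question:",  # assumed label, not from the model card
    answer_label: str = "Answer:",      # assumed label, not from the model card
) -> str:
    """Format worked examples plus a final query as one plain-text prompt."""
    blocks = [
        f"{question_label} {question.strip()}\n{answer_label} {answer.strip()}"
        for question, answer in examples
    ]
    # The last block stops at the answer label so the model continues from there.
    blocks.append(f"{question_label} {query.strip()}\n{answer_label}")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(
    [("What is 2 + 3?", "5"), ("What is 7 * 6?", "42")],
    "What is 9 - 4?",
)
print(prompt)
```

The resulting string can be sent as a plain completion prompt. Ending on the answer label gives the model a clear place to continue from.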

Good For

  • Developers and researchers looking for a strong base model to fine-tune using reinforcement learning techniques.
  • Applications requiring a large context window for processing extensive documents or conversations.
  • Experiments and research into mid-training optimization strategies for language models.
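
When working near the 32,768-token limit, the prompt and the generated tokens share the same window. The sketch below shows one way to budget for that. The helper names are hypothetical, and the token IDs are assumed to come from whatever tokenizer is used with the model.

```python
from typing import List

MAX_CONTEXT_TOKENS = 32_768  # context length stated on this page


def max_prompt_tokens(max_new_tokens: int,
                      context_length: int = MAX_CONTEXT_TOKENS) -> int:
    """Return how many prompt tokens remain after reserving room for generation."""
    if not 0 <= max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    return context_length - max_new_tokens


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = MAX_CONTEXT_TOKENS) -> bool:
    """Check whether the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


def truncate_to_budget(token_ids: List[int], max_new_tokens: int,
                       context_length: int = MAX_CONTEXT_TOKENS) -> List[int]:
    """Keep only the most recent tokens that fit (drops tokens from the start)."""
    budget = max_prompt_tokens(max_new_tokens, context_length)
    return token_ids[-budget:]


# Hypothetical token IDs standing in for a long tokenized document.
long_document = list(range(40_000))
trimmed = truncate_to_budget(long_document, max_new_tokens=1_024)
print(len(trimmed), fits_in_context(len(trimmed), 1_024))
```

Dropping tokens from the start keeps the most recent text, which suits conversations. For documents where the opening matters, a different truncation strategy may be more appropriate.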

For more in-depth details on the training methodology and insights, refer to the associated research paper.