OctoThinker/OctoThinker-3B-Long-Base

  • Task: Text generation
  • Concurrency Cost: 1
  • Model Size: 3.2B
  • Quant: BF16
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Apr 22, 2025
  • License: llama3.2
  • Architecture: Transformer

OctoThinker/OctoThinker-3B-Long-Base is a 3.2-billion-parameter base language model developed by Zengzhi Wang, Fan Zhou, Xuefeng Li, and Pengfei Liu. Built on the Llama-3 family, it applies a mid-training stage intended to make the base model better suited to subsequent reinforcement learning. It supports a context length of 32,768 tokens.


OctoThinker-3B-Long-Base Overview

OctoThinker-3B-Long-Base is a 3.2-billion-parameter language model derived from the Llama-3 family, developed by Zengzhi Wang, Fan Zhou, Xuefeng Li, and Pengfei Liu. Its main differentiator is its training methodology: a mid-training stage aimed at making the model more suitable for reinforcement learning (RL). It is a base model, intended as a starting point for further RL fine-tuning.

Key Characteristics

  • RL-Friendly Base: Designed as a foundation for reinforcement learning tasks.
  • Llama-3 Family Base: Uses the architecture of the Llama-3 series.
  • 32K Context Length: Supports a substantial context window of 32,768 tokens, enabling processing of longer sequences.
  • Mid-training Incentivization: Uses the mid-training strategies described in the accompanying paper, aimed at improving RL scaling.
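
To make the 32,768-token context window concrete, here is a minimal sketch of how a caller might budget prompt tokens against it. The helper names (max_prompt_tokens, truncate_left) and the left-truncation policy are illustrative assumptions, not part of the model or any library; only the 32,768 figure comes from the model card.

```python
from typing import List, Sequence

# Context length stated in the model card.
MAX_CONTEXT_TOKENS = 32_768


def max_prompt_tokens(max_new_tokens: int,
                      context_length: int = MAX_CONTEXT_TOKENS) -> int:
    """Return how many prompt tokens fit once room is reserved for generation.

    Hypothetical helper: prompt tokens plus generated tokens must not
    exceed the context length.
    """
    if max_new_tokens < 0:
        raise ValueError("max_new_tokens must be non-negative")
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens leaves no room for a prompt")
    return context_length - max_new_tokens


def truncate_left(token_ids: Sequence[int],
                  max_new_tokens: int,
                  context_length: int = MAX_CONTEXT_TOKENS) -> List[int]:
    """Keep only the most recent tokens that fit in the prompt budget.

    Assumption: dropping the oldest tokens is acceptable for the task.
    Other policies (summarizing, chunking) may suit long documents better.
    """
    budget = max_prompt_tokens(max_new_tokens, context_length)
    return list(token_ids[-budget:])
```

For example, reserving 1,024 tokens for generation leaves 31,744 tokens for the prompt. The token IDs themselves would come from the model's tokenizer, which this sketch does not load.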

Evaluation and Use Cases

This base model is evaluated with few-shot prompting. Benchmark results are presented as figures in the README rather than as numbers here. The model's primary purpose is to serve as an RL-optimized foundation for developers and researchers building reinforcement learning-based language model applications.
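
Because a base model continues text rather than following instructions, few-shot prompting usually means placing worked examples ahead of the new query. The sketch below shows one way to assemble such a prompt. The function name, the "Question:"/"Answer:" prefixes, and the blank-line separator are assumptions for illustration; the source does not specify the templates used in the actual evaluations.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(examples: Sequence[Tuple[str, str]],
                          question: str,
                          q_prefix: str = "Question: ",
                          a_prefix: str = "Answer: ") -> str:
    """Assemble a few-shot prompt for a base (non-chat) model.

    Each (question, answer) pair becomes one demonstration block. The
    final block ends with a bare answer prefix so the model's
    continuation is the answer to the new question.
    """
    blocks = [f"{q_prefix}{q}\n{a_prefix}{a}" for q, a in examples]
    # Strip the trailing space so the prompt ends exactly at "Answer:".
    blocks.append(f"{q_prefix}{question}\n{a_prefix}".rstrip())
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [("What is 2 + 2?", "4"), ("What is 10 - 3?", "7")]
    print(build_few_shot_prompt(demos, "What is 3 + 5?"))
```

The resulting string would then be tokenized and passed to the model for generation, typically with a stop condition at the next blank line so that only one answer is produced.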