OctoThinker/OctoThinker-3B-Long-Base
OctoThinker/OctoThinker-3B-Long-Base is a 3.2 billion parameter base language model developed by Zengzhi Wang, Fan Zhou, Xuefeng Li, and Pengfei Liu. Built on the Llama-3 family, it applies insights from mid-training to produce a base model suited to reinforcement learning (RL), and it supports a context length of 32,768 tokens.
OctoThinker-3B-Long-Base Overview
OctoThinker-3B-Long-Base is a 3.2 billion parameter language model derived from the Llama-3 family, developed by Zengzhi Wang, Fan Zhou, Xuefeng Li, and Pengfei Liu. Its main differentiator is its training methodology, which applies insights from mid-training to make the model better suited to reinforcement learning (RL). It is intended as a base for further RL fine-tuning.
Key Characteristics
- RL-Friendly Base: Trained to serve as a strong starting point for reinforcement learning; the architecture itself is inherited from Llama-3.
- Llama-3 Family Base: Inherits architectural strengths from the Llama-3 series.
- 32K Context Length: Supports a substantial context window of 32,768 tokens, enabling processing of longer sequences.
- Mid-training for RL Scaling: Uses the mid-training strategies described in the accompanying paper, which aim to improve how well the model scales under RL.
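The 32,768-token window covers the prompt and the generated tokens together, so long prompts leave less room for output. Below is a minimal sketch of that budgeting. The helper names `max_new_tokens_allowed` and `generate_within_budget` are hypothetical, and the sketch assumes the checkpoint loads through the standard Hugging Face `transformers` Auto classes, which this summary does not confirm.

```python
MODEL_ID = "OctoThinker/OctoThinker-3B-Long-Base"
CONTEXT_LENGTH = 32_768  # context window stated in the model summary


def max_new_tokens_allowed(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


def generate_within_budget(prompt: str, requested_new_tokens: int = 512) -> str:
    """Illustrative only: generate text while staying inside the context window.

    Assumes `transformers` and `torch` are installed and that the model
    loads via the Auto classes; neither is verified here.
    """
    # Imported lazily so the budgeting helper works without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_tokens = inputs["input_ids"].shape[1]

    budget = min(requested_new_tokens, max_new_tokens_allowed(prompt_tokens))
    if budget == 0:
        raise ValueError("prompt already fills the context window")

    output = model.generate(**inputs, max_new_tokens=budget)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# The budgeting logic can be checked without downloading the model.
print(max_new_tokens_allowed(30_000))  # 2768
```

Only the budgeting helper runs at import time. `generate_within_budget` downloads the full checkpoint on first call, so it is left uncalled here.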
Evaluation and Use Cases
The base model is evaluated with few-shot prompting. Benchmark results appear only as figures in the README, so no numbers are reproduced here. The model is primarily intended as an RL-optimized foundation for developers and researchers building RL-based language model applications.
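Few-shot prompting of a base model usually means placing a handful of solved examples ahead of the target question and letting the model complete the final answer. Here is a minimal sketch of that idea. The `Question:`/`Answer:` layout, the helper name `build_few_shot_prompt`, and the arithmetic examples are all illustrative assumptions, not the authors' evaluation setup.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Concatenate solved examples, then the unsolved target question."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical demonstration examples.
shots = [
    ("What is 2 + 3?", "5"),
    ("What is 7 * 6?", "42"),
]
prompt = build_few_shot_prompt(shots, "What is 9 - 4?")
print(prompt)
```

The resulting string would be passed to the model as a plain completion prompt. The number and style of examples are choices the evaluator makes.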