Name: OctoThinker/OctoThinker-3B-Long-Base API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: OctoThinker

OctoThinker-3B-Long-Base Overview

OctoThinker-3B-Long-Base is a 3.2-billion-parameter language model derived from the Llama-3 family, developed by Zengzhi Wang, Fan Zhou, Xuefeng Li, and Pengfei Liu. What sets it apart is its training methodology: a mid-training stage intended to make the model more amenable to reinforcement learning (RL). It is designed as a base model for further RL fine-tuning rather than as a finished, instruction-following model.

Key Characteristics

RL-Friendly Base: Trained to provide a strong starting point for reinforcement learning tasks.
Llama-3 Family Base: Inherits its architecture from the Llama-3 series.
32K Context Length: Supports a context window of 32,768 tokens, allowing it to process longer sequences.
Mid-training Incentivization: Applies the mid-training strategies described in the accompanying paper so that subsequent RL scales more effectively.
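
To make the 32K context limit concrete, here is a minimal sketch of budgeting a prompt against the 32,768-token window. The function name `fit_to_context` and the choice to drop the oldest tokens first are illustrative assumptions, not part of the model card; token IDs are treated as a plain list, so any tokenizer could supply them.

```python
MAX_CONTEXT_TOKENS = 32_768  # context length stated in the listing above


def fit_to_context(prompt_ids, max_new_tokens, max_context=MAX_CONTEXT_TOKENS):
    """Trim a prompt so that prompt + generated tokens fit in the window.

    Hypothetical helper: keeps the most recent tokens and drops the oldest,
    which is one common policy, not something the model card prescribes.
    """
    if max_new_tokens < 0 or max_new_tokens >= max_context:
        raise ValueError("max_new_tokens must be in [0, max_context)")
    budget = max_context - max_new_tokens  # tokens left for the prompt
    return list(prompt_ids[-budget:])


# Example: a 40,000-token prompt with 1,024 tokens reserved for generation.
trimmed = fit_to_context(list(range(40_000)), max_new_tokens=1_024)
print(len(trimmed))
```

Reserving the generation budget up front avoids requests whose prompt alone fills the window and leaves no room for output.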

Evaluation and Use Cases

Evaluations for this base model use few-shot prompting. The benchmark numbers appear only as figures in the README, so they are not reproduced here. The model's primary purpose is to serve as a strong, RL-optimized foundation for developers and researchers building reinforcement learning-based language model applications.
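
Because a base model completes text rather than following instructions, few-shot prompting means placing worked examples ahead of the new question. The sketch below shows one way to assemble such a prompt. The `Question:`/`Answer:` template and the function name `build_few_shot_prompt` are assumptions for illustration; the actual evaluation template used by the authors is not given on this page.

```python
def build_few_shot_prompt(examples, question,
                          q_prefix="Question: ", a_prefix="Answer: "):
    """Join solved (question, answer) pairs and end with an open answer slot.

    Hypothetical template: the prefixes are illustrative defaults, not the
    format used in the authors' evaluations.
    """
    blocks = [f"{q_prefix}{q}\n{a_prefix}{a}" for q, a in examples]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"{q_prefix}{question}\n{a_prefix}".rstrip())
    return "\n\n".join(blocks)


shots = [("What is 2 + 3?", "5"), ("What is 7 * 6?", "42")]
prompt = build_few_shot_prompt(shots, "What is 9 - 4?")
print(prompt)
```

The resulting string can be sent as a plain completion prompt; the model is expected to continue after the final `Answer:`.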
