Name: ByteDance/Ouro-2.6B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ByteDance

Overview

ByteDance/Ouro-2.6B is a 2.6-billion-parameter Looped Language Model (LoopLM) designed for parameter efficiency. It applies the same shared weights over several iterative passes, which lets it perform on par with standard transformers of 3-4 billion parameters. The model was trained on 7.7 trillion tokens spanning web data, code, mathematics, and long-context documents.
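The weight-sharing idea can be illustrated with a toy sketch. This is not Ouro's implementation: the layer function, the dimension, and every name below (make_layer, apply_layer, looped_forward, stacked_forward, param_count) are hypothetical and chosen only to show that looping one block adds compute depth without adding parameters.

```python
import random


def make_layer(dim, rng):
    """Create one toy 'layer': a dim x dim weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]


def apply_layer(weights, x):
    """Toy residual block: x + ReLU(W x). A stand-in for a real transformer block."""
    return [
        xi + max(0.0, sum(w * xj for w, xj in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def looped_forward(layer, x, steps):
    """Apply the SAME layer `steps` times (shared weights, looped computation)."""
    for _ in range(steps):
        x = apply_layer(layer, x)
    return x


def stacked_forward(layers, x):
    """Apply a stack of DISTINCT layers once each (standard deep network)."""
    for layer in layers:
        x = apply_layer(layer, x)
    return x


def param_count(layers):
    """Count the weights held by a list of toy layers."""
    return sum(len(row) for layer in layers for row in layer)


rng = random.Random(0)
dim, depth = 4, 4
shared = make_layer(dim, rng)                          # one block, reused
stacked = [make_layer(dim, rng) for _ in range(depth)]  # four separate blocks

x0 = [1.0, -1.0, 0.5, 0.0]
y_looped = looped_forward(shared, x0, steps=depth)
y_stacked = stacked_forward(stacked, x0)
```

Both forward passes run four block applications, but the looped variant stores a quarter of the weights (param_count([shared]) versus param_count(stacked)).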

Key Capabilities

Parameter Efficiency: Matches the performance of larger 3-4B-parameter models with only 2.6 billion parameters, thanks to its LoopLM architecture.
Iterative Latent Reasoning: Performs reasoning tasks by recurrently processing information within its latent space.
Adaptive Computation: Features an adaptive exit mechanism, allowing for dynamic allocation of computational resources based on the task's complexity. This can be configured via early_exit_threshold in config.json.
Configurable Recurrent Steps: Users can adjust the total_ut_steps parameter to control the number of recurrent computations, balancing performance and inference speed.
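As a minimal sketch of how these two settings might be changed in a local copy of the model, the helper below rewrites the total_ut_steps and early_exit_threshold keys in a config.json file. The helper name set_loop_options and the example values are assumptions for illustration; the accepted ranges and defaults are not stated here, so check the model's own config.json before relying on any particular value.

```python
import json
from pathlib import Path


def set_loop_options(config_path, total_ut_steps=None, early_exit_threshold=None):
    """Update the looping options in a config.json file and return the new config.

    Only the keys that are passed are changed; all other keys are preserved.
    This is a hypothetical helper, not part of any library.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    if total_ut_steps is not None:
        if total_ut_steps < 1:
            raise ValueError("total_ut_steps must be at least 1")
        config["total_ut_steps"] = int(total_ut_steps)

    if early_exit_threshold is not None:
        # The valid range is an assumption left unchecked on purpose.
        config["early_exit_threshold"] = float(early_exit_threshold)

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, set_loop_options("path/to/config.json", total_ut_steps=3) would trade some quality for speed by running fewer recurrent steps, under the assumption that the loader reads this key from the file.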

Architecture and Training

Ouro-2.6B is based on a decoder-only Transformer architecture with 24 layers, a hidden size of 2048, and a vocabulary of 49,152. It uses RoPE for position embeddings and Sandwich RMSNorm for normalization. The model was trained through a multi-stage pipeline consisting of pre-training, CT annealing, long-context training, and mid-training. It was trained with a 4K context length, which can be extended to 64K.
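A quick piece of arithmetic on the figures above shows how much of the parameter budget a single token-embedding table takes and how large the context extension is. The sketch assumes that "K" means 1,024 tokens and counts one vocabulary-by-hidden matrix only; whether the output head shares that matrix is not stated here.

```python
VOCAB_SIZE = 49_152      # vocabulary size quoted above
HIDDEN_SIZE = 2_048      # hidden size quoted above
TOTAL_PARAMS = 2.6e9     # nominal model size

# One embedding table holds vocab x hidden weights.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
embedding_share = embedding_params / TOTAL_PARAMS

# Context extension factor, assuming K = 1,024 tokens.
TRAIN_CONTEXT = 4 * 1024
EXTENDED_CONTEXT = 64 * 1024
context_factor = EXTENDED_CONTEXT // TRAIN_CONTEXT

print(f"embedding parameters: {embedding_params:,}")
print(f"share of 2.6B: {embedding_share:.1%}")
print(f"context extension: {context_factor}x")
```

The embedding table comes to about 100.7 million weights, a little under 4% of the nominal 2.6 billion, and the extended context is 16 times the training length.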

Intended Use

This model is intended primarily for research into efficient language model architectures. Developers interested in parameter-efficient models, iterative reasoning, or adaptive computation will find Ouro-2.6B particularly relevant. Note that vLLM does not currently support the adaptive exit feature: under vLLM, the model always executes all recurrent steps.
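The practical difference between adaptive exit and always running every step can be sketched with a toy step counter. The exit rule used here (stop once the cumulative per-step exit probability reaches the threshold) is an assumption for illustration, as are the function name steps_executed and the example probabilities; the model's actual exit criterion may differ.

```python
def steps_executed(exit_probs, total_ut_steps, early_exit_threshold=None):
    """Return how many recurrent steps run for one input.

    exit_probs: hypothetical per-step exit probabilities.
    early_exit_threshold: if None, adaptive exit is disabled and every step
    runs, which mirrors the vLLM behaviour described above.
    """
    if early_exit_threshold is None:
        return total_ut_steps

    cumulative = 0.0
    for step, prob in enumerate(exit_probs[:total_ut_steps], start=1):
        cumulative += prob
        if cumulative >= early_exit_threshold:
            return step  # confident enough: stop looping early
    return total_ut_steps  # threshold never reached: run all steps


easy_input = [0.7, 0.2, 0.05, 0.05]   # confident after the first step
hard_input = [0.1, 0.1, 0.2, 0.6]     # needs every step

adaptive_easy = steps_executed(easy_input, total_ut_steps=4, early_exit_threshold=0.5)
adaptive_hard = steps_executed(hard_input, total_ut_steps=4, early_exit_threshold=0.5)
fixed_easy = steps_executed(easy_input, total_ut_steps=4)
```

With these made-up numbers the adaptive path stops after one step on the easy input and uses all four on the hard one, while the fixed path spends four steps on both.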
