Ouro-1.4B: A Parameter-Efficient Looped Language Model

Ouro-1.4B is a 1.4-billion-parameter Looped Language Model (LoopLM) developed by ByteDance. It is designed for parameter efficiency: it targets performance levels typically seen in standard transformers with 3-4 billion parameters while storing far fewer weights.

Key Capabilities & Features

Iterative Latent Reasoning: Performs reasoning through recurrent computation within its latent space, allowing for deeper processing with shared weights.
Adaptive Computation: Exposes a configurable number of recurrent steps (total_ut_steps) and an early_exit_threshold, so compute can be allocated dynamically to trade output quality against speed.
Decoder-only Transformer Architecture: Based on a standard Transformer architecture but with parameter sharing across recurrent steps.
Extensive Training: Trained on 7.7 trillion tokens spanning web data, code, mathematics, and long-context documents, with a context length extendable to 64K tokens.
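
The interaction between recurrent steps and early exit can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not Ouro's implementation: shared_block, exit_probability, and looped_forward are hypothetical names, the "block" is a single scalar weight rather than a transformer layer stack, and the exit rule (stop once the cumulative exit probability reaches the threshold) is an assumed reading of how early_exit_threshold might be applied. Only total_ut_steps and early_exit_threshold come from the model description above.

```python
import math
from typing import List, Tuple

Vector = List[float]


def shared_block(h: Vector, weight: float = 0.5) -> Vector:
    """Toy stand-in for the shared layer stack (hypothetical).

    The same weight is reused on every recurrent step, which is the
    parameter-sharing idea behind a looped model.
    """
    return [math.tanh(weight * x + 0.1) for x in h]


def exit_probability(h: Vector) -> float:
    """Hypothetical exit gate: sigmoid of the mean activation."""
    mean = sum(h) / len(h)
    return 1.0 / (1.0 + math.exp(-mean))


def looped_forward(
    h: Vector,
    total_ut_steps: int = 4,
    early_exit_threshold: float = 1.0,
) -> Tuple[Vector, int]:
    """Apply the shared block up to total_ut_steps times.

    Assumed exit rule: stop as soon as the cumulative probability of
    having exited reaches early_exit_threshold. Returns the final
    latent state and the number of steps actually used.
    """
    remaining = 1.0   # probability mass that has not exited yet
    cumulative = 0.0  # probability of having exited by this step
    for step in range(1, total_ut_steps + 1):
        h = shared_block(h)
        p = exit_probability(h)
        cumulative += remaining * p
        remaining *= 1.0 - p
        if cumulative >= early_exit_threshold:
            return h, step
    return h, total_ut_steps


latent = [0.2, -0.4, 1.0]
_, steps_full = looped_forward(latent, total_ut_steps=4, early_exit_threshold=1.0)
_, steps_early = looped_forward(latent, total_ut_steps=4, early_exit_threshold=0.5)
print(steps_full, steps_early)
```

In this sketch a threshold of 1.0 is never reached by the toy gate, so every step runs, while a lower threshold ends the loop sooner. That is the speed-versus-depth trade-off the two settings are described as controlling.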

What Makes Ouro-1.4B Different?

Its core differentiator is the Looped Language Model (LoopLM) architecture, which applies the same parameters repeatedly across recurrent steps instead of adding new layers. This lets the model reach greater effective depth, and performance closer to that of larger models, without increasing its parameter count. The configurable number of recurrent steps and the adaptive early exit give fine-grained control over how much computation is spent at inference time.
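
The arithmetic behind parameter reuse can be sketched as follows. The layer count, per-layer parameter count, and step count below are hypothetical round numbers chosen for illustration, not Ouro-1.4B's actual configuration, and the helper names are invented for this example.

```python
def stored_parameters(params_per_layer: int, num_layers: int) -> int:
    """Weights that must be stored for a stack of distinct layers."""
    return params_per_layer * num_layers


def effective_depth(num_layers: int, recurrent_steps: int) -> int:
    """Number of layer applications when the stack is looped."""
    return num_layers * recurrent_steps


# Hypothetical numbers, for illustration only.
PARAMS_PER_LAYER = 50_000_000
NUM_LAYERS = 24
RECURRENT_STEPS = 4

# A looped model stores one copy of the stack and reuses it on each step.
looped_params = stored_parameters(PARAMS_PER_LAYER, NUM_LAYERS)

# An unshared stack of the same effective depth stores every layer separately.
depth = effective_depth(NUM_LAYERS, RECURRENT_STEPS)
unshared_params = stored_parameters(PARAMS_PER_LAYER, depth)

print(f"effective depth: {depth} layer applications")
print(f"looped stack stores {looped_params:,} parameters")
print(f"unshared stack of equal depth stores {unshared_params:,} parameters")
```

Note that sharing reduces stored weights, not compute: each recurrent step still costs a full pass through the stack, which is why the number of steps and the early exit matter for inference speed.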

Should You Use This Model?

Ouro-1.4B is primarily intended for research, especially work on parameter-efficient LLMs, recurrent computation, and adaptive inference strategies. It is also relevant to developers who want strong performance from a smaller model or who are experimenting with dynamic compute allocation. It is a reasonable candidate when memory for model weights is constrained but performance closer to that of larger models is desired.
