ByteDance/Ouro-1.4B

Text Generation | Concurrency Cost: 1 | Model Size: 1.4B | Quant: BF16 | Ctx Length: 32k | Published: Oct 28, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

ByteDance/Ouro-1.4B is a 1.4 billion parameter Looped Language Model (LoopLM) designed for exceptional parameter efficiency through iterative shared-weight computation. This model matches the performance of larger 3-4B standard transformers by performing reasoning through recurrent computation in latent space. It supports adaptive computation with early exit mechanisms for dynamic compute allocation, making it suitable for research into efficient language processing.

Ouro-1.4B: An Efficient Looped Language Model

Ouro-1.4B, developed by ByteDance, is a 1.4 billion parameter Looped Language Model (LoopLM) that achieves significant parameter efficiency. It is designed to match the performance of larger 3-4 billion parameter standard transformers by employing iterative shared-weight computation and recurrent processing in its latent space.

Key Capabilities

  • Exceptional Parameter Efficiency: Delivers performance comparable to 3-4B parameter models with only 1.4B parameters.
  • Iterative Latent Reasoning: Performs reasoning by repeatedly applying the same shared weights to its hidden states in latent space, rather than by adding parameters.
  • Adaptive Computation: Exposes a configurable number of recurrent steps (total_ut_steps) and an adaptive early exit threshold (early_exit_threshold), so compute can be allocated dynamically based on task complexity. Note that vLLM currently bypasses the adaptive exit feature and always executes the full number of recurrent steps.
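
To make the interaction between total_ut_steps and early_exit_threshold concrete, here is a minimal toy sketch in pure Python. It is not the Ouro implementation: shared_block, exit_gate, and looped_forward are hypothetical names, the "block" is a single tanh update standing in for the shared transformer stack, and the exit rule (stop once an accumulated exit probability reaches the threshold) is an assumption made for illustration. Only the two parameter names come from the model card.

```python
import math

def shared_block(hidden, weight):
    # Toy stand-in for the shared-weight stack: the SAME weight is reused every step.
    return [math.tanh(weight * h) for h in hidden]

def exit_gate(hidden):
    # Hypothetical exit gate: squashes the mean absolute activation into (0, 1).
    mean_abs = sum(abs(h) for h in hidden) / len(hidden)
    return 1.0 / (1.0 + math.exp(-mean_abs))

def looped_forward(hidden, weight, total_ut_steps=4, early_exit_threshold=1.0):
    # Assumed exit rule: stop once the accumulated exit probability
    # reaches early_exit_threshold; otherwise run all total_ut_steps.
    survive = 1.0  # probability that no exit has been taken so far
    steps_run = 0
    for _ in range(total_ut_steps):
        hidden = shared_block(hidden, weight)
        steps_run += 1
        survive *= 1.0 - exit_gate(hidden)
        if 1.0 - survive >= early_exit_threshold:
            break
    return hidden, steps_run

if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    for threshold in (0.5, 0.9, 1.0):
        _, steps = looped_forward(x, weight=0.8, early_exit_threshold=threshold)
        print(threshold, steps)  # steps run: 1, 3, 4 respectively
```

In this toy, a threshold of 1.0 is never reached, so every recurrent step runs, which mirrors the always-full-steps behavior described for vLLM. Lower thresholds exit sooner and spend less compute per input.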

Model Architecture & Training

Ouro-1.4B is a decoder-only Transformer with 24 layers, 4 recurrent steps, and a 2048 hidden size. It uses Multi-Head Attention, SwiGLU activation, RoPE for position embeddings, and Sandwich RMSNorm. The model was trained on 7.7 trillion tokens across multiple stages, including pre-training, CT Annealing, long context training, and mid-training, using a diverse dataset comprising web data, code, mathematics, and long-context documents.
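
As a rough illustration of what the layer and step counts imply, the sketch below computes the number of sequential layer applications per forward pass. It assumes that each recurrent step re-applies the full 24-layer stack, which is the usual reading of a looped architecture but is not spelled out above. The function name effective_depth is hypothetical.

```python
def effective_depth(num_layers, recurrent_steps):
    # Layer applications per forward pass, assuming every recurrent step
    # re-runs the whole shared stack (an assumption, see the lead-in).
    return num_layers * recurrent_steps

NUM_LAYERS = 24  # from the model card

if __name__ == "__main__":
    for steps in (1, 2, 3, 4):
        # Stored weights stay at 24 layers; only the compute depth grows.
        print(steps, effective_depth(NUM_LAYERS, steps))
    # With the default 4 recurrent steps this gives 96 layer applications.
```

Under that assumption, lowering the number of recurrent steps reduces compute roughly proportionally while the stored parameter count stays the same.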

Good for

  • Research into parameter-efficient language models and recurrent computation.
  • Applications where computational budget is a constraint but performance comparable to larger models is desired.
  • Experimenting with adaptive computation and early exit strategies in LLMs.