KristianS7/Ouro-1.4B

Task: Text Generation | Concurrency Cost: 1 | Model Size: 1.4B | Quant: BF16 | Ctx Length: 32k | Published: Mar 10, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

KristianS7/Ouro-1.4B is a 1.4 billion parameter Looped Language Model (LoopLM) based on the Transformer architecture, developed by ByteDance. By applying the same weights iteratively and reasoning recurrently in latent space, it matches the performance of standard Transformers with 3-4B parameters. The model is intended for research on adaptive computation, in particular dynamic compute allocation through early exit mechanisms, and its context length is extendable to 64K.


Ouro-1.4B: A Parameter-Efficient Looped Language Model

KristianS7/Ouro-1.4B is a 1.4 billion parameter Looped Language Model (LoopLM) developed by ByteDance and intended for research. Its distinguishing property is parameter efficiency: by running the same shared weights through several iterations, it matches the performance of standard Transformers with 3-4 billion parameters.

Key Capabilities & Features

  • Iterative Latent Reasoning: Performs reasoning through recurrent computation within its latent space.
  • Adaptive Computation: Supports early exit mechanisms, allowing for dynamic allocation of computational resources based on the task.
  • Configurable Recurrent Steps: Users can set total_ut_steps to trade performance against computation time, and early_exit_threshold to control when adaptive exit is triggered.
  • Architecture: A decoder-only Transformer with 24 layers and a hidden size of 2048, using Multi-Head Attention, SwiGLU FFN, RoPE, and Sandwich RMSNorm.
  • Extensive Training: Trained on 7.7 trillion tokens, including web data, code, mathematics, and long-context documents, with a context length extendable to 64K.
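
The interaction between total_ut_steps and early_exit_threshold can be illustrated with a small pure-Python sketch. It is not the model's implementation: looped_forward, toy_block, and toy_gate are hypothetical names, the exit rule (stop once the accumulated exit probability reaches the threshold) is an assumption made for illustration, and the numbers are toy values rather than real hidden states.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def looped_forward(
    hidden: Vector,
    shared_block: Callable[[Vector], Vector],
    exit_gate: Callable[[Vector], float],
    total_ut_steps: int = 4,
    early_exit_threshold: float = 1.0,
) -> Tuple[Vector, int]:
    """Apply ONE shared block repeatedly, optionally stopping early.

    Assumed exit rule (illustrative only): after each pass the gate returns
    a probability of exiting at that pass; the loop stops once the
    accumulated exit probability reaches early_exit_threshold.
    """
    still_running = 1.0    # probability that no exit has happened so far
    cumulative_exit = 0.0  # accumulated probability of having exited
    steps_used = 0
    for step in range(1, total_ut_steps + 1):
        hidden = shared_block(hidden)  # same weights on every pass
        steps_used = step
        if step == total_ut_steps:
            break  # step budget exhausted, nothing left to decide
        p_exit = exit_gate(hidden)
        cumulative_exit += still_running * p_exit
        still_running *= 1.0 - p_exit
        if cumulative_exit >= early_exit_threshold:
            break  # confident enough, skip the remaining passes
    return hidden, steps_used


def toy_block(hidden: Vector) -> Vector:
    # Stand-in for the shared Transformer stack: each pass refines the state.
    return [0.5 * x + 1.0 for x in hidden]


def toy_gate(hidden: Vector) -> float:
    # Stand-in for a learned exit gate: a constant 50% exit probability.
    return 0.5


full = looped_forward([0.0], toy_block, toy_gate, 4, 1.0)
early = looped_forward([0.0], toy_block, toy_gate, 4, 0.7)
print(full)   # ([1.875], 4): threshold 1.0 uses every step
print(early)  # ([1.5], 2): threshold 0.7 stops after two passes
```

Under this assumed rule, a threshold of 1.0 effectively disables early exit, while lower thresholds give up some refinement of the state in exchange for fewer passes through the shared block.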

When to Use This Model

  • Research on Parameter Efficiency: Ideal for exploring methods to achieve high performance with fewer parameters.
  • Adaptive Computation Studies: Suitable for investigating dynamic compute allocation and early exit strategies in LLMs.
  • Resource-Constrained Environments: Potentially useful where memory is limited, since the parameter count is small relative to the reported performance. Additional recurrent steps still increase computation time.
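
As a rough back-of-the-envelope check on the resource question, the sketch below estimates weight memory from the listed model size (1.4B) and precision (BF16, 2 bytes per parameter), and counts layer applications per token. The function names are hypothetical, "1.4B" is a rounded figure, activations and the KV cache are ignored, and the assumption that every recurrent step re-applies all 24 layers is ours rather than a statement from the model card.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in decimal gigabytes (BF16 = 2 bytes)."""
    return num_params * bytes_per_param / 1e9


def layer_applications(num_layers: int, total_ut_steps: int) -> int:
    """Layer passes per token, assuming each step re-runs the whole stack."""
    return num_layers * total_ut_steps


# Rounded parameter count taken from the listing above.
print(weight_memory_gb(1_400_000_000))  # 2.8 (GB of weights in BF16)

# Compute grows with the number of recurrent steps; memory for weights does not.
for steps in (1, 2, 4):
    print(steps, layer_applications(24, steps))
```

The point of the comparison is that looping keeps weight memory fixed while compute scales with the number of steps, which is why early exit matters for latency.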

Note: This model is intended for research and is provided as-is. The adaptive exit feature is not currently supported by vLLM.