KristianS7/Ouro-1.4B
KristianS7/Ouro-1.4B is a 1.4 billion parameter Looped Language Model (LoopLM) based on the Transformer architecture, developed by ByteDance. It achieves exceptional parameter efficiency, matching the performance of 3-4B parameter standard transformers through iterative shared-weight computation and recurrent latent reasoning. This model is designed for research purposes, focusing on adaptive computation and dynamic compute allocation via early exit mechanisms, with a context length extendable to 64K.
Ouro-1.4B: A Parameter-Efficient Looped Language Model
KristianS7/Ouro-1.4B is a 1.4 billion parameter Looped Language Model (LoopLM) developed by ByteDance and designed for research purposes. Its defining trait is parameter efficiency: by applying the same shared weights iteratively, it matches the performance of standard Transformers with 3-4 billion parameters.
Key Capabilities & Features
- Iterative Latent Reasoning: Performs reasoning through recurrent computation within its latent space.
- Adaptive Computation: Supports early exit mechanisms, allowing for dynamic allocation of computational resources based on the task.
- Configurable Recurrent Steps: Users can adjust total_ut_steps to balance performance and computation time, and early_exit_threshold for adaptive exit behavior.
- Robust Architecture: Based on a decoder-only Transformer with 24 layers, 2048 hidden size, Multi-Head Attention, SwiGLU FFN, RoPE, and Sandwich RMSNorm.
- Extensive Training: Trained on 7.7 trillion tokens, including web data, code, mathematics, and long-context documents, with a context length extendable to 64K.
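The interaction between the recurrent steps and the early exit listed above can be illustrated with a toy loop. This is a minimal sketch and not the model's actual implementation: it assumes that an exit gate emits a per-step exit probability and that computation stops once the cumulative exit probability reaches early_exit_threshold. The names looped_forward, shared_block, and exit_gate are hypothetical; only total_ut_steps and early_exit_threshold come from the model card.

```python
from typing import Callable, Tuple


def looped_forward(
    state: float,
    step_fn: Callable[[float], float],
    exit_prob_fn: Callable[[float], float],
    total_ut_steps: int,
    early_exit_threshold: float,
) -> Tuple[float, int]:
    """Apply one shared step function repeatedly, with an optional early exit.

    Assumed exit rule (illustrative only): stop at the first step where the
    cumulative probability of having exited reaches the threshold.
    """
    survive = 1.0          # probability of not having exited so far
    cumulative_exit = 0.0  # probability of having exited by this step
    steps_used = 0
    for _ in range(total_ut_steps):
        state = step_fn(state)  # same weights (same function) every iteration
        steps_used += 1
        p_exit = exit_prob_fn(state)
        cumulative_exit += survive * p_exit
        survive *= 1.0 - p_exit
        if cumulative_exit >= early_exit_threshold:
            break
    return state, steps_used


def shared_block(x: float) -> float:
    """Toy stand-in for the shared Transformer stack."""
    return 0.5 * x + 1.0


def exit_gate(_state: float) -> float:
    """Toy gate that always reports a 50% exit probability."""
    return 0.5


print(looped_forward(0.0, shared_block, exit_gate, 4, 0.8))  # (1.75, 3)
print(looped_forward(0.0, shared_block, exit_gate, 4, 1.0))  # (1.875, 4)
```

With the toy gate, a threshold of 0.8 stops after three of the four allowed steps, while a threshold of 1.0 is never reached and all four steps run. Under this assumed rule, a lower threshold trades output refinement for less computation.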
When to Use This Model
- Research on Parameter Efficiency: Ideal for exploring methods to achieve high performance with fewer parameters.
- Adaptive Computation Studies: Suitable for investigating dynamic compute allocation and early exit strategies in LLMs.
- Resource-Constrained Environments: Potentially useful for applications where computational resources are limited, given its efficiency.
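For the kind of compute-allocation study described above, the two settings could be overridden at load time. The following is a hedged sketch, assuming that the checkpoint loads through the Hugging Face transformers library, that it needs trust_remote_code=True, and that total_ut_steps and early_exit_threshold are plain attributes on the model config. The helper names ouro_config_overrides and load_ouro are hypothetical, and the [0, 1] range check on the threshold is an assumption rather than documented behavior.

```python
def ouro_config_overrides(total_ut_steps: int, early_exit_threshold: float) -> dict:
    """Validate and bundle the two settings named in the model card."""
    if total_ut_steps < 1:
        raise ValueError("total_ut_steps must be at least 1")
    # Assumption: the threshold is a probability-like value in [0, 1].
    if not 0.0 <= early_exit_threshold <= 1.0:
        raise ValueError("early_exit_threshold is assumed to lie in [0, 1]")
    return {
        "total_ut_steps": total_ut_steps,
        "early_exit_threshold": early_exit_threshold,
    }


def load_ouro(model_id: str = "KristianS7/Ouro-1.4B", **overrides):
    """Load the model with config overrides (downloads weights when called)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoConfig, AutoModelForCausalLM

    # Assumption: the repository ships custom modeling code.
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    for name, value in overrides.items():
        setattr(config, name, value)
    return AutoModelForCausalLM.from_pretrained(
        model_id, config=config, trust_remote_code=True
    )


# Example: fewer recurrent steps for a faster, lower-compute run.
overrides = ouro_config_overrides(total_ut_steps=2, early_exit_threshold=1.0)
print(overrides)
# model = load_ouro(**overrides)  # uncomment to actually download and load
```

Keeping validation separate from loading makes it cheap to sweep many step and threshold combinations before committing to a full model load.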
Note: This model is intended for research and is provided as-is. The adaptive exit feature is not currently supported by vLLM.