Name: ArcadaLabs/Ouro-2.6B-mlx-bf16 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ArcadaLabs

ArcadaLabs/Ouro-2.6B-mlx-bf16: A Looped Language Model for MLX

This model is an unquantized bfloat16 (bf16) conversion of ByteDance's Ouro-2.6B, built for Apple Silicon with the MLX framework. Ouro-2.6B is a Looped Language Model (LoopLM), a Universal Transformer-style architecture that applies its 24 transformer blocks recurrently 4 times, for an effective depth of 96 layers. Because those blocks share weights across passes, the model gains depth without gaining parameters, which lets it reach performance typically associated with much larger models despite having only 2.6 billion parameters.
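To make the looping concrete, here is a minimal pure-Python sketch of the control flow only. It assumes each block is a toy residual function rather than a real attention-plus-MLP transformer block; `make_toy_block` and `looped_forward` are hypothetical names for illustration, not part of mlx-lm or the Ouro code.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(scale: float) -> Block:
    # Hypothetical stand-in for a transformer block: a residual update
    # x + scale * x. Real blocks apply attention and an MLP instead.
    def block(x: Vector) -> Vector:
        return [xi + scale * xi for xi in x]
    return block


def looped_forward(x: Vector, blocks: List[Block], num_loops: int) -> Tuple[Vector, int]:
    """Run the same block stack num_loops times; return the output and block applications."""
    applications = 0
    for _ in range(num_loops):      # each recurrent pass reuses the same blocks (shared weights)
        for block in blocks:        # one full pass through the stack
            x = block(x)
            applications += 1
    return x, applications


NUM_BLOCKS = 24  # block count from the description above
NUM_LOOPS = 4    # recurrent passes from the description above

blocks = [make_toy_block(0.001) for _ in range(NUM_BLOCKS)]
out, depth = looped_forward([1.0, -2.0], blocks, NUM_LOOPS)
print(len(blocks), depth)  # prints "24 96"
```

The stack holds 24 distinct blocks, yet each input passes through 96 block applications; raising `num_loops` deepens the computation without adding any weights.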

Key Capabilities & Features

Efficient MLX Conversion: Provides an unquantized bfloat16 version optimized for Apple Silicon, with noticeably faster inference than PyTorch (12.0 tok/s vs. 6.9 tok/s for PyTorch fp16 on MPS in informal benchmarks, roughly 1.7x).
Looped Architecture: Employs recurrent processing of transformer blocks, enhancing reasoning capabilities and performance relative to its parameter count.
Extensive Training: Trained on 7.7 trillion tokens, which supports its language understanding and generation.
Adaptable Context: Trained with a 4K-token context length that can be extended to 64K, making it suitable for longer inputs.
Custom MLX Integration: Requires a custom ouro.py model file for mlx-lm to support its unique architecture, including sandwich RMSNorm and KV cache management for recurrent passes.
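The sandwich RMSNorm and per-pass KV cache handling that the custom ouro.py must support can be sketched in plain Python. This sketch assumes the usual sandwich-norm pattern (RMSNorm on a sublayer's input and again on its output before the residual add) and assumes the cache gives each (recurrent pass, layer) pair its own slot; the actual ouro.py may organize its norms and cache differently. All function names here are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """Divide x by its root-mean-square, then scale elementwise by weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def sandwich_residual(x: Vector, sublayer: Callable[[Vector], Vector],
                      pre_w: Vector, post_w: Vector) -> Vector:
    """Sandwich norm: RMSNorm the sublayer's input and its output, then add the residual."""
    h = sublayer(rms_norm(x, pre_w))   # pre-norm, as in a standard pre-LN block
    h = rms_norm(h, post_w)            # extra post-norm on the sublayer output
    return [a + b for a, b in zip(x, h)]


def cache_slot(loop_idx: int, layer_idx: int, num_layers: int) -> int:
    """Hypothetical flat index giving each (recurrent pass, layer) pair its own KV cache."""
    return loop_idx * num_layers + layer_idx


NUM_LAYERS, NUM_LOOPS = 24, 4
caches: List[List[Vector]] = [[] for _ in range(NUM_LAYERS * NUM_LOOPS)]
x = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(x)

for loop_idx in range(NUM_LOOPS):
    for layer_idx in range(NUM_LAYERS):
        slot = cache_slot(loop_idx, layer_idx, NUM_LAYERS)
        caches[slot].append(list(x))   # toy: record this layer's input in its own slot
        x = sandwich_residual(x, lambda h: [0.5 * v for v in h], ones, ones)

print(len(caches), all(len(c) == 1 for c in caches))  # prints "96 True"
```

Keying the cache by pass as well as by layer matters because a shared layer sees a different hidden state on each pass, so its keys and values from the first pass cannot stand in for those from the fourth.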

Ideal Use Cases

Apple Silicon Development: Developers targeting Apple Silicon for local LLM inference will benefit from its MLX optimization.
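Resource-Constrained Environments: Because its looped blocks share weights across passes, it needs far less weight memory than a non-looped model of the same effective depth, making it a strong candidate where capability is needed on limited memory. Compute per token, however, still scales with all 96 effective layers.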
Experimental MLX Projects: Useful for those exploring advanced model architectures and custom MLX integrations.
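The memory argument can be checked with back-of-the-envelope arithmetic, assuming the nominal 2.6 billion parameters and 2 bytes per bfloat16 value, and ignoring the KV cache and activations. The helper names are illustrative only.

```python
def bf16_weight_bytes(num_params: float) -> float:
    """bfloat16 stores each parameter in 2 bytes."""
    return num_params * 2


def block_applications_per_token(num_blocks: int, num_loops: int) -> int:
    """Every shared block runs once per recurrent pass for each token."""
    return num_blocks * num_loops


PARAMS = 2.6e9  # nominal parameter count from the description
weights = bf16_weight_bytes(PARAMS)
print(f"weights: {weights / 1e9:.1f} GB ({weights / 2**30:.2f} GiB)")  # weights: 5.2 GB (4.84 GiB)

for loops in (1, 4):
    # Weight memory is unchanged by the loop count; only compute per token grows.
    print(f"loops={loops}: {block_applications_per_token(24, loops)} block applications, "
          f"{weights / 1e9:.1f} GB of weights")
```

Weight memory stays at about 5.2 GB regardless of loop count, while block applications per token grow from 24 to 96, so the savings come in memory rather than compute.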
