Name: moonshotai/Kimi-Linear-48B-A3B-Base API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: moonshotai

Overview

Kimi Linear is a 48 billion parameter model from Moonshot AI, with roughly 3 billion parameters activated per token (the "A3B" in the model name). It is built on a hybrid linear attention architecture. Its core component is Kimi Delta Attention (KDA), a refinement of Gated DeltaNet with a finer-grained gating mechanism that makes more efficient use of the finite-state RNN memory. With this design, Kimi Linear is reported to outperform traditional full attention in short-context, long-context, and reinforcement learning (RL) scaling regimes.

Key Capabilities

Extended Context Handling: Supports context lengths of up to 1 million tokens, which suits tasks that must keep very long inputs in view at once.
Enhanced Efficiency: Achieves up to 6x faster decoding than full attention models, with a correspondingly lower time per output token (TPOT).
Reduced Memory Footprint: Reduces KV cache size by up to 75%.
Superior Performance: Outperforms full attention on a range of benchmarks, including long-context and RL-style tasks, as demonstrated in training runs on 1.4T tokens.
Hybrid Architecture: Interleaves three KDA layers with every one global MLA (Multi-head Latent Attention) layer, a 3:1 ratio that balances memory efficiency against output quality.
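
The 75% KV cache figure follows from the 3:1 layer ratio. The sketch below works through that arithmetic under two simplifying assumptions that are not stated on this page: every full attention layer stores a per-token cache of the same size, and the KDA layers keep only a fixed-size state that does not grow with sequence length and is small enough to ignore at long context. The function names are illustrative only.

```python
def kv_cache_fraction(kda_layers: int = 3, full_layers: int = 1) -> float:
    """Fraction of layers that still keep a per-token KV cache.

    Assumption: only the full attention (global MLA) layers cache
    keys and values per token; KDA layers hold a constant-size state.
    """
    return full_layers / (kda_layers + full_layers)


def kv_cache_reduction(kda_layers: int = 3, full_layers: int = 1) -> float:
    """Relative KV cache saving versus a model with only full attention layers."""
    return 1.0 - kv_cache_fraction(kda_layers, full_layers)


# With the 3:1 ratio, one layer in four keeps a growing cache.
print(f"{kv_cache_reduction(3, 1):.0%}")  # 75%
```

This is an upper bound on the saving, which is consistent with the "up to 75%" wording above.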

Good For

Applications demanding high throughput and efficient processing of very long sequences.
Scenarios where memory optimization is critical, especially for large language models.
Tasks benefiting from extended context understanding and generation, such as document analysis or complex reasoning over large texts.
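
As a rough illustration of the document analysis use case, the sketch below builds a request body for a base (non-chat) model. It assumes the provider accepts an OpenAI-style text completion payload with `model`, `prompt`, and `max_tokens` fields; that shape is an assumption, not something this page confirms, so check the provider's API documentation before relying on it. The helper name is hypothetical, and no network call is made.

```python
import json

# Model identifier taken from the listing title above.
MODEL_ID = "moonshotai/Kimi-Linear-48B-A3B-Base"


def build_completion_request(document: str, instruction: str,
                             max_tokens: int = 256) -> dict:
    """Build an OpenAI-style completion payload (assumed request shape).

    A base model continues text rather than following chat turns, so the
    document and the instruction are joined into a single prompt.
    """
    prompt = f"{document}\n\n{instruction}\n"
    return {"model": MODEL_ID, "prompt": prompt, "max_tokens": max_tokens}


payload = build_completion_request(
    document="(long report text goes here)",
    instruction="Summary of the report above:",
)
body = json.dumps(payload)  # serialized body, ready to send with any HTTP client
```

Because this is a base model, phrasing the instruction as the start of the desired output ("Summary of the report above:") tends to work better than a question.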
