featherless-ai/QRWKV-72B
TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 72B
  • Quant: FP8
  • Ctx Length: 64k
  • Published: Mar 16, 2025
  • License: tongyi-qianwen
  • Architecture: Transformer

featherless-ai/QRWKV-72B is a 72 billion parameter language model developed by featherless-ai, based on the RWKV architecture with a 65536 token context length. This model is a conversion of Qwen 2.5 72B into an RWKV variant, leveraging linear attention for significantly reduced computational costs at scale. It excels in general language understanding and generation tasks, demonstrating competitive performance against its Qwen2.5 counterpart across various benchmarks.


QRWKV-72B: Efficient Large Language Model with Linear Attention

QRWKV-72B is a 72 billion parameter model developed by featherless-ai, distinguished by its use of the RWKV (Receptance Weighted Key Value) architecture. The model is a conversion of Qwen 2.5 72B into an RWKV variant, a process detailed in the paper RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale. The primary innovation is its linear attention mechanism, which enables a >1000x improvement in inference cost and significantly larger context lengths (up to 65,536 tokens) compared to traditional quadratic-attention transformer models.
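The cost difference comes down to how the two attention styles scale with sequence length. The toy sketch below is not the actual RWKV formulation (which adds receptance gating and time decay, among other refinements); it only illustrates how replacing the T x T attention matrix with a running state turns per-token cost from O(T) into O(1):

```python
import numpy as np

d = 8          # head dimension (tiny, for illustration)
T = 64         # sequence length
rng = np.random.default_rng(0)
q = rng.standard_normal((T, d))
k = rng.standard_normal((T, d))
v = rng.standard_normal((T, d))

# Quadratic attention (unnormalized, no softmax): materializes a T x T
# score matrix, so compute and memory grow with the square of the context.
scores = q @ k.T                       # (T, T)
causal = np.tril(np.ones((T, T)))      # mask out future tokens
out_quadratic = (scores * causal) @ v

# Linear attention: keep a d x d state S = sum_{i<=t} k_i v_i^T and read
# it with the current query. Each new token costs the same regardless of
# how many tokens came before -- this is what makes long contexts cheap.
S = np.zeros((d, d))
out_linear = np.zeros((T, d))
for t in range(T):
    S += np.outer(k[t], v[t])          # fold token t into the state
    out_linear[t] = q[t] @ S           # query the accumulated state

# Without the softmax, the two forms compute the same causal result;
# only the cost profile differs.
assert np.allclose(out_quadratic, out_linear)
```

The softmax in standard attention is exactly what prevents this reordering; linear-attention architectures such as RWKV drop or replace it so the recurrent form becomes exact.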

Key Capabilities & Performance

  • Cost-Efficient Inference: The linear attention design drastically reduces computational costs, making large context processing more accessible.
  • Strong General Performance: Benchmarks show QRWKV-72B performing competitively with, and often surpassing, Qwen2.5-72B-Instruct on tasks like ARC Challenge, ARC Easy, Lambada, PIQA, and Winogrande.
  • Inherited Knowledge: The model retains the inherent knowledge and dataset training characteristics of its Qwen 2.5 parent model.
  • Multilingual Support: Supports approximately 30 languages, consistent with the Qwen model line.

Good For

  • Applications requiring long context processing where computational efficiency is critical.
  • General language understanding and generation tasks.
  • Developers looking for a powerful 72B parameter model with optimized inference costs.
Popular Sampler Settings

The most popular sampler configurations among Featherless users for this model adjust the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p