featherless-ai/QRWKV-72B

  • Status: Warm
  • Visibility: Public
  • Parameters: 72B
  • Quantization: FP8
  • Context length: 65,536 tokens
  • License: tongyi-qianwen
Overview

QRWKV-72B: Efficient Large Language Model with Linear Attention

QRWKV-72B is a 72-billion-parameter model developed by featherless-ai, distinguished by its use of the RWKV (Receptance Weighted Key Value) architecture. The model is a conversion of Qwen 2.5 72B into an RWKV variant, a process detailed in the paper RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale. Its primary innovation is a linear attention mechanism, which enables a more than 1000x reduction in inference cost and significantly longer context lengths (up to 65,536 tokens) compared with traditional transformer models.
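
As a rough illustration of how the hosted model might be called, the sketch below assumes an OpenAI-compatible chat completions endpoint; the base URL and the API-key environment variable name are assumptions, not confirmed details of the serving API, so check the provider's documentation before use.

```python
import os

from openai import OpenAI

# Assumed OpenAI-compatible endpoint and API-key environment variable;
# replace both with the values from the provider's documentation.
client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key=os.environ["FEATHERLESS_API_KEY"],
)

response = client.chat.completions.create(
    model="featherless-ai/QRWKV-72B",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain in two sentences how linear attention changes inference cost."},
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```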

Key Capabilities & Performance

  • Cost-Efficient Inference: The linear attention design drastically reduces computational cost, making long-context processing far more affordable (see the scaling sketch after this list).
  • Strong General Performance: Benchmarks show QRWKV-72B performing competitively with, and often surpassing, Qwen2.5-72B-Instruct on tasks such as ARC Challenge, ARC Easy, LAMBADA, PIQA, and Winogrande.
  • Inherited Knowledge: The model retains the knowledge and training-data characteristics of its Qwen 2.5 parent model.
  • Multilingual Support: Supports approximately 30 languages, consistent with the Qwen model line.
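
The cost-efficiency claim above follows from how attention cost scales with sequence length. The back-of-the-envelope sketch below (plain Python, no model code) contrasts the quadratic token-pair count of standard causal self-attention with the linear per-token state update of an RWKV-style recurrence; the numbers are illustrative scaling counts, not measured throughput or the source of the 1000x figure.

```python
# Back-of-the-envelope comparison of attention scaling with context length.
# Standard causal self-attention scores every token against all previous tokens
# (O(n^2) pairs over a sequence), while an RWKV-style linear-attention
# recurrence updates a fixed-size state once per token (O(n) total).

def quadratic_attention_pairs(n_tokens: int) -> int:
    """Token pairs scored by causal softmax attention: n * (n + 1) / 2."""
    return n_tokens * (n_tokens + 1) // 2

def linear_attention_updates(n_tokens: int) -> int:
    """State updates performed by a linear-attention recurrence: one per token."""
    return n_tokens

for n in (1_024, 8_192, 65_536):
    quad = quadratic_attention_pairs(n)
    lin = linear_attention_updates(n)
    print(f"context {n:>6}: quadratic pairs = {quad:>13,}, "
          f"linear updates = {lin:>6,}, ratio ~ {quad / lin:,.0f}x")
```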

Good For

  • Applications that require long-context processing where computational efficiency is critical (see the sketch after this list).
  • General language understanding and generation tasks.
  • Developers looking for a powerful 72B-parameter model with optimized inference costs.
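
For the long-context use case above, it helps to confirm that a document actually fits inside the 65,536-token window before sending it. The sketch below counts tokens with the parent Qwen 2.5 tokenizer and then issues a single summarization request; the tokenizer repo id, the reserved output budget, the input file name, and the endpoint details are all assumptions for illustration, not confirmed specifics of this deployment.

```python
import os

from openai import OpenAI
from transformers import AutoTokenizer

CONTEXT_WINDOW = 65_536   # context length listed for QRWKV-72B above
OUTPUT_BUDGET = 1_024     # assumed headroom reserved for the model's reply

# Assumption: the Qwen 2.5 72B Instruct tokenizer approximates QRWKV-72B's
# tokenization, since the model was converted from Qwen 2.5.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-72B-Instruct")

client = OpenAI(
    base_url="https://api.featherless.ai/v1",   # assumed endpoint, as in the earlier sketch
    api_key=os.environ["FEATHERLESS_API_KEY"],
)

def fits_in_context(document: str) -> bool:
    """Check that the document plus the reply budget fits in the context window."""
    n_tokens = len(tokenizer.encode(document))
    print(f"document tokens: {n_tokens:,} of {CONTEXT_WINDOW - OUTPUT_BUDGET:,} available")
    return n_tokens <= CONTEXT_WINDOW - OUTPUT_BUDGET

with open("long_report.txt", encoding="utf-8") as f:   # hypothetical input file
    text = f.read()

if fits_in_context(text):
    reply = client.chat.completions.create(
        model="featherless-ai/QRWKV-72B",
        messages=[{"role": "user", "content": f"Summarize the following report:\n\n{text}"}],
        max_tokens=OUTPUT_BUDGET,
    )
    print(reply.choices[0].message.content)
```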