featherless-ai/QRWKV-72B
TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 72B
  • Quant: FP8
  • Ctx Length: 64k
  • Published: Mar 16, 2025
  • License: tongyi-qianwen
  • Architecture: Transformer

featherless-ai/QRWKV-72B is a 72 billion parameter language model developed by featherless-ai, based on the RWKV architecture with a 65536 token context length. This model is a conversion of Qwen 2.5 72B into an RWKV variant, leveraging linear attention for significantly reduced computational costs at scale. It excels in general language understanding and generation tasks, demonstrating competitive performance against its Qwen2.5 counterpart across various benchmarks.


QRWKV-72B: Efficient Large Language Model with Linear Attention

QRWKV-72B is a 72 billion parameter model developed by featherless-ai, distinguished by its use of the RWKV (Receptance Weighted Key Value) architecture. The model is a conversion of Qwen 2.5 72B into an RWKV variant, a process detailed in the paper RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale. The primary innovation is its linear attention mechanism, which enables a >1000x improvement in inference cost and significantly larger context lengths (up to 65,536 tokens) compared to traditional quadratic-attention transformer models.
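The cost difference comes down to how the two attention styles scale with sequence length. The toy sketch below is not the actual RWKV formulation (which adds receptance gating and time decay, among other refinements); it only illustrates how replacing the T x T attention matrix with a running state turns per-token cost from O(T) into O(1):

```python
import numpy as np

d = 8          # head dimension (tiny, for illustration)
T = 64         # sequence length
rng = np.random.default_rng(0)
q = rng.standard_normal((T, d))
k = rng.standard_normal((T, d))
v = rng.standard_normal((T, d))

# Quadratic attention (unnormalized, no softmax): materializes a T x T
# score matrix, so compute and memory grow with the square of the context.
scores = q @ k.T                       # (T, T)
causal = np.tril(np.ones((T, T)))      # mask out future tokens
out_quadratic = (scores * causal) @ v

# Linear attention: keep a d x d state S = sum_{i<=t} k_i v_i^T and read
# it with the current query. Each new token costs the same regardless of
# how many tokens came before -- this is what makes long contexts cheap.
S = np.zeros((d, d))
out_linear = np.zeros((T, d))
for t in range(T):
    S += np.outer(k[t], v[t])          # fold token t into the state
    out_linear[t] = q[t] @ S           # query the accumulated state

# Without the softmax, the two forms compute the same causal result;
# only the cost profile differs.
assert np.allclose(out_quadratic, out_linear)
```

The softmax in standard attention is exactly what prevents this reordering; linear-attention architectures such as RWKV drop or replace it so the recurrent form becomes exact.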

Key Capabilities & Performance

  • Cost-Efficient Inference: The linear attention design drastically reduces computational costs, making large context processing more accessible.
  • Strong General Performance: Benchmarks show QRWKV-72B performing competitively with, and often surpassing, Qwen2.5-72B-Instruct on tasks like ARC Challenge, ARC Easy, Lambada, PIQA, and Winogrande.
  • Inherited Knowledge: The model retains the inherent knowledge and dataset training characteristics of its Qwen 2.5 parent model.
  • Multilingual Support: Supports approximately 30 languages, consistent with the Qwen model line.

Good For

  • Applications requiring long context processing where computational efficiency is critical.
  • General language understanding and generation tasks.
  • Developers looking for a powerful 72B parameter model with optimized inference costs.
Popular Sampler Settings

The most popular sampler configurations among Featherless users for this model adjust the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p