featherless-ai/QRWKV-QwQ-32B
QRWKV-QwQ-32B is a 32 billion parameter RWKV-variant language model developed by featherless-ai, based on the Qwen 2.5 QwQ 32B architecture. It features a 32768-token context length and utilizes linear attention to significantly reduce computational costs for large contexts. This model is optimized for efficient inference and broad accessibility, inheriting its knowledge and multilingual capabilities (approximately 30 languages) from its Qwen parent model.
QRWKV-QwQ-32B: Efficient Linear Attention Language Model
QRWKV-QwQ-32B is a 32 billion parameter language model developed by featherless-ai, built upon the Qwen 2.5 QwQ 32B architecture. The model is an RWKV variant that replaces softmax attention with linear attention, drastically reducing computational cost and improving inference efficiency, especially at extended context lengths up to 32768 tokens. Notably, the Qwen 2.5 QwQ 32B checkpoint was converted into this RWKV variant without full pre-training or retraining from scratch, demonstrating an efficient method for adopting linear attention.
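To illustrate why linear attention scales so well with context length, here is a minimal toy sketch (not the model's actual RWKV kernel): causal linear attention can be computed either with a full T×T attention matrix or with a constant-size running state, and both forms give identical outputs. The feature map and tensor shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4  # toy sequence length and head dimension

def phi(x):
    # Simple positive feature map (an assumption for this demo;
    # real linear-attention variants use their own kernels).
    return np.maximum(x, 0) + 1e-6

q, k, v = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel (quadratic) form: materializes a full T x T score matrix.
scores = phi(q) @ phi(k).T          # (T, T)
scores *= np.tril(np.ones((T, T)))  # causal mask
out_parallel = (scores @ v) / scores.sum(axis=1, keepdims=True)

# Recurrent (linear) form: fixed-size state updated once per token,
# so cost grows linearly in T instead of quadratically.
S = np.zeros((d, d))  # running sum of outer(phi(k_t), v_t)
z = np.zeros(d)       # running sum of phi(k_t) for normalization
out_recurrent = np.zeros((T, d))
for t in range(T):
    S += np.outer(phi(k[t]), v[t])
    z += phi(k[t])
    out_recurrent[t] = (phi(q[t]) @ S) / (phi(q[t]) @ z)

# Both formulations agree, but the recurrent one never builds a T x T matrix.
assert np.allclose(out_parallel, out_recurrent)
```

The key point is that the recurrent state (`S`, `z`) has a size independent of sequence length, which is what makes long contexts cheap relative to softmax attention.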
Key Capabilities & Performance
This model inherits its core knowledge and training data from its Qwen parent model, supporting approximately 30 languages. Benchmarks indicate competitive performance against its base model, Qwen/QwQ-32B, and larger models such as Qwen2.5-72B-Instruct across various tasks:
- arc_challenge: Achieves 0.5640 acc_norm, outperforming Qwen/QwQ-32B.
- winogrande: Scores 0.7324 acc, surpassing Qwen/QwQ-32B.
- sciq: Matches Qwen/QwQ-32B with 0.9630 acc.
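Results like those above can be reproduced with the EleutherAI lm-evaluation-harness; the invocation below is a sketch (the exact harness version, batch size, and few-shot settings used for the reported numbers are not stated in this card):

```shell
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=featherless-ai/QRWKV-QwQ-32B,trust_remote_code=True \
  --tasks arc_challenge,winogrande,sciq \
  --batch_size auto
```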
Unique Approach
The model's core innovation lies in its use of linear attention, which enables a more than 1000x improvement in inference cost, facilitating more accessible and efficient AI. This approach allows efficient RWKV linear attention to be tested and validated on a smaller budget, making advanced language models practical for a wider range of applications. Further details on the underlying research can be found in the paper RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale.
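For local experimentation, a loading sketch via Hugging Face transformers is shown below. It is an assumption, not a confirmed recipe from this card: `trust_remote_code=True` is guessed because RWKV-variant checkpoints typically ship custom modeling code, and the dtype/device settings are illustrative.

```python
MODEL_ID = "featherless-ai/QRWKV-QwQ-32B"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Hypothetical generation helper; requires substantial GPU memory for a 32B model."""
    # Heavy imports kept local so merely defining this function is cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
        device_map="auto",
        trust_remote_code=True,      # assumed: custom RWKV modeling code
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Usage would then be `generate("Explain linear attention in one sentence.")`, subject to available hardware.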