recursal/RWKV6QwQ-32B-final-250307

32B parameters · FP8 precision · 32768-token context length · License: apache-2.0 · Available on Hugging Face

Model Overview

recursal/RWKV6QwQ-32B-final-250307 is a 32-billion-parameter language model developed by Recursal, built on an RWKV-variant architecture. It is derived from Qwen's QwQ-32B (itself based on Qwen 2.5) and demonstrates that an existing transformer can be converted to a more efficient linear attention mechanism without full pre-training or retraining from scratch. The aim of this conversion is to cut computational costs sharply and enable faster inference, especially in applications that require long context lengths.
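
A minimal usage sketch is shown below, assuming the repository ships a transformers-compatible configuration. Because the attention layers are a custom RWKV variant, trust_remote_code=True may be required; the dtype, device placement, and generation settings are illustrative defaults, not recommended values.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical loading sketch; the exact loading path depends on how the
# repository packages its custom RWKV-variant attention layers.
model_id = "recursal/RWKV6QwQ-32B-final-250307"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # published weights are FP8; bf16 shown here as a generic fallback
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Summarize the advantages of linear attention for long documents."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))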

Key Capabilities & Performance

The model inherits its core knowledge from its Qwen parent's training data and supports approximately 30 languages. It performs strongly on standard benchmarks, matching or exceeding its Qwen base on the tasks listed below (a hedged reproduction sketch follows the list):

  • ARC Challenge (acc_norm): Achieves 0.5640, slightly surpassing Qwen/QwQ-32B's 0.5563.
  • Winogrande (acc): Scores 0.7324, outperforming Qwen/QwQ-32B's 0.7048.
  • SciQ (acc): Matches Qwen/QwQ-32B at 0.9630.
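
These figures could in principle be checked with EleutherAI's lm-evaluation-harness. The sketch below is an assumption about what such a run might look like (arc_challenge, winogrande, and sciq are the harness's task identifiers; the model_args string is illustrative), not a record of how the published scores were produced.

import lm_eval

# Hypothetical reproduction sketch using EleutherAI's lm-evaluation-harness.
# The harness version, few-shot settings, and precision behind the published
# scores are not stated here, so treat any output as a sanity check only.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=recursal/RWKV6QwQ-32B-final-250307,"
        "trust_remote_code=True,dtype=bfloat16"
    ),
    tasks=["arc_challenge", "winogrande", "sciq"],
)
for task, metrics in results["results"].items():
    print(task, metrics)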

Unique Differentiator

The primary innovation of this model is its linear attention mechanism, which cuts inference cost by more than 1000x relative to traditional transformer attention, with the largest gains at long context lengths. This makes it well suited to cost-sensitive, scalable deployments, particularly long-context processing. The conversion method is described in the paper RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale, which presents a procedure for distilling existing large transformer models into more efficient RWKV variants.
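
To make the cost argument concrete, the sketch below shows a generic linear-attention recurrence. It is illustrative only: the actual RWKV-6 formulation adds learned per-channel decays, token shift, and gating. The point it demonstrates is that the recurrent state has a fixed size, so per-token decoding cost stays constant as the context grows, whereas a softmax-attention KV cache, and the per-token readout over it, grows with every generated token.

import numpy as np

d = 8                               # head dimension (toy size)
state = np.zeros((d, d))            # fixed-size recurrent state, independent of context length
decay = 0.99                        # scalar decay for illustration; RWKV-6 uses learned, data-dependent decays

def step(state, k, v):
    # One decoding step: decay the running state and fold in the new key/value outer product.
    return decay * state + np.outer(k, v)

rng = np.random.default_rng(0)
for t in range(10_000):             # the context keeps growing; the state does not
    k, v, q = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
    state = step(state, k, v)
    out = q @ state                 # readout cost is O(d^2) per token, not O(t * d) as with a KV cache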