QRWKV6-32B-Instruct-Preview-v0.1: A Large-Scale RWKV Model
Overview
recursal/QRWKV6-32B-Instruct-Preview-v0.1 is a 32 billion parameter instruction-tuned model developed by Recursal, showcasing a significant advancement in the RWKV (Receptance Weighted Key Value) architecture. This model is notable for its unique approach: it's derived from a QKV Attention-based model (specifically Qwen2.5-32B-Instruct) through a conversion process, rather than being trained from scratch. This method allows for rapid validation of the efficient RWKV linear attention mechanism at a larger scale.
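The linear-attention idea behind RWKV can be pictured as a recurrence: instead of attending over a key/value cache that grows with the sequence, each token reads from and updates a fixed-size state. Below is a minimal, heavily simplified sketch in NumPy. It is illustrative only and is not the actual RWKV-6 formulation, which adds token-shift, data-dependent decay, receptance gating, and multi-head structure; the fixed `decay` scalar and dimensions here are assumptions for demonstration.

```python
import numpy as np

def linear_attention(qs, ks, vs, decay=0.99):
    """Toy RWKV-style linear attention: a fixed-size state matrix
    replaces the growing key/value cache of softmax attention.
    Illustrative only -- not the real RWKV-6 recurrence."""
    d = qs.shape[1]
    state = np.zeros((d, d))  # constant O(d^2) memory, independent of sequence length
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        outputs.append(q @ state)               # read from the recurrent state
        state = decay * state + np.outer(k, v)  # fold the new token into the state
    return np.stack(outputs)

T, d = 1024, 64
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)  # (1024, 64)
```

Because the state has a fixed size, per-token compute and memory stay flat as the context grows, which is the property the conversion from Qwen2.5-32B-Instruct is meant to validate at scale.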
Key Capabilities and Differentiators
- Computational Efficiency: RWKV linear models are designed to drastically reduce computational costs, particularly at long context lengths, offering over 1000x improvement in inference cost efficiency compared to traditional transformer architectures.
- Performance: Evaluation benchmarks indicate that QRWKV6-32B-Instruct performs on par with or surpasses its base model, Qwen2.5-32B-Instruct, on several tasks such as arc_challenge, piqa, and sciq, while trailing slightly on MMLU and HellaSwag.
- Architectural Innovation: It demonstrates that the RWKV architecture scales, showing that QKV attention is not a prerequisite for high-performing LLMs.
- Inherited Knowledge: The model inherits the knowledge and dataset training of its "parent" Qwen model, supporting approximately 30 languages.
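The efficiency claim above can be made concrete with back-of-envelope arithmetic: in softmax attention, token t attends to all t preceding tokens, so per-token cost grows with position, while a linear recurrence does a fixed amount of state work per token. This is a rough scaling sketch, not the benchmark methodology behind the 1000x figure; the head dimension used here is an illustrative assumption, not the model's actual configuration.

```python
def per_token_attention_flops(t, d):
    # Softmax attention: token t scores against all t prior tokens -> O(t*d)
    return 2 * t * d

def per_token_linear_flops(d):
    # Linear-attention recurrence: fixed state read/update -> O(d^2), independent of t
    return 2 * d * d

d = 128  # toy head dimension (assumption, not the real model config)
for t in (1_000, 100_000, 1_000_000):
    ratio = per_token_attention_flops(t, d) / per_token_linear_flops(d)
    print(f"t={t:>9,}: attention/linear per-token cost ratio = {ratio:,.0f}x")
```

Under this crude model the ratio is simply t/d, so the advantage keeps widening with context length, which is why the gap becomes dramatic at very long contexts.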
Limitations and Future Directions
- Context Length: Due to compute constraints, the current model was trained up to a 16K token context length, though it is stable beyond this limit.
- Inference Code: Because the model lacks RWKV-style channel-mix and feed-forward layers, standard RWKV inference code does not apply, and this model requires separate inference code.
Recursal plans to release Q-RWKV-7 32B and LLaMA-RWKV-7 70B, together with the detailed conversion methodology and a paper, once RWKV-7 is finalized.