recursal/QRWKV6-32B-Instruct-Preview-v0.1

Parameters: 32B
Quantization: FP8
Context length: 32768
Released: Dec 7, 2024
License: apache-2.0
Overview

QRWKV6-32B-Instruct-Preview-v0.1: A Large-Scale RWKV Model

recursal/QRWKV6-32B-Instruct-Preview-v0.1 is a 32-billion-parameter instruction-tuned model developed by Recursal, and a significant step for the RWKV (Receptance Weighted Key Value) architecture. Its approach is unusual: rather than being trained from scratch, it was derived from a QKV-attention-based model (specifically Qwen2.5-32B-Instruct) through a conversion process. This method allows the efficient RWKV linear-attention mechanism to be validated rapidly at a larger scale.
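The conversion idea can be illustrated with a toy sketch (an assumption about the general technique, not Recursal's actual recipe): keep a trained transformer block's feedforward and other weights, swap only its softmax-attention module for a recurrent time-mix stand-in, then fine-tune so the new module reproduces the old one's behavior. `ToyBlock` and `ToyTimeMix` below are hypothetical names for illustration.

```python
import torch.nn as nn

class ToyBlock(nn.Module):
    """Toy transformer block: softmax attention plus a feedforward net."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=2, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

class ToyTimeMix(nn.Module):
    """Stand-in for an RWKV time-mix layer: recurrent, linear-cost in sequence length."""
    def __init__(self, d):
        super().__init__()
        self.rnn = nn.GRU(d, d, batch_first=True)
    def forward(self, x):
        out, _ = self.rnn(x)
        return out

block = ToyBlock(d=16)
# The conversion step: replace attention, reuse the FFN weights as-is.
block.attn = ToyTimeMix(16)
```

Only the attention module is replaced; the rest of the network (and hence the parent model's learned knowledge) carries over, which is why full from-scratch pretraining can be avoided.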

Key Capabilities and Differentiators

  • Computational Efficiency: RWKV linear models are designed to drastically reduce computational cost, particularly at long context lengths, with inference claimed to be over 1000x more cost-efficient than traditional transformer architectures.
  • Performance: Evaluation benchmarks indicate that QRWKV6-32B-Instruct matches or surpasses its base model, Qwen2.5-32B-Instruct, on several benchmarks such as arc_challenge, piqa, and sciq, while trailing slightly on MMLU and HellaSwag.
  • Architectural Innovation: It demonstrates that the RWKV design scales, showing that softmax QKV attention is not an essential ingredient of high-performing LLMs.
  • Inherited Knowledge: The model inherits its knowledge and training data from its parent Qwen model, and accordingly supports approximately 30 languages.
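The efficiency claim comes from the linear-attention family that RWKV belongs to: the causal attention sum can be carried as a fixed-size running state, so per-token cost does not grow with context length. A minimal NumPy sketch of unnormalized causal linear attention (illustrative only; this is not RWKV6's actual time-mix formulation):

```python
import numpy as np

def quadratic_causal(q, k, v):
    """Direct form: touches all T*T query/key pairs -> O(T^2)."""
    T = q.shape[0]
    scores = q @ k.T                     # (T, T) pairwise scores
    mask = np.tril(np.ones((T, T)))      # causal mask
    return (scores * mask) @ v

def recurrent_linear(q, k, v):
    """Recurrent form: a fixed (d_k, d_v) state, O(1) memory per token."""
    S = np.zeros((k.shape[1], v.shape[1]))
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        S = S + np.outer(k[t], v[t])     # accumulate k^T v into the state
        out[t] = q[t] @ S                # cost independent of t
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
assert np.allclose(quadratic_causal(q, k, v), recurrent_linear(q, k, v))
```

Because the recurrent form never materializes the T×T score matrix, cost per generated token stays constant as the context grows, which is the source of the large inference-cost advantage at long contexts.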

Limitations and Future Directions

  • Context Length: Due to compute constraints, the current model was trained up to a 16K token context length, though inference remains stable beyond that length.
  • Inference Code: This model requires its own inference code, since it lacks the RWKV-style channel-mix feedforward layers (it retains the parent model's feedforward layers instead).
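In practice, custom inference code like this is usually shipped inside the model repository and pulled in via `trust_remote_code=True`. A hedged loading sketch using standard `transformers` calls (the repo id is from this card; the exact loading flow has not been verified against the actual repository):

```python
MODEL_ID = "recursal/QRWKV6-32B-Instruct-Preview-v0.1"

def load(model_id: str = MODEL_ID):
    # trust_remote_code=True lets transformers execute the repo's own
    # modeling code, which this model needs for its RWKV layers.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model
```

`device_map="auto"` and `torch_dtype="auto"` are standard conveniences for large checkpoints; a 32B model will still need substantial GPU memory even in FP8.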

Recursal plans to release Q-RWKV-7 32B and LLaMA-RWKV-7 70B, along with detailed conversion methodologies and a paper, once RWKV-7 is finalized.