Overview
RWKV-v5 Eagle 7B: An Efficient, Multilingual Foundation Model
RWKV/v5-Eagle-7B-pth is a 7.52-billion-parameter model built on the RWKV-v5 architecture, an "Attention-Free Transformer." Because the model runs as a recurrent network at inference time, each new token is processed against a fixed-size state rather than an ever-growing attention context, which yields 10-100x+ lower inference costs than traditional transformers and makes this one of the world's greenest 7B models per token.
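To make the cost claim concrete, here is a toy sketch contrasting the two inference patterns: a transformer decoder re-attends over a growing key/value cache at every step, while an RWKV-style recurrence updates a fixed-size state in place. Everything below (the dimension, the decay constant, the stand-in update rules) is illustrative only, not the actual RWKV-v5 kernel.

```python
import numpy as np

d = 8  # toy hidden size

# Transformer decoding: the KV cache grows with every token,
# so the attention work per token grows with context length.
kv_cache = []
def transformer_step(x):
    kv_cache.append(x)          # O(n) memory after n tokens
    ctx = np.stack(kv_cache)
    return ctx.mean(axis=0)     # stand-in for attention over all past tokens

# RWKV-style recurrence: a fixed-size state is updated in place,
# so each token costs the same regardless of context length.
state = np.zeros(d)
def rwkv_step(x, decay=0.9):
    global state
    state = decay * state + x   # O(1) memory and compute per token
    return state

for t in range(3):
    x = np.ones(d) * t
    transformer_step(x)  # cost grows with t
    rwkv_step(x)         # cost constant in t
```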
Key Capabilities & Characteristics
- Efficient Architecture: Built on the RWKV-v5 linear transformer; per-token inference cost stays constant regardless of context length.
- Extensive Training Data: Trained on 1.1 trillion tokens, comprising 70% English, 15% multilingual, and 15% code data.
- Multilingual Prowess: Outperforms other 7B-class models on multilingual benchmarks.
- Competitive English Performance: Approaches the English-evaluation performance of similarly sized models trained on more data, such as Falcon (1.5T tokens), LLaMA2 (2T tokens), and Mistral (>2T tokens), and trades blows with MPT-7B (1T tokens).
- Foundation Model: This is a base model; although training included a very small amount of instruction data, further fine-tuning is recommended for specific use cases (see the loading sketch below).
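Since the -pth release is a raw PyTorch checkpoint, the most direct way to run it is the official `rwkv` pip package. The code below is a minimal sketch following the package's published examples; the local checkpoint path is an assumption (download the weights from the RWKV/v5-Eagle-7B-pth repository first).

```python
# pip install rwkv
import os

# These flags must be set before importing rwkv.model.
os.environ["RWKV_JIT_ON"] = "1"   # TorchScript JIT for faster inference
os.environ["RWKV_CUDA_ON"] = "0"  # set to "1" to compile the optional CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Path is an assumption -- point it at the downloaded checkpoint.
# As in the official examples, the path is given without the .pth extension.
model = RWKV(model="models/RWKV-v5-Eagle-7B", strategy="cuda fp16")

# Eagle 7B is a "world" model, so it uses the bundled world-tokenizer vocabulary.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
```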
Good For
- Applications requiring a highly efficient and environmentally friendly 7B-parameter model.
- Tasks benefiting from strong multilingual capabilities.
- Developers looking for a powerful foundation model to fine-tune for specialized applications, especially where inference cost is a critical factor.
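Building on the pipeline created above, generation is a single call. The sampling values below mirror those used in the official RWKV examples rather than tuned recommendations, and the prompt is purely illustrative; as a base model, Eagle responds best to completion-style prompts.

```python
from rwkv.utils import PIPELINE_ARGS

args = PIPELINE_ARGS(
    temperature=1.0,
    top_p=0.7,
    alpha_frequency=0.25,  # frequency penalty
    alpha_presence=0.25,   # presence penalty
    token_ban=[0],         # ban token 0, as in the official examples
)

# Illustrative completion-style prompt for a base model.
prompt = "The quick summary of the article is as follows:\n"
output = pipeline.generate(prompt, token_count=200, args=args)
print(output)
```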