RWKV/v5-Eagle-7B-pth

Visibility: Public
Parameters: 7B
Precision: FP8
Context length: 16384
Release date: Jan 28, 2024
License: apache-2.0
Source: Hugging Face
Overview

RWKV-v5 Eagle 7B: An Efficient, Multilingual Foundation Model

RWKV/v5-Eagle-7B-pth is a 7.52-billion-parameter model built on the RWKV-v5 architecture, an attention-free linear transformer. This design yields 10-100x+ lower inference cost than traditional transformers, making Eagle 7B one of the world's greenest 7B models per token.
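
As an illustration of how the `.pth` checkpoint is typically run, below is a minimal inference sketch using the `rwkv` pip package (the ChatRWKV runtime). The checkpoint filename, device strategy, and sampling settings are placeholder assumptions, not values taken from this card.

```python
# Minimal sketch: running the Eagle 7B .pth checkpoint with the `rwkv` pip package.
# Assumes `pip install rwkv` and a local copy of the checkpoint downloaded from the
# RWKV/v5-Eagle-7B-pth repository; the filename below is a placeholder.
import os
os.environ["RWKV_JIT_ON"] = "1"    # enable JIT kernels (must be set before importing rwkv)
os.environ["RWKV_CUDA_ON"] = "0"   # set to "1" to compile the optional custom CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# The strategy string controls device/precision, e.g. "cuda fp16" or "cpu fp32".
model = RWKV(model="path/to/RWKV-v5-Eagle-7B.pth", strategy="cuda fp16")

# Eagle 7B uses the RWKV "world" tokenizer, which ships with the package.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

args = PIPELINE_ARGS(temperature=1.0, top_p=0.3)
print(pipeline.generate("The capital of France is", token_count=64, args=args))
```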

Key Capabilities & Characteristics

  • Efficient Architecture: Built on the RWKV-v5 linear transformer, offering significant inference cost reductions.
  • Extensive Training Data: Trained on 1.1 trillion tokens, comprising 70% English, 15% multilingual, and 15% code data.
  • Multilingual Prowess: Outperforms other 7B-class models on multilingual benchmarks.
  • Competitive English Performance: Approaches the performance of models trained on substantially more data, such as Falcon (1.5T tokens), LLaMA2 (2T tokens), and Mistral (>2T tokens), in English evaluations, and trades blows with MPT-7B (1T tokens).
  • Foundation Model: This is a base model with only a very small instruct tune; further fine-tuning is recommended for specific use cases (see the sketch after this list).
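
As a sketch of the fine-tuning starting point mentioned above, the weights are also distributed in a Transformers-compatible companion repository (RWKV/v5-Eagle-7B-HF). The snippet below assumes that repository and only covers loading plus a quick generation check; dataset and training-loop setup are omitted.

```python
# Minimal sketch: loading the Transformers-compatible companion checkpoint
# (RWKV/v5-Eagle-7B-HF) as a starting point for fine-tuning. The repository
# ships custom modeling code, so trust_remote_code=True is required.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "RWKV/v5-Eagle-7B-HF"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # bf16 on GPU is an assumption; use float32 on CPU if needed
).to("cuda")

# Quick sanity check before any fine-tuning run.
inputs = tokenizer("In a shocking finding,", return_tensors="pt")
output = model.generate(inputs["input_ids"].to(model.device), max_new_tokens=32)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```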

Good For

  • Applications requiring a highly efficient and environmentally friendly 7B parameter model.
  • Tasks benefiting from strong multilingual capabilities.
  • Developers looking for a powerful foundation model to fine-tune for specialized applications, especially where inference cost is a critical factor.