RWKV/v5-Eagle-7B-pth

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 16k · Published: Jan 28, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

RWKV/v5-Eagle-7B-pth is a 7.52-billion-parameter model built on the RWKV-v5 architecture, a linear transformer designed for significantly lower inference costs. Trained on 1.1 trillion tokens spanning over 100 languages, it outperforms other 7B-class models on multi-lingual benchmarks. Despite being an "Attention-Free Transformer," this foundation model approaches the English-evaluation performance of transformer models such as Falcon and LLaMA2 that were trained on more tokens.


RWKV-v5 Eagle 7B: An Efficient, Multilingual Foundation Model

RWKV/v5-Eagle-7B-pth is a 7.52 billion parameter model leveraging the innovative RWKV-v5 architecture. This architecture is notable for being an "Attention-Free Transformer," which contributes to 10-100x+ lower inference costs compared to traditional transformers, making it one of the world's greenest 7B models per token.
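The "linear" in linear transformer is where the inference savings come from: each token updates a fixed-size recurrent state instead of attending over a key/value cache that grows with context length. A minimal numpy sketch of a single-head RWKV-v5-style recurrence illustrates the idea (simplified: a per-channel decay `w`, no bonus or time-mix terms; all names here are illustrative, not the model's actual implementation):

```python
import numpy as np

def rwkv5_head_recurrent(r, k, v, w):
    """One head of a simplified RWKV-v5-style linear-attention recurrence.
    r, k, v: (T, d) receptance/key/value rows; w: (d,) per-channel decay in (0, 1).
    Per-token cost is O(d^2), independent of sequence length T."""
    T, d = r.shape
    S = np.zeros((d, d))                             # fixed-size matrix-valued state
    out = np.empty((T, d))
    for t in range(T):
        S = w[:, None] * S + np.outer(k[t], v[t])    # decay old state, write k_t v_t^T
        out[t] = r[t] @ S                            # read state with receptance
    return out

def rwkv5_head_attention_form(r, k, v, w):
    """Mathematically equivalent quadratic form:
    y_t = sum_{s<=t} ((r_t * w^(t-s)) . k_s) v_s  -- attention-like, O(T^2)."""
    T, d = r.shape
    out = np.zeros((T, d))
    for t in range(T):
        for s in range(t + 1):
            out[t] += ((r[t] * w ** (t - s)) @ k[s]) * v[s]
    return out
```

The two forms produce identical outputs, but the recurrent one keeps only the `d x d` state per head between tokens, which is why per-token cost stays flat as the context grows.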

Key Capabilities & Characteristics

  • Efficient Architecture: Built on the RWKV-v5 linear transformer, offering significant inference cost reductions.
  • Extensive Training Data: Trained on 1.1 trillion tokens, comprising 70% English, 15% multi-language, and 15% code data.
  • Multilingual Prowess: Demonstrates superior performance in multi-lingual benchmarks compared to other 7B class models.
  • Competitive English Performance: Achieves performance levels comparable to larger models like Falcon (1.5T tokens), LLaMA2 (2T tokens), and Mistral (>2T tokens) in English evaluations, and trades blows with MPT-7B (1T tokens).
  • Foundation Model: This is a base model, and while it includes a very small instruct tune, further fine-tuning is recommended for specific use cases.

Good For

  • Applications requiring a highly efficient and environmentally friendly 7B parameter model.
  • Tasks benefiting from strong multilingual capabilities.
  • Developers looking for a powerful foundation model to fine-tune for specialized applications, especially where inference cost is a critical factor.

Popular Sampler Settings

The sampler parameters most commonly tuned by Featherless users for this model are: `temperature`, `top_p`, `top_k`, `frequency_penalty`, `presence_penalty`, `repetition_penalty`, and `min_p`.
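To show how these settings interact, here is a minimal sketch of a common sampling pipeline (the ordering and function names are illustrative assumptions, not Featherless's actual implementation): apply repetition/frequency/presence penalties to the logits, scale by temperature, then restrict the distribution with `top_k`, `top_p`, and `min_p` before sampling.

```python
import numpy as np

def sample_next(logits, history, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0,
                repetition_penalty=1.0, frequency_penalty=0.0, presence_penalty=0.0,
                rng=None):
    """Illustrative sampler combining the listed parameters.
    logits: (vocab,) raw scores; history: token ids generated so far."""
    logits = np.asarray(logits, dtype=float).copy()
    if len(history):
        ids, counts = np.unique(history, return_counts=True)
        # additive frequency/presence penalties on already-seen tokens
        logits[ids] -= frequency_penalty * counts + presence_penalty
        # multiplicative repetition penalty: shrink positive logits, grow negative ones
        pos = logits[ids] > 0
        logits[ids] = np.where(pos, logits[ids] / repetition_penalty,
                               logits[ids] * repetition_penalty)
    logits /= max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    keep = np.ones_like(probs, dtype=bool)
    if top_k > 0:                        # keep only the k most likely tokens
        keep &= probs >= np.sort(probs)[-top_k]
    if top_p < 1.0:                      # nucleus: smallest set with mass >= top_p
        order = np.argsort(-probs)
        cut = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        mask = np.zeros_like(keep)
        mask[order[:cut]] = True
        keep &= mask
    if min_p > 0.0:                      # drop tokens below min_p * max probability
        keep &= probs >= min_p * probs.max()
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(probs), p=probs)), probs
```

Lower `temperature` sharpens the distribution, while `top_k`/`top_p`/`min_p` each truncate its tail in a different way; the penalties discourage tokens the model has already emitted.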