RWKV/v5-Eagle-7B-HF
Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 16k · Published: Jan 29, 2024 · License: apache-2.0

RWKV/v5-Eagle-7B-HF is a 7 billion parameter causal language model developed by RWKV, packaged for the Hugging Face Transformers library. It is based on the RWKV-5 Eagle architecture, which combines the parallelizable training of Transformers with the efficient inference of RNNs. As a base model it is not instruction-tuned, and it supports a 16384 token context length.


RWKV-5 Eagle 7B for Hugging Face Transformers

This model is the Hugging Face Transformers implementation of the RWKV-5 Eagle 7B architecture. RWKV models are known for their unique approach, blending the parallelizable training of Transformers with the efficient inference of Recurrent Neural Networks (RNNs). This particular version is a 7 billion parameter model, offering a substantial context length of 16384 tokens.

Key Characteristics

  • Architecture: Utilizes the RWKV-5 Eagle architecture, designed for both efficient training and inference.
  • Hugging Face Integration: Specifically packaged for seamless use with the Hugging Face Transformers library, simplifying deployment and experimentation.
  • Base Model: It is a base model, meaning it has not been instruction-tuned. This provides flexibility for developers to fine-tune it for specific applications.
  • Context Length: Features a notable context window of 16384 tokens, allowing it to process and generate longer sequences of text.
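Because the context window is fixed at 16384 tokens, long prompts need to be trimmed before generation. A minimal sketch of left-truncation (keeping the most recent tokens, and reserving room for the tokens to be generated); the helper name and budget logic are illustrative, not part of the model's API:

```python
CTX_LEN = 16384  # the model's context window, in tokens

def fit_to_context(token_ids, max_new_tokens=64, ctx_len=CTX_LEN):
    """Left-truncate token_ids so prompt + generated tokens fit the window.

    Keeps the most recent tokens, dropping the oldest ones, and reserves
    max_new_tokens slots for generation.
    """
    budget = ctx_len - max_new_tokens
    return token_ids[-budget:]

# A 20,000-token prompt is trimmed to the most recent 16,320 tokens.
ids = list(range(20000))
print(len(fit_to_context(ids)))  # 16320

# Short prompts pass through unchanged.
print(fit_to_context([1, 2, 3]))  # [1, 2, 3]
```

Truncating from the left keeps the text closest to the generation point, which is usually what matters for continuation with a base model.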

Usage Notes

  • Not Instruction-Tuned: Users should be aware that this is a base model. For conversational or instruction-following tasks, further fine-tuning or prompt engineering may be required.
  • Efficient Inference: The RWKV architecture generally offers advantages in inference speed and memory usage compared to traditional Transformer models of similar size, especially for long sequences.
  • Example Code: The README provides Python examples for running inference on both CPU and GPU, including batch inference, using the AutoModelForCausalLM and AutoTokenizer classes.
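The loading pattern above can be sketched as follows. This is a minimal example, assuming `trust_remote_code=True` is needed because the repository ships custom RWKV-5 modeling code (exact requirements may vary with your Transformers version), and that the prompt fits the context window:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "RWKV/v5-Eagle-7B-HF"

# trust_remote_code is assumed to be required for the custom RWKV-5 code.
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

# Move to GPU if available; omit for CPU-only inference.
# model = model.to("cuda")

# Base model: plain text continuation, no chat template.
prompt = "The RWKV architecture combines"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(inputs["input_ids"], max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For batch inference, pass a list of prompts to the tokenizer with `padding=True` and decode each row of the output, as shown in the repository's README.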