budecosystem/genz-13b-infinite: Extended Context LLM
budecosystem/genz-13b-infinite is a 13-billion-parameter language model developed by BudEcosystem and fine-tuned from GenZ-13B-v2. Its key differentiator is the integration of lambda attention, introduced in the LM-Infinite paper, which extends its effective context window to over 120,000 tokens without degrading perplexity.
Key Capabilities & Features
- Extended Context: Achieves 120K+ token sequence length, a substantial increase over its base model's 16K context.
- Architecture: Utilizes lambda attention for efficient long-context processing.
- Training: Fine-tuned for 55 hours on 4 A100 80GB GPUs, with a learning rate of 2e-4 over 3 epochs.
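To make the architecture bullet concrete: the LM-Infinite paper describes a Λ-shaped ("lambda") attention pattern in which every query attends to a handful of global tokens at the start of the sequence plus a sliding window of recent tokens, keeping attention cost bounded at long lengths. Below is a minimal sketch of such a mask; the parameter values are illustrative defaults, not this model's actual configuration.

```python
import numpy as np

def lambda_attention_mask(seq_len: int, n_global: int = 4, window: int = 2048) -> np.ndarray:
    """Boolean causal mask for Lambda-shaped attention (LM-Infinite style).

    Each query position may attend to:
      - the first `n_global` tokens (the "global" arm of the Lambda), and
      - the most recent `window` tokens (the "local" arm),
    and never to future positions.
    """
    q = np.arange(seq_len)[:, None]  # query positions (rows)
    k = np.arange(seq_len)[None, :]  # key positions (columns)
    causal = k <= q                  # no attention to the future
    global_arm = k < n_global        # always-visible prefix tokens
    local_arm = (q - k) < window     # sliding window of recent tokens
    return causal & (global_arm | local_arm)

# Tiny illustration: with 2 global tokens and a window of 3,
# query position 6 sees tokens {0, 1} and {4, 5, 6}, but not {2, 3}.
mask = lambda_attention_mask(8, n_global=2, window=3)
```

Because each row of the mask has at most `n_global + window` true entries regardless of sequence length, memory and compute per query stay constant as the context grows, which is what makes 120K+ token inference tractable.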
Performance & Use Cases
While designed for long contexts, initial passkey-retrieval evaluations show performance degrading as context grows: the model achieves 100% accuracy at 4,096 tokens but declines at longer lengths. It is nonetheless well-suited to applications requiring deep contextual understanding over very long documents or conversations, where maintaining coherence and retrieving information from extensive inputs is critical.