jordiclive/scaled-llama-7b-lora-16k-rp2
jordiclive/scaled-llama-7b-lora-16k-rp2 is a 7-billion-parameter Llama-based language model, fine-tuned with LoRA on packed 16k sequences from the RedPajama dataset for one epoch. Its extended context length of 16,384 tokens, achieved through linearly scaled RoPE, makes it suitable for tasks that require processing long inputs. It is designed for general language understanding and generation, with an emphasis on handling extensive textual data.
Model Overview
jordiclive/scaled-llama-7b-lora-16k-rp2 is a 7-billion-parameter Llama-based language model developed by jordiclive. It was fine-tuned with LoRA (Low-Rank Adaptation) on the RedPajama dataset, trained on packed 16k sequences for one epoch. The model uses a linearly scaled RoPE (Rotary Position Embedding) implementation, allowing it to process context windows far longer than the 2,048 tokens of the base Llama-7B architecture.
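A minimal loading sketch with Hugging Face transformers is shown below. It assumes the repository ships weights that can be loaded directly with AutoModelForCausalLM, and that the 16k context corresponds to linear RoPE scaling with factor 8 (16,384 / 2,048); if the checkpoint's config already records the scaling, the rope_scaling argument is unnecessary. Check the repository files before relying on these details.

```python
# Hypothetical loading sketch; verify against the actual repository contents.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jordiclive/scaled-llama-7b-lora-16k-rp2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    # Only needed if the checkpoint's config does not already record the scaling
    # (assumed factor: 16384 / 2048 = 8).
    rope_scaling={"type": "linear", "factor": 8.0},
)

prompt = "Summarize the following document:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```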
Key Capabilities
- Extended Context Length: Supports a context window of up to 16,384 tokens via linear RoPE scaling, allowing much larger documents and conversations to be processed (see the sketch after this list).
- Llama Architecture: Built upon the robust Llama-7B foundation, providing strong general language understanding and generation capabilities.
- LoRA Fine-tuning: Utilizes LoRA for efficient fine-tuning, with the adapter parameters available separately if needed.
- RedPajama Dataset Training: Benefits from training on the RedPajama dataset, a large-scale open-source dataset designed to replicate the LLaMA training data.
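The sketch below illustrates the idea behind linear RoPE scaling rather than this repository's exact implementation: position indices are divided by a scale factor before the rotary angles are computed, so positions up to 16,384 map into the range the base model saw at its original 2,048-token context length.

```python
# Conceptual sketch of linear RoPE scaling (not the repo's actual code).
import torch

def rotary_angles(seq_len: int, dim: int, base: float = 10000.0, scale: float = 8.0):
    # Standard RoPE inverse frequencies for a head dimension of `dim`.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Linear scaling: compress position indices by the scale factor.
    positions = torch.arange(seq_len).float() / scale
    return torch.outer(positions, inv_freq)  # shape (seq_len, dim // 2)

# With scale=8, position 16383 lands at effective position ~2048, inside the
# range the base Llama-7B model was pre-trained on.
angles = rotary_angles(seq_len=16384, dim=128)
print(angles.shape)  # torch.Size([16384, 64])
```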
Good For
- Applications requiring the analysis or generation of long-form text, such as summarization of lengthy articles, document question-answering, or extended dialogue systems.
- Developers looking for a Llama-7B variant optimized for handling larger input sequences without significant architectural changes.
- Research into the effects of scaled RoPE and extended context on Llama models.