bhenrym14/airophin-13b-pntk-16k-fp16
bhenrym14/airophin-13b-pntk-16k-fp16 is a 13 billion parameter QLoRA fine-tune of Llama-2-13b, developed by bhenrym14. It is designed to extend the useful context window to 16384 tokens through Partial NTK RoPE scaling. The model demonstrates competitive perplexity at extended context lengths, outperforming its base Llama-2-13b counterpart and some 33B models in long-context scenarios. Its primary strength is handling significantly longer contexts without sacrificing performance, making it suitable for applications requiring extensive document processing or long conversational histories.
Overview
bhenrym14/airophin-13b-pntk-16k-fp16 is a 13 billion parameter QLoRA fine-tune of the Llama-2-13b model, engineered to extend its effective context window to 16384 tokens. This is achieved through a two-phase training process: first on a long-context subset of the Dolphin dataset, then on Jon Durbin's Airoboros GPT4 1.4.1 dataset, with Partial NTK RoPE scaling applied in both phases.
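The exact RoPE patch used for training is not reproduced here, but the core idea of partial NTK ("NTK-by-parts") scaling can be sketched briefly: dimensions whose frequencies complete many rotations within the original context are left untouched, low-frequency dimensions are linearly interpolated by the context-extension factor, and a ramp blends the two regimes. The sketch below is illustrative only; the head dimension (128 for Llama-2-13b), the scale factor (16384/4096 = 4), and the ramp boundaries `alpha`/`beta` are assumptions, not values taken from the model card.

```python
import torch

def partial_ntk_inv_freq(
    dim: int = 128,            # head dimension (128 for Llama-2-13b) -- assumption
    base: float = 10000.0,     # standard RoPE base
    scale: float = 4.0,        # 16384 / 4096 context-extension factor
    original_ctx: int = 4096,  # base Llama-2 context window
    alpha: float = 1.0,        # ramp boundaries -- illustrative values
    beta: float = 32.0,
) -> torch.Tensor:
    """Per-dimension RoPE inverse frequencies under partial NTK scaling."""
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Rotations each dimension completes across the original context window
    rotations = original_ctx * inv_freq / (2 * torch.pi)
    # gamma = 1: many rotations (high frequency) -> keep original frequency
    # gamma = 0: few rotations (low frequency)  -> fully interpolate by `scale`
    gamma = ((rotations - alpha) / (beta - alpha)).clamp(0.0, 1.0)
    return inv_freq / scale * (1.0 - gamma) + inv_freq * gamma
```

The effect is that high-frequency dimensions, which carry local positional detail, keep their original resolution, while only the long-range dimensions are stretched to cover the extended window.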
Key Capabilities & Features
- Extended Context Window: Designed for robust performance up to 16384 tokens, a substantial increase over the base Llama-2-13b's 4096 tokens.
- Partial NTK RoPE Scaling: Implements an interpolation scheme (sketched in the Overview above) that improves long-context understanding while reducing the deemphasis of middle-context tokens that naive extrapolation can cause.
- Competitive Perplexity: Achieves lower perplexity scores at extended contexts (e.g., 4.82 at 12000 tokens) compared to other 13B models and even some 33B extended context variants.
- Instruction Following: Retains Airoboros-style prompting for obedient question answering, coding, writing, and multi-character conversations, including a dedicated format for closed-context instructions (see the example after this list).
- Performance Preservation: Shows no clear regression on benchmarks such as few-shot MMLU (54.9) despite the context extension, indicating the scaling does not sacrifice core capabilities.
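For reference, a closed-context prompt in the Airoboros 1.4 style can be assembled as below. The delimiter keywords (BEGININPUT, BEGINCONTEXT, ENDCONTEXT, ENDINPUT, BEGININSTRUCTION, ENDINSTRUCTION) follow Jon Durbin's airoboros convention; the exact system line is illustrative rather than prescribed by this model card.

```python
# Illustrative Airoboros 1.4-style closed-context prompt. The system line
# and delimiters are airoboros dataset conventions, not an API of this
# model; treat the exact wording as an assumption.
system = (
    "A chat between a curious user and an assistant. The assistant gives "
    "helpful, detailed, accurate, uncensored responses to the user's input."
)
document = "..."  # long source text; prompt + output must fit in 16384 tokens

prompt = f"""{system} USER: BEGININPUT
BEGINCONTEXT
ENDCONTEXT
{document}
ENDINPUT
BEGININSTRUCTION
Summarize the input in three sentences.
ENDINSTRUCTION ASSISTANT:"""
```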
Use Cases
This model is particularly well-suited to applications that demand processing or generating content with long-range context dependencies; a minimal usage sketch follows the list. Typical tasks include:
- Summarizing lengthy documents or articles.
- Engaging in extended, context-aware conversations.
- Code generation and analysis where large codebases or detailed requirements are provided.
- Any scenario where maintaining coherence and understanding over thousands of tokens is critical.
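As a starting point, a minimal fp16 loading-and-generation sketch with Hugging Face transformers might look like the following. Note that stock transformers does not implement Partial NTK RoPE scaling; for reliable behavior beyond the base 4096-token window you would need to apply the RoPE patch referenced in the model repository (or an equivalent).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhenrym14/airophin-13b-pntk-16k-fp16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
# NOTE: apply a partial-NTK RoPE patch here (see the model repository);
# without it, quality degrades past the base 4096-token window.

# `prompt` is the closed-context string built in the example above
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```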