namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged
Llama-3-8B-Instruct-80K-QLoRA-Merged is an 8 billion parameter instruction-tuned causal language model developed by namespace-Pt, extending the context length of Meta's Llama-3-8B-Instruct to 80,000 tokens. This model was efficiently trained using QLoRA and 3.5K GPT-4 synthesized long-context data, demonstrating strong performance on long-context evaluation benchmarks like LongBench and InfiniteBench. It is optimized for tasks requiring extensive context understanding and generation, while maintaining competitive short-context capabilities.
Loading preview...
Overview
Llama-3-8B-Instruct-80K-QLoRA-Merged is an 8 billion parameter instruction-tuned language model that significantly extends the context window of the base Meta Llama-3-8B-Instruct model to 80,000 tokens. This extension was achieved through an efficient QLoRA training process, utilizing 3.5K long-context training data synthesized from GPT-4, completing in just 8 hours on an 8xA800 (80G) machine.
Key Capabilities & Performance
This model excels in tasks requiring deep understanding and generation over long contexts, as evidenced by its performance on several benchmarks:
- Long-Context Understanding: Achieves strong results on the Needle-In-A-Haystack task, demonstrating robust retrieval across its 80K context window.
- LongBench: Outperforms the original Llama-3-8B-Instruct and gradientai/Llama-3-8B-Instruct-262k across most categories, including Single-Doc QA, Multi-Doc QA, Summarization, and Synthetic tasks, with an average score of 47.19.
- InfiniteBench: Shows superior performance in LongBookQA English with a score of 30.92, significantly higher than GPT-4 and other Llama-3 variants.
- Topic Retrieval: Demonstrates effective topic retrieval capabilities across varying numbers of topics.
- Short-Context Performance: Maintains competitive short-context capabilities, scoring 64.44 on the MMLU benchmark, comparable to other 8B models.
Use Cases
This model is particularly well-suited for applications that demand processing and generating content based on very long inputs, such as:
- Document Analysis: Summarizing, querying, or extracting information from extensive reports, legal documents, or research papers.
- Conversational AI: Maintaining context over prolonged dialogues or complex interactions.
- Creative Writing: Generating long-form content with consistent narrative and context.
- Code Analysis: Understanding and generating code within large repositories or complex projects.
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.