namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged

Source: Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8K · Published: Apr 30, 2024 · License: MIT · Architecture: Transformer · Open weights · Warm

Llama-3-8B-Instruct-80K-QLoRA-Merged is an 8 billion parameter instruction-tuned causal language model developed by namespace-Pt that extends the context length of Meta's Llama-3-8B-Instruct to 80,000 tokens. It was trained efficiently with QLoRA on 3.5K long-context examples synthesized by GPT-4 and demonstrates strong performance on long-context evaluation benchmarks such as LongBench and InfiniteBench. The model is optimized for tasks requiring extensive context understanding and generation while maintaining competitive short-context capabilities.


Overview

Llama-3-8B-Instruct-80K-QLoRA-Merged is an 8 billion parameter instruction-tuned language model that extends the context window of the base Meta Llama-3-8B-Instruct model to 80,000 tokens. The extension was achieved with an efficient QLoRA fine-tune on 3.5K long-context training examples synthesized by GPT-4; training completed in just 8 hours on a single 8xA800 (80G) machine.
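
As a rough illustration, the merged weights can be loaded with the Hugging Face transformers library like any other Llama-3 checkpoint. The snippet below is a minimal sketch, not an official recipe: bfloat16, device_map="auto", and attn_implementation="flash_attention_2" are assumptions chosen to make prompts approaching the 80K-token window practical on 80 GB-class GPUs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # assumed; use float16 if bf16 is unavailable
    device_map="auto",                        # spread layers across available GPUs
    attn_implementation="flash_attention_2",  # assumed; long prompts need a memory-efficient attention kernel
)

# Llama-3 chat turns end with <|eot_id|>, so include it among the stop tokens.
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

messages = [{"role": "user", "content": "Summarize the key idea of QLoRA in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False, eos_token_id=terminators)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that attention memory grows with sequence length, so filling most of the 80K window requires substantially more GPU memory than short prompts even with FlashAttention-2.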

Key Capabilities & Performance

This model excels in tasks requiring deep understanding and generation over long contexts, as evidenced by its performance on several benchmarks:

  • Long-Context Understanding: Achieves strong results on the Needle-In-A-Haystack task, demonstrating robust retrieval across its 80K context window.
  • LongBench: Outperforms the original Llama-3-8B-Instruct and gradientai/Llama-3-8B-Instruct-262k across most categories, including Single-Doc QA, Multi-Doc QA, Summarization, and Synthetic tasks, with an average score of 47.19.
  • InfiniteBench: Shows superior performance in LongBookQA English with a score of 30.92, significantly higher than GPT-4 and other Llama-3 variants.
  • Topic Retrieval: Demonstrates effective topic retrieval capabilities across varying numbers of topics.
  • Short-Context Performance: Maintains competitive short-context capabilities, scoring 64.44 on the MMLU benchmark, comparable to other 8B models.

Use Cases

This model is particularly well-suited for applications that demand processing and generating content based on very long inputs, such as:

  • Document Analysis: Summarizing, querying, or extracting information from extensive reports, legal documents, or research papers (a minimal sketch follows this list).
  • Conversational AI: Maintaining context over prolonged dialogues or complex interactions.
  • Creative Writing: Generating long-form content with consistent narrative and context.
  • Code Analysis: Understanding and generating code within large repositories or complex projects.
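
As a concrete illustration of the document-analysis use case, the sketch below reads a local text file, trims it to fit the 80K-token window, and asks a question about it. The file name, question, and token allowances are placeholders, and `model` and `tokenizer` are assumed to be loaded as in the snippet under Overview.

```python
# Long-document QA sketch. "annual_report.txt" and the question are placeholders;
# `model` and `tokenizer` are assumed to be loaded as in the Overview snippet.
MAX_CONTEXT = 80_000          # extended context length of this model
RESERVED_FOR_ANSWER = 1_024   # leave room for the reply
TEMPLATE_ALLOWANCE = 256      # rough allowance for the chat template and the question

with open("annual_report.txt", encoding="utf-8") as f:
    document = f.read()

question = "List the three largest risk factors discussed in this report."

# Budget the document against the context window; simple truncation is used here,
# while chunking or map-reduce summarization are common alternatives.
doc_ids = tokenizer(document, add_special_tokens=False)["input_ids"]
budget = MAX_CONTEXT - RESERVED_FOR_ANSWER - TEMPLATE_ALLOWANCE
if len(doc_ids) > budget:
    document = tokenizer.decode(doc_ids[:budget])

messages = [{"role": "user", "content": f"{document}\n\n{question}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
out = model.generate(inputs, max_new_tokens=RESERVED_FOR_ANSWER, do_sample=False, eos_token_id=terminators)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```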

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model are built from the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
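
These are the standard sampling knobs exposed by OpenAI-compatible endpoints. As a hedged sketch (the base URL, environment variable name, and parameter values below are assumptions for illustration, not recorded user configurations), a request against Featherless's OpenAI-compatible API might look like this:

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; check the Featherless docs for the exact base URL.
client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key=os.environ["FEATHERLESS_API_KEY"],  # hypothetical environment variable name
)

response = client.chat.completions.create(
    model="namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged",
    messages=[{"role": "user", "content": "Summarize the plot of Moby-Dick in one paragraph."}],
    # Illustrative values only -- not the top user configurations referenced above.
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Samplers outside the core OpenAI schema are commonly passed via extra_body.
    extra_body={"top_k": 40, "repetition_penalty": 1.1, "min_p": 0.05},
)
print(response.choices[0].message.content)
```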