namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged

Source: Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8K · Published: Apr 30, 2024 · License: MIT · Architecture: Transformer · Open weights · Warm

Llama-3-8B-Instruct-80K-QLoRA-Merged is an 8 billion parameter instruction-tuned causal language model developed by namespace-Pt that extends the context length of Meta's Llama-3-8B-Instruct to 80,000 tokens. It was trained efficiently with QLoRA on 3.5K long-context examples synthesized by GPT-4 and demonstrates strong performance on long-context evaluation benchmarks such as LongBench and InfiniteBench. The model is optimized for tasks requiring extensive context understanding and generation while maintaining competitive short-context capabilities.


Overview

Llama-3-8B-Instruct-80K-QLoRA-Merged is an 8 billion parameter instruction-tuned language model that extends the context window of the base Meta Llama-3-8B-Instruct model to 80,000 tokens. The extension was achieved with an efficient QLoRA fine-tune on 3.5K long-context training examples synthesized by GPT-4; training completed in just 8 hours on a single 8xA800 (80G) machine.
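
As a rough illustration, the merged weights can be loaded with the Hugging Face transformers library like any other Llama-3 checkpoint. The snippet below is a minimal sketch, not an official recipe: bfloat16, device_map="auto", and attn_implementation="flash_attention_2" are assumptions chosen to make prompts approaching the 80K-token window practical on 80 GB-class GPUs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # assumed; use float16 if bf16 is unavailable
    device_map="auto",                        # spread layers across available GPUs
    attn_implementation="flash_attention_2",  # assumed; long prompts need a memory-efficient attention kernel
)

# Llama-3 chat turns end with <|eot_id|>, so include it among the stop tokens.
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

messages = [{"role": "user", "content": "Summarize the key idea of QLoRA in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False, eos_token_id=terminators)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that attention memory grows with sequence length, so filling most of the 80K window requires substantially more GPU memory than short prompts even with FlashAttention-2.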

Key Capabilities & Performance

This model excels in tasks requiring deep understanding and generation over long contexts, as evidenced by its performance on several benchmarks:

  • Long-Context Understanding: Achieves strong results on the Needle-In-A-Haystack task, demonstrating robust retrieval across its 80K context window.
  • LongBench: Outperforms the original Llama-3-8B-Instruct and gradientai/Llama-3-8B-Instruct-262k across most categories, including Single-Doc QA, Multi-Doc QA, Summarization, and Synthetic tasks, with an average score of 47.19.
  • InfiniteBench: Shows superior performance in LongBookQA English with a score of 30.92, significantly higher than GPT-4 and other Llama-3 variants.
  • Topic Retrieval: Demonstrates effective topic retrieval capabilities across varying numbers of topics.
  • Short-Context Performance: Maintains competitive short-context capabilities, scoring 64.44 on the MMLU benchmark, comparable to other 8B models.

Use Cases

This model is particularly well-suited for applications that demand processing and generating content based on very long inputs, such as:

  • Document Analysis: Summarizing, querying, or extracting information from extensive reports, legal documents, or research papers (a minimal sketch follows this list).
  • Conversational AI: Maintaining context over prolonged dialogues or complex interactions.
  • Creative Writing: Generating long-form content with consistent narrative and context.
  • Code Analysis: Understanding and generating code within large repositories or complex projects.
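
As a concrete illustration of the document-analysis use case, the sketch below reads a local text file, trims it to fit the 80K-token window, and asks a question about it. The file name, question, and token allowances are placeholders, and `model` and `tokenizer` are assumed to be loaded as in the snippet under Overview.

```python
# Long-document QA sketch. "annual_report.txt" and the question are placeholders;
# `model` and `tokenizer` are assumed to be loaded as in the Overview snippet.
MAX_CONTEXT = 80_000          # extended context length of this model
RESERVED_FOR_ANSWER = 1_024   # leave room for the reply
TEMPLATE_ALLOWANCE = 256      # rough allowance for the chat template and the question

with open("annual_report.txt", encoding="utf-8") as f:
    document = f.read()

question = "List the three largest risk factors discussed in this report."

# Budget the document against the context window; simple truncation is used here,
# while chunking or map-reduce summarization are common alternatives.
doc_ids = tokenizer(document, add_special_tokens=False)["input_ids"]
budget = MAX_CONTEXT - RESERVED_FOR_ANSWER - TEMPLATE_ALLOWANCE
if len(doc_ids) > budget:
    document = tokenizer.decode(doc_ids[:budget])

messages = [{"role": "user", "content": f"{document}\n\n{question}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
out = model.generate(inputs, max_new_tokens=RESERVED_FOR_ANSWER, do_sample=False, eos_token_id=terminators)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```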

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model are built from the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
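
These are the standard sampling knobs exposed by OpenAI-compatible endpoints. As a hedged sketch (the base URL, environment variable name, and parameter values below are assumptions for illustration, not recorded user configurations), a request against Featherless's OpenAI-compatible API might look like this:

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; check the Featherless docs for the exact base URL.
client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key=os.environ["FEATHERLESS_API_KEY"],  # hypothetical environment variable name
)

response = client.chat.completions.create(
    model="namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged",
    messages=[{"role": "user", "content": "Summarize the plot of Moby-Dick in one paragraph."}],
    # Illustrative values only -- not the top user configurations referenced above.
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Samplers outside the core OpenAI schema are commonly passed via extra_body.
    extra_body={"top_k": 40, "repetition_penalty": 1.1, "min_p": 0.05},
)
print(response.choices[0].message.content)
```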