self-long/SelfLong-Llama3.2-1B-Instruct-1M
SelfLong-Llama3.2-1B-Instruct-1M is a 1 billion parameter instruction-tuned language model from the SelfLong series, initialized from the Llama-3.2 architecture. Developed by Wang et al., this model is specifically engineered to handle extremely long contexts, supporting up to 1 million tokens. It excels in tasks requiring extensive context understanding, as demonstrated by its performance on the RULER-1M benchmark.
Loading preview...
Overview
SelfLong-Llama3.2-1B-Instruct-1M is a 1 billion parameter instruction-tuned model, part of the SelfLong series, designed for processing exceptionally long contexts. Based on the Llama-3.2 architecture, this model is distinguished by its ability to manage up to 1 million tokens, making it suitable for applications requiring deep contextual understanding.
Key Capabilities
- Extreme Context Length: Supports an impressive context window of up to 1 million tokens, significantly surpassing many conventional LLMs.
- Instruction Following: Optimized for instruction-based tasks, leveraging its Llama-3.2-Instruct foundation.
- Long-Context Reasoning: Evaluated and shown to perform effectively on the RULER-1M benchmark, which assesses long-context understanding across various support lengths.
Performance Highlights
On the RULER-1M benchmark, SelfLong-1B-1M demonstrates its long-context capabilities, achieving a RULER score of 31.1 at the 1M token support length. While larger SelfLong models (3B and 8B) show higher scores, this 1B variant provides a compact option for long-context applications.
Good For
- Applications requiring processing and understanding of very long documents or conversations.
- Tasks like summarization, question answering, or information extraction from extensive texts.
- Developers seeking a smaller, efficient model capable of handling large context windows.