rachmanino/SelfExtended

Text Generation

  • Concurrency Cost: 1
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Apr 2, 2024
  • License: MIT
  • Architecture: Transformer
  • Open Weights

rachmanino/SelfExtended is a 7-billion-parameter language model based on the Meta Llama-2-7b-chat-bf architecture. It incorporates the SelfExtend technique, implemented with Flash Attention, to substantially improve performance on tasks that require longer context windows, making it suited to applications that must process extended input sequences efficiently.


Overview

rachmanino/SelfExtended is derived from the Meta Llama-2-7b-chat-bf base model. Its core change is the integration of SelfExtend, implemented on top of Flash Attention. Rather than fine-tuning on longer sequences, SelfExtend remaps the relative positions that attention sees, letting the pretrained model handle input contexts well beyond its original training length.
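The card does not spell out the mechanism, but SelfExtend as published (Jin et al., 2024, "LLM Maybe LongLM") is a tuning-free remapping: tokens within a small neighbor window keep their exact relative positions, while more distant tokens are mapped onto coarse "grouped" positions via floor division, so attention never sees a relative distance larger than what the base model was pretrained on. The PyTorch sketch below illustrates that remapping only; the function name and the group_size/window parameters are illustrative, and the shipped model presumably fuses equivalent logic into its Flash Attention kernels rather than materializing this matrix.

```python
import torch

def self_extend_rel_positions(seq_len: int, group_size: int, window: int) -> torch.Tensor:
    """Relative-position map in the style of SelfExtend (illustrative sketch).

    Keys within `window` of the query keep their exact relative distances;
    farther keys fall back to grouped distances (absolute positions
    floor-divided by `group_size`), shifted so the two regimes meet at the
    window boundary. Under causal masking only the rel >= 0 half is used.
    """
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (L, 1)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, L)
    rel = q - k                             # exact relative distances

    # Coarse distances for far-away tokens, shifted to align with `window`.
    grouped = q // group_size - k // group_size + (window - window // group_size)

    return torch.where(rel <= window, rel, grouped)
```

With, say, group_size=4 and window=512, a model pretrained on 4k positions can address a context roughly 4x longer while every remapped relative distance stays inside its training range.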

Key Capabilities

  • Extended Context Handling: Uses the SelfExtend technique (sketched above) to process input sequences longer than the base model's native context window.
  • Performance Optimization: Leverages Flash Attention for efficient implementation of the SelfExtend technique.
  • Llama-2 Base: Inherits the foundational chat capabilities of the Llama-2-7b-chat-bf base model.

Good For

  • Applications that need reliable performance over long documents, transcripts, or multi-turn conversations.
  • Scenarios where efficient processing of extended inputs is crucial (see the usage sketch below).
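A hedged usage sketch, assuming the weights load through the standard Hugging Face transformers API; the repository id is taken from the model name above, and trust_remote_code reflects a guess that the custom SelfExtend attention ships as repo code rather than being upstreamed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rachmanino/SelfExtended"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # pick up the checkpoint's native precision
    device_map="auto",       # requires `accelerate` for device placement
    trust_remote_code=True,  # assumption: SelfExtend attention lives in repo code
)

# A long-context prompt is where SelfExtend should pay off.
prompt = "Summarize the following report:\n" + open("report.txt").read()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```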