Sakalti/light-3B-beta
Sakalti/light-3B-beta is a 3.1 billion parameter language model built by Sakalti, merged with the TIES method on top of Qwen/Qwen2.5-3B. It folds in the capabilities of Qwen/Qwen2.5-3B-Instruct, offering a compact yet capable model for general language tasks with a 32K-token context window.
Model Overview
Sakalti/light-3B-beta is a 3.1 billion parameter language model created by Sakalti with mergekit. It builds on the Qwen2.5 architecture, using Qwen/Qwen2.5-3B as its base model.
Merge Details
This model was constructed with the TIES merge method, a technique for combining the strengths of multiple pre-trained language models. The primary component folded into the merge is Qwen/Qwen2.5-3B-Instruct, indicating an emphasis on instruction-following capabilities. The merge configuration set both weight and density to 1 for that model, with int8_mask enabled and bfloat16 as the data type. A configuration along these lines is sketched below.
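The card does not reproduce the original mergekit YAML, so the following is a minimal sketch reconstructed from the details above (TIES, Qwen/Qwen2.5-3B as base, Qwen/Qwen2.5-3B-Instruct at weight and density 1, int8_mask, bfloat16). The Python driver follows mergekit's example notebook, and the output path is illustrative, not the one Sakalti used.

```python
# Sketch of a mergekit configuration consistent with the merge details above.
# Values (weight=1, density=1, int8_mask, bfloat16) come from the model card;
# everything else here is an assumption for illustration.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YAML = """
merge_method: ties
base_model: Qwen/Qwen2.5-3B
models:
  - model: Qwen/Qwen2.5-3B-Instruct
    parameters:
      weight: 1
      density: 1
parameters:
  int8_mask: true
dtype: bfloat16
"""

merge_config = MergeConfiguration.model_validate(yaml.safe_load(CONFIG_YAML))
run_merge(
    merge_config,
    out_path="./light-3B-beta",        # illustrative output directory
    options=MergeOptions(cuda=False),  # set cuda=True to merge on GPU
)
```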
Key Characteristics
- Compact Size: At 3.1 billion parameters, it offers a balance between performance and computational efficiency.
- Instruction-Tuned Foundation: Inherits instruction-following abilities from Qwen/Qwen2.5-3B-Instruct (see the loading sketch after this list).
- Extended Context Window: Supports a context length of 32,768 tokens, allowing for processing longer inputs and maintaining conversational coherence over extended interactions.
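A minimal loading sketch, assuming the standard transformers chat workflow that Qwen2.5 checkpoints support; the prompt and generation settings are illustrative:

```python
# Load the merged model and run a short instruction-style exchange.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sakalti/light-3B-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize the TIES merge method in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```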
Good For
- Applications requiring a smaller, efficient language model with instruction-following capabilities.
- Tasks benefiting from a substantial context window, such as summarization of longer texts or complex question answering; a token-budget sketch follows this list.
- General natural language processing tasks where the strengths of the Qwen2.5 series are wanted in a single merged model.
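For long-document work, it helps to budget tokens against the 32,768-token window before generating. A small sketch, assuming the model's tokenizer is available on the Hugging Face Hub; the output reserve and truncation strategy are illustrative choices, not part of the model card:

```python
# Check that a long document plus an instruction fits the 32,768-token
# context window, trimming the document if necessary.
from transformers import AutoTokenizer

MAX_CONTEXT = 32_768
RESERVED_FOR_OUTPUT = 1_024  # headroom for the generated summary (assumption)

tokenizer = AutoTokenizer.from_pretrained("Sakalti/light-3B-beta")

def fit_to_context(document: str, instruction: str = "Summarize the text above.") -> str:
    """Trim the document so instruction + document stay within the token budget."""
    budget = MAX_CONTEXT - RESERVED_FOR_OUTPUT
    instr_len = len(tokenizer(instruction)["input_ids"])
    doc_ids = tokenizer(document)["input_ids"][: budget - instr_len]
    return tokenizer.decode(doc_ids) + "\n\n" + instruction
```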