Sakalti/light-3B-beta

TEXT GENERATION

Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32K · Published: Jan 14, 2025 · License: qwen-research · Architecture: Transformer

Sakalti/light-3B-beta is a 3.1 billion parameter language model built by Sakalti, merged using the TIES method with Qwen/Qwen2.5-3B as its base. This model integrates the capabilities of Qwen/Qwen2.5-3B-Instruct, offering a compact yet capable solution for general language tasks with a 32K context length.


Model Overview

Sakalti/light-3B-beta is a 3.1 billion parameter language model created by Sakalti through a merge process using mergekit. It leverages the robust foundation of Qwen's architecture, specifically using Qwen/Qwen2.5-3B as its base model.

Merge Details

This model was constructed using the TIES merge method, a technique designed to combine the strengths of multiple pre-trained language models. The primary component integrated into this merge is Qwen/Qwen2.5-3B-Instruct, indicating an emphasis on instruction-following capabilities. The merge configuration utilized a weight and density of 1 for the included model, with int8_mask enabled and bfloat16 as the data type.
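Based on these details, the mergekit configuration likely resembled the following sketch. The exact file is not reproduced here, so treat this as a hypothetical reconstruction; field names follow mergekit's YAML schema:

```yaml
# Hypothetical reconstruction of the TIES merge config described above.
models:
  - model: Qwen/Qwen2.5-3B-Instruct
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen/Qwen2.5-3B
parameters:
  int8_mask: true
dtype: bfloat16
```

With a single merged model at weight and density 1, the result stays close to the instruct variant while the TIES procedure resolves its deltas against the Qwen/Qwen2.5-3B base.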

Key Characteristics

  • Compact Size: At 3.1 billion parameters, it offers a balance between performance and computational efficiency.
  • Instruction-Tuned Foundation: Inherits instruction-following abilities from Qwen/Qwen2.5-3B-Instruct.
  • Extended Context Window: Supports a context length of 32,768 tokens, allowing it to process longer inputs and maintain conversational coherence over extended interactions.
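Because the merge builds on Qwen/Qwen2.5-3B-Instruct, prompts presumably follow the ChatML format used across the Qwen 2.5 series (in practice, `tokenizer.apply_chat_template` handles this). A minimal sketch of assembling such a prompt by hand, assuming the merged model inherits the template:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt as used by Qwen 2.5 instruct models."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"  # the model generates its reply from here
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Summarize TIES merging in one sentence.",
)
```

The trailing `<|im_start|>assistant\n` leaves the turn open so generation continues as the assistant's response.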

Good For

  • Applications requiring a smaller, efficient language model with instruction-following capabilities.
  • Tasks benefiting from a substantial context window, such as summarization of longer texts or complex question answering.
  • General natural language processing tasks where the strengths of the Qwen 2.5 series are desired in a merged format.
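For the long-context use cases above, it can help to sanity-check that an input fits within the 32,768-token window before sending it. A rough sketch using the common ~4-characters-per-token heuristic (an assumption; accurate counts require the model's actual tokenizer):

```python
CTX_LEN = 32_768        # context window of light-3B-beta
CHARS_PER_TOKEN = 4     # rough heuristic; varies by tokenizer and language

def fits_in_context(text: str, reserved_for_output: int = 1024) -> bool:
    """Estimate whether `text` plus a response budget fits in the context window."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_output <= CTX_LEN
```

Inputs that fail this check can be chunked and summarized in stages rather than truncated silently.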