SAIJO1233/Gemma3-1b-SFT_Teached
SAIJO1233/Gemma3-1b-SFT_Teached is a 1-billion-parameter language model based on the Gemma architecture, created by SAIJO1233 through a TIES merge of two fine-tuned models. It uses google/gemma-3-1b-it as its base, supports a context length of 32768 tokens, and is designed to combine the strengths of its merged components.
Model Overview
SAIJO1233/Gemma3-1b-SFT_Teached is a 1-billion-parameter language model developed by SAIJO1233. It is a merged model, built on the google/gemma-3-1b-it base using the TIES merge method.
Merge Details
This model was created by merging two fine-tuned checkpoints:
- ./gemma_qwen3_FULL
- ./gemma_qwen2.5_FULL
The TIES merge configuration assigned ./gemma_qwen3_FULL the dominant weight of 0.85 with a density of 1.0 (all of its parameter deltas retained) and ./gemma_qwen2.5_FULL a weight of 0.15 with a density of 0.3 (only its 30% largest-magnitude changes retained). This strategy lets the first model drive most of the merged behavior while the second contributes only its most significant changes.
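The actual merge was presumably produced with a standard TIES tool such as mergekit; the minimal PyTorch sketch below only illustrates what the algorithm does per tensor with this card's weights and densities. The function names and the toy tensors are illustrative assumptions, not the author's merge script.

```python
import torch

def trim(tau: torch.Tensor, density: float) -> torch.Tensor:
    """Zero all but the top `density` fraction of entries by magnitude."""
    if density >= 1.0:
        return tau
    k = max(1, int(density * tau.numel()))
    # Magnitude of the k-th largest entry: everything below it is dropped.
    thresh = tau.abs().flatten().kthvalue(tau.numel() - k + 1).values
    return tau * (tau.abs() >= thresh)

def ties_merge(base, task_vectors, weights, densities):
    """TIES per tensor: trim each delta, elect a sign, average agreeing values."""
    trimmed = [trim(t, d) for t, d in zip(task_vectors, densities)]
    weighted = torch.stack([w * t for w, t in zip(weights, trimmed)])
    elected = weighted.sum(dim=0).sign()            # per-parameter majority sign
    stacked = torch.stack(trimmed)
    agree = (stacked.sign() == elected) & (stacked != 0)
    w = torch.tensor(weights).view(-1, *([1] * base.dim()))
    # Weighted mean over only the values whose sign matches the elected sign.
    merged_delta = (weighted * agree).sum(0) / (w * agree).sum(0).clamp(min=1e-8)
    return base + merged_delta

# Toy demonstration using this card's settings on one random tensor.
base = torch.zeros(4, 4)
qwen3_delta = torch.randn(4, 4)    # stands in for ./gemma_qwen3_FULL minus base
qwen25_delta = torch.randn(4, 4)   # stands in for ./gemma_qwen2.5_FULL minus base
merged = ties_merge(base, [qwen3_delta, qwen25_delta],
                    weights=[0.85, 0.15], densities=[1.0, 0.3])
```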
Key Characteristics
- Architecture: Based on the Gemma family, specifically google/gemma-3-1b-it.
- Parameter Count: 1 billion parameters.
- Context Length: Supports a substantial context window of 32768 tokens (see the loading sketch below).
- Merge Method: Employs TIES (TrIm, Elect Sign & Merge), the technique sketched above, which selectively combines parameters from multiple models.
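For completeness, here is a minimal loading sketch using Hugging Face transformers. The repo id comes from this card; the prompt, dtype, and generation settings are illustrative assumptions, and device_map="auto" additionally requires the accelerate package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SAIJO1233/Gemma3-1b-SFT_Teached"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Assumes the merged model inherits the Gemma instruction chat template.
messages = [{"role": "user", "content": "Summarize the TIES merge method in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```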
Potential Use Cases
Given its merged nature and Gemma base, this model suits tasks that benefit from a blend of its constituent models' capabilities. It is likely strongest in areas where the dominant gemma_qwen3_FULL checkpoint excels, with additional behavior contributed by gemma_qwen2.5_FULL.