PJMixers-Dev/gemma-3-4b-pt-InitializedEmbeds

Vision · Model Size: 4.3B · Quant: BF16 · Context Length: 32K · Published: Jun 4, 2025 · Architecture: Transformer

PJMixers-Dev/gemma-3-4b-pt-InitializedEmbeds is a 4.3 billion parameter language model based on the Gemma architecture, created by PJMixers-Dev through a linear merge of Google's gemma-3-4b-pt and gemma-3-4b-it models. The merge places all of its weight on gemma-3-4b-pt, so the model is effectively the pre-trained checkpoint paired with the instruction-tuned tokenizer, making it a clean starting point for further fine-tuning. Its 32K context length supports processing extensive inputs for language generation and comprehension tasks.

Model Overview

PJMixers-Dev/gemma-3-4b-pt-InitializedEmbeds is a 4.3 billion parameter language model built upon the Gemma architecture. It was created by PJMixers-Dev using the MergeKit tool, specifically employing the Linear merge method.

Merge Details

This model is a composite of two foundational Google Gemma models:

  • google/gemma-3-4b-pt: The primary pre-trained component, merged with weight 1.0 and therefore supplying all of the final parameters.
  • google/gemma-3-4b-it: Included in the merge configuration, but with its weight set to 0.0, so it contributes nothing to the merged parameters in this linear combination; a conceptual sketch of the merge follows the tokenizer note below.

The tokenizer configuration is sourced from google/gemma-3-4b-it, ensuring compatibility with its tokenization scheme, including special tokens like <start_of_turn> and <end_of_turn>.
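
For illustration, here is a minimal Python sketch of what this linear merge amounts to. It is a conceptual reconstruction, not MergeKit's actual implementation, and it assumes both Google checkpoints are accessible and that AutoModelForCausalLM resolves them (Gemma 3 is multimodal, so a different Auto class may be needed depending on your transformers version):

```python
# Conceptual sketch of the linear merge; not MergeKit's internals.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Merge weights from the model card: all mass on the pre-trained checkpoint.
WEIGHTS = {"google/gemma-3-4b-pt": 1.0, "google/gemma-3-4b-it": 0.0}

# Loading both 4B checkpoints in bf16 needs roughly 18 GB of RAM.
models = {
    name: AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    for name in WEIGHTS
}
state_dicts = {name: m.state_dict() for name, m in models.items()}

# Linear merge: each output tensor is the weighted sum of the inputs.
# With weights 1.0 / 0.0 this reproduces the pre-trained checkpoint.
pt_sd = state_dicts["google/gemma-3-4b-pt"]
merged = {
    key: sum(
        WEIGHTS[name] * sd[key].float() for name, sd in state_dicts.items()
    ).to(pt_sd[key].dtype)
    for key in pt_sd
}

target = models["google/gemma-3-4b-pt"]
target.load_state_dict(merged)

# The tokenizer is sourced from the instruction-tuned model, carrying its
# special tokens such as <start_of_turn> and <end_of_turn>.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

target.save_pretrained("gemma-3-4b-pt-InitializedEmbeds")
tokenizer.save_pretrained("gemma-3-4b-pt-InitializedEmbeds")
```

MergeKit performs the same per-tensor weighted sum from a YAML configuration; the sketch simply makes the arithmetic explicit.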

Key Characteristics

  • Gemma Architecture: Built on Google's Gemma 3 base model, an efficient and capable foundation.
  • Pre-trained Focus: Primarily utilizes the pre-trained weights of gemma-3-4b-pt, making it a strong candidate for tasks that benefit from a robust foundational language understanding.
  • Linear Merge Method: The model's parameters are a direct linear combination of its constituents; with the weights used here, that combination reduces entirely to the pre-trained component, as the equation below shows.
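
In equation form, the linear method computes a weighted sum over every parameter tensor, and the 1.0 / 0.0 weights collapse it to the pre-trained weights exactly:

θ_merged = 1.0 · θ_pt + 0.0 · θ_it = θ_pt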

Potential Use Cases

This model is well-suited for:

  • Further Fine-tuning: Because its weights are those of the pre-trained gemma-3-4b-pt, it provides an excellent starting point for domain-specific fine-tuning (see the loading sketch after this list).
  • Research and Experimentation: Ideal for exploring the capabilities of the Gemma architecture with a focus on its pre-trained state.
  • General Language Understanding: Can be applied to various NLP tasks where a solid base model is required.
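
As a starting point, here is a minimal loading and generation sketch. It assumes the model is published under this repo id on the Hugging Face Hub and that AutoModelForCausalLM resolves the checkpoint (Gemma 3 is multimodal, so your transformers version may require a different Auto class):

```python
# Minimal usage sketch; the Auto class choice is an assumption.
# device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "PJMixers-Dev/gemma-3-4b-pt-InitializedEmbeds"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# This is a pre-trained (base) checkpoint, so prompt it as a completion
# model rather than through a chat template.
inputs = tokenizer("The Gemma architecture is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```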