SPAISS6F1/gemma-1b-pruned-th

Vision | Concurrency Cost: 1 | Model Size: 4.3B | Quant: BF16 | Ctx Length: 32k | Published: Jun 6, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

SPAISS6F1/gemma-1b-pruned-th is a 2.7-billion-parameter Thai language model derived from Google's Gemma-3-4b-pt (4.3B parameters) through depth pruning followed by healing SFT. It retains 17 of the original 34 layers, keeping the initial and final layers to preserve core functionality and dropping the middle ones. It is optimized for generating grammatically fluent Thai text and is intended as a base for further instruction fine-tuning.


Overview

SPAISS6F1/gemma-1b-pruned-th is a compact Thai language model created by applying depth pruning (layer dropping) to unsloth/gemma-3-4b-pt (a mirror of google/gemma-3-4b-pt). The original 4.3-billion-parameter model (34 layers) was reduced to 2.7 billion parameters by retaining 17 layers (0-7 and 25-33, i.e. 8 + 9 layers) and removing the 17 middle layers (8-24), while preserving the embedding and LM head. After pruning, the model underwent healing SFT (supervised fine-tuning) on the SEA-PILE v2 Thai dataset (~8,000 documents) to restore its capabilities and produce coherent text.
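
The layer selection described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' pruning script: the helper names (`kept_layer_indices`, `drop_layers`) and the placeholder layer strings are hypothetical, and only the layer ranges and counts come from this card.

```python
# Illustrative sketch of depth pruning (layer dropping).
# Only the ranges 0-7 and 25-33 and the total of 34 layers come from the card;
# the helper names and placeholder layers are hypothetical.

ORIGINAL_NUM_LAYERS = 34
KEPT_RANGES = [(0, 7), (25, 33)]  # inclusive ranges of retained decoder layers


def kept_layer_indices(ranges):
    """Expand inclusive (start, end) ranges into a flat list of layer indices."""
    indices = []
    for start, end in ranges:
        indices.extend(range(start, end + 1))
    return indices


def drop_layers(layers, keep):
    """Return only the layers whose original index is in `keep`, in order."""
    keep_set = set(keep)
    return [layer for i, layer in enumerate(layers) if i in keep_set]


keep = kept_layer_indices(KEPT_RANGES)

# Stand-ins for the decoder blocks of the base model.
placeholder_layers = [f"decoder_layer_{i}" for i in range(ORIGINAL_NUM_LAYERS)]
pruned = drop_layers(placeholder_layers, keep)

print(len(pruned))            # number of retained layers
print(pruned[7], pruned[8])   # the two layers on either side of the cut
```

In a real model the list would be the decoder's layer container, and the layer count stored in the model configuration would presumably need updating to match. Those steps depend on the framework and are not shown here.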

Key Capabilities

  • Efficient Thai Language Generation: Achieves fluent Thai grammar despite significant parameter reduction.
  • Pruning Methodology: Utilizes a depth pruning technique that removes redundant middle decoder layers, followed by SFT to recover performance.
  • Base Model for Fine-tuning: Designed to be a strong foundation for further instruction-tuning with specific datasets.

Good For

  • Developing Thai-specific LLMs: Ideal as a base model for fine-tuning with instruction datasets to create specialized Thai language applications.
  • Resource-constrained Environments: Its reduced size (2.7B parameters, about 63% of the 4.3B-parameter base model) lowers memory and compute requirements.
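
As a rough back-of-the-envelope check on the size reduction, the sketch below estimates weight memory at BF16 (2 bytes per parameter) for both parameter counts quoted in this card. It covers weights only. Activations, KV cache, and framework overhead are ignored, and the function name `weight_memory_gb` is hypothetical.

```python
# Weights-only memory estimate at BF16 precision (2 bytes per parameter).
# Parameter counts are the rounded figures quoted in this card.

BYTES_PER_PARAM_BF16 = 2


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


base = weight_memory_gb(4.3e9)    # original Gemma-3-4b-pt figure
pruned = weight_memory_gb(2.7e9)  # pruned model figure

print(f"base: {base:.1f} GB, pruned: {pruned:.1f} GB, saved: {1 - pruned / base:.0%}")
```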

Limitations

  • As a pruned base model healed on a raw web corpus, it produces fluent grammar but may be weak in factual accuracy and mathematical reasoning.
  • Requires repetition_penalty >= 1.2 during inference to prevent repetitive outputs.
  • The 50% layer reduction is substantial; higher quality might be achieved with less aggressive pruning or longer healing SFT.
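
The repetition penalty noted above could be applied at inference time roughly as follows. This is a sketch under assumptions, not an official usage snippet: it assumes the checkpoint loads through the Hugging Face `transformers` `AutoModelForCausalLM` and `AutoTokenizer` classes, and every setting other than `repetition_penalty` (token budget, sampling, temperature) is an illustrative choice rather than a recommendation from the card. The function name `generate_thai` is hypothetical.

```python
# Sketch: text generation with the repetition penalty recommended in the card.
# Assumption: the checkpoint is loadable via transformers' Auto classes.

GENERATION_KWARGS = {
    "max_new_tokens": 128,      # assumed value, not from the card
    "do_sample": True,          # assumed value, not from the card
    "temperature": 0.7,         # assumed value, not from the card
    "repetition_penalty": 1.2,  # the card asks for >= 1.2
}


def generate_thai(prompt: str, model_id: str = "SPAISS6F1/gemma-1b-pruned-th") -> str:
    """Load the model and return the continuation generated for `prompt`."""
    # Imported lazily so the settings above can be inspected without
    # torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)

    # Strip the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because this is a base model rather than an instruction-tuned one, prompts are best written as text to be continued, for example the opening words of a Thai sentence passed to `generate_thai`.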