himalaya-ai/himalaya-gemma-4-e2b-it
The himalaya-ai/himalaya-gemma-4-e2b-it model is a fully fine-tuned Gemma-based language model developed by himalaya-ai. This model is distinguished by its full-parameter SFT training, enabling high-quality text generation in both English and Nepali. It was trained on a balanced dataset comprising Nepali and English data, making it particularly effective for bilingual applications. This model is optimized for understanding and generating content in these two languages.
Loading preview...
Model Overview
The himalaya-ai/himalaya-gemma-4-e2b-it is a fully fine-tuned Gemma-based language model developed by himalaya-ai, specifically designed for high-quality text generation in both English and Nepali. Unlike standard QLoRA methods, this model underwent Full-Parameter SFT (Supervised Fine-Tuning), meaning every parameter was trainable. This comprehensive training approach, spanning approximately 125,000 steps, was achieved on a single A100 GPU using an 8-bit AdamW optimizer and Gradient Checkpointing to manage memory.
Key Capabilities
- Bilingual Proficiency: Excels in understanding and generating text in both English and Nepali.
- Full-Parameter SFT: Benefits from a more thorough fine-tuning process compared to QLoRA, potentially leading to higher performance.
- High-Quality Data: Trained on a 50/50 mix of two high-quality datasets:
himalaya-ai/nepali-sft-datasetfor Nepali andteknium/OpenHermes-2.5for English.
Use Cases
This model is particularly well-suited for applications requiring robust language understanding and generation in a bilingual English-Nepali context. Developers can leverage it for tasks such as:
- Content creation in Nepali and English.
- Bilingual chatbots or virtual assistants.
- Translation-related tasks (though not explicitly a translation model).
- Any application where strong performance across these two languages is critical.