Overview
hishab/titulm-llama-3.2-1b-v1.0 is a 1.23 billion parameter model based on the Llama 3.2 architecture, continually pre-trained by Hishab. Its primary differentiator is its specialization in the Bangla language, achieved through extensive pretraining on a curated 268 GB Bangla text corpus totaling 6 billion tokens. This process significantly improves both its Bangla text generation quality and its understanding of the language.
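The model can be loaded like any other Llama-family checkpoint through the Hugging Face `transformers` library. The sketch below is a minimal, illustrative example; the prompt and generation parameters are assumptions, not recommended defaults from the model authors.

```python
# Minimal sketch: Bangla text generation with transformers.
# The sampling settings (temperature, max_new_tokens) are illustrative
# assumptions, not values recommended by the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "hishab/titulm-llama-3.2-1b-v1.0"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue a Bangla prompt with the TituLM model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Example Bangla prompt ("The capital of Bangladesh").
    print(generate("বাংলাদেশের রাজধানী"))
```

Because this is a base (not instruction-tuned) checkpoint, it is best prompted with text to be continued rather than with chat-style instructions.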
Key Capabilities
- Superior Bangla Text Generation: Optimized for producing fluent and contextually relevant Bangla text.
- Enhanced Bangla Language Understanding: Demonstrates improved performance on various Bangla evaluation benchmarks.
- Multilingual Support: Primarily supports Bengali, with secondary capabilities in English.
- Grouped-Query Attention (GQA): Utilizes GQA for improved inference scalability, a feature inherited from the Llama 3.2 family.
Good for
- Bangla Text Generation: Ideal for applications requiring natural and accurate text output in Bengali.
- Bangla Language Understanding Tasks: Suitable for tasks like question answering, summarization, and sentiment analysis in Bengali.
- Bangla Instruction Fine-tuning: Can be further fine-tuned for specific instruction-following tasks in the Bangla language.
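For the instruction fine-tuning use case above, a conventional supervised fine-tuning loop with `transformers.Trainer` applies. The sketch below is a hypothetical outline, assuming a user-supplied Bangla instruction dataset; the dataset columns, prompt template, and hyperparameters are all illustrative assumptions.

```python
# Hypothetical sketch: supervised fine-tuning on Bangla instruction data.
# The prompt template, dataset columns, and hyperparameters are assumptions
# for illustration; substitute your own data and settings.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "hishab/titulm-llama-3.2-1b-v1.0"


def format_example(instruction: str, response: str) -> str:
    """Join an instruction/response pair into one training string.

    This template is an illustrative assumption, not an official format.
    """
    return f"### নির্দেশ:\n{instruction}\n\n### উত্তর:\n{response}"


def finetune(dataset):
    """Run causal-LM fine-tuning on a pre-tokenized dataset."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    args = TrainingArguments(
        output_dir="titulm-bangla-sft",  # illustrative path
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-5,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset,
        data_collator=collator,
    )
    trainer.train()
```

Parameter-efficient methods such as LoRA are a common alternative at this model size, but the full-parameter loop above is the simplest starting point.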
Performance Highlights
Compared to the base llama-3.2-1b model, titulm-llama-3.2-1b-v1.0 shows stronger performance on several Bangla benchmarks:
- Achieves 0.31 in Commonsense QA BN (5-shot) compared to 0.23.
- Scores 0.34 in OpenBook QA BN (5-shot) compared to 0.31.
- Reaches 0.57 in PIQA BN (5-shot) compared to 0.54.
Note that while the model excels in Bangla, its English benchmark scores are generally lower than those of the base llama-3.2-1b model, an expected trade-off of its Bangla-focused training.