HebArabNlpProject/Hebatron_base
HebArabNlpProject/Hebatron_base is a 31.6 billion parameter language model developed by PwC Israel, MAFAT, and AWS and specialized for Hebrew. It uses a hybrid Mamba2 and Mixture-of-Experts (MoE) architecture optimized for native-level reasoning in Hebrew and English. With a 32768-token context window, it targets advanced Hebrew document analysis and long-context summarization. The model was trained with a three-phase curriculum learning strategy to handle Hebrew's structural and morphological complexity.
HEBATRON: Hebrew-Specialized Mamba2-MoE
HEBATRON is a 31.6 billion parameter language model developed by PwC Israel, MAFAT, and AWS, designed specifically for Hebrew. It uses a hybrid architecture that combines Mamba2, a state space model (SSM), with Sparse Mixture-of-Experts (MoE). Because the cost of the Mamba2 layers grows linearly with sequence length, the model handles long-context tasks of up to 32768 tokens efficiently. It is an enhanced version of the Nemotron-3-Nano-30B framework, optimized for native-level reasoning in both Hebrew and English.
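The linear-scaling point can be made concrete with a toy operation count. This is only an illustrative sketch: the function names are hypothetical, and constant factors, hidden sizes, and the MoE layers are ignored. It compares how the work of full self-attention (quadratic in sequence length) and an SSM-style scan (linear) grows when the context is extended from 4096 to 32768 tokens.

```python
# Toy operation counts only; this is not a model of HEBATRON's actual layers.

def attention_pair_count(n_tokens: int) -> int:
    """Token pairs scored by full self-attention: grows as n^2."""
    return n_tokens * n_tokens


def ssm_scan_steps(n_tokens: int) -> int:
    """Recurrent state updates in an SSM-style scan: grows as n."""
    return n_tokens


# 32768 is the context window stated in the card; 4096 is an arbitrary baseline.
short_ctx, long_ctx = 4096, 32768

attention_growth = attention_pair_count(long_ctx) // attention_pair_count(short_ctx)
scan_growth = ssm_scan_steps(long_ctx) // ssm_scan_steps(short_ctx)

print(f"8x longer context: attention work grows {attention_growth}x, "
      f"scan work grows {scan_growth}x")
```

Under these simplifications, an 8x longer context costs 64x more pairwise attention work but only 8x more scan steps.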
Key Capabilities & Features
- Hybrid Architecture: Integrates Mamba2 and MoE for efficient processing of Hebrew's complex morphology and long contexts.
- Curriculum Learning: Trained in three phases, starting with formal Hebrew, expanding to colloquial language, and fine-tuning for long-context understanding.
- High Performance: Achieves 91.2% on Hebrew SNLI, 72.1% on Israeli Trivia, and 83.3% on GSM8K in native Hebrew, surpassing DictaLM-3.0-Thinking in average Hebrew reasoning.
- Bilingual Reasoning: Demonstrates strong performance in English reasoning benchmarks, including 91.6% on Psychometric Psi (EN).
- Technical Specifications: 31.6B total parameters, with ~3B active parameters per token. The context window is listed here as 8096 tokens, although the model card states 32768 tokens.
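The parameter figures above imply that only a small share of the weights is used for any single token. The arithmetic below is a rough sketch: the ~3B active figure is approximate, and the 2 bytes per parameter (16-bit weights) is an assumption that the source does not specify.

```python
# Figures taken from the specifications above; the active count is approximate.
TOTAL_PARAMS = 31.6e9
ACTIVE_PARAMS = 3e9

# Share of weights that participate in a single token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption (not from the source): weights stored in 16 bits, 2 bytes each.
BYTES_PER_PARAM = 2
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"active share per token: {active_fraction:.1%}")
print(f"approx. weight storage at 16-bit: {weights_gb:.1f} GB")
```

In a sparse MoE model, per-token compute tracks the active parameters, while memory still has to hold every expert, which is why the two numbers are worth reading separately.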
Intended Use Cases
- Advanced Hebrew Document Analysis: Ideal for processing complex legal, academic, and technical texts.
- Long-Context Summarization: Excels at summarizing extensive documents in Hebrew.
- Complex Bilingual Reasoning: Suitable for tasks requiring sophisticated understanding and generation in both Hebrew and English.
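For documents longer than the context window, a common workaround is to split the input into overlapping chunks and summarize each one. The sketch below is a minimal illustration, not part of the model's tooling: `chunk_tokens`, the 1024-token output reserve, and the 256-token overlap are all hypothetical choices, and a plain list of integers stands in for real tokenizer output.

```python
from typing import List, Sequence

CONTEXT_WINDOW = 32768  # window stated on the model card


def chunk_tokens(
    tokens: Sequence[int],
    context_window: int = CONTEXT_WINDOW,
    reserved_for_output: int = 1024,  # hypothetical room left for the summary
    overlap: int = 256,               # hypothetical overlap between chunks
) -> List[Sequence[int]]:
    """Split tokens into overlapping chunks that each fit the input budget."""
    budget = context_window - reserved_for_output
    if budget <= overlap:
        raise ValueError("overlap must be smaller than the input budget")

    step = budget - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + budget])
        if start + budget >= len(tokens):
            break  # the current chunk already reaches the end of the input
        start += step
    return chunks


# Stand-in for a tokenized 100,000-token document.
document = list(range(100_000))
chunks = chunk_tokens(document)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk can then be summarized on its own and the partial summaries combined in a final pass. The overlap reduces the chance that a sentence split at a chunk boundary is lost.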