HebArabNlpProject/Hebatron
HEBATRON is a 31.6-billion-parameter language model specialized for Hebrew, developed by PwC Israel and MAFAT. It uses a unique hybrid Mamba2 and Mixture-of-Experts (MoE) architecture and is optimized for native-level reasoning in Hebrew and English. Its 64k-token context window is aimed at advanced Hebrew document analysis and long-context summarization.
HEBATRON: Hebrew-Specialized Mamba2-MoE
HEBATRON is a Hebrew-specialized language model developed by PwC Israel and MAFAT in collaboration with AWS. It builds on the Nemotron-3-Nano-30B framework and uses a hybrid architecture that combines Mamba2 state-space blocks with Mixture-of-Experts (MoE). Of its 31.6 billion total parameters, approximately 3 billion are active for each token. The model is specifically optimized for the structural and morphological complexities of Hebrew.
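To illustrate why only a fraction of the total parameters is active for each token, here is a minimal sketch of generic top-k expert routing as it is commonly done in sparse MoE layers. It is not HEBATRON's router: the card does not describe the expert count, the top-k value, or the gating function, so the 4-expert setup, `k=2`, the `top_k_moe` helper, and the toy linear experts are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(
    x: Vector,
    router: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int,
) -> Tuple[Vector, List[int]]:
    """Send one token through only the k highest-scoring experts (hypothetical helper)."""
    # Router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Indices of the k largest logits; every other expert is skipped for this token.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup (hypothetical): 4 experts on 2-dim tokens, 2 active per token.
ROUTER = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
# Expert i scales its input by (i + 1); the default argument binds each scale now.
EXPERTS = [lambda v, s=i + 1: [s * c for c in v] for i in range(4)]

out, chosen = top_k_moe([2.0, 1.0], ROUTER, EXPERTS, k=2)
print(chosen)                      # [0, 1]
print([round(o, 4) for o in out])  # [2.5379, 1.2689]
```

Only experts 0 and 1 run for this token; experts 2 and 3 are never called. The same principle is what lets a sparse MoE model hold many parameters in total while spending per-token compute on only a subset of them.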
Key Capabilities & Features
- Hybrid Architecture: Integrates Mamba2 state-space model (SSM) blocks with sparse MoE, so only a subset of the experts runs for each token.
- Extended Context Window: Supports a 65,536 (64k) token context, enabling long-context tasks.
- Hebrew Specialization: Designed for native-level reasoning in Hebrew, with strong performance in English.
- Curriculum Learning: Trained using a three-phase strategy on 75.5B tokens of formal Hebrew, 3.36B tokens of colloquial data, and 20.4B tokens for long-context extension.
- Strong Performance: Achieves 91.2% on Hebrew SNLI, 73.8% on Hebrew Average Reasoning, and 83.3% on Hebrew GSM8K, alongside 86.0% on English Reasoning Average.
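The training-budget and context figures above can be checked with a few lines of arithmetic. This sketch assumes each of the three reported token counts corresponds to one curriculum phase; the dictionary keys and the `budget_summary` helper are illustrative names, and the full-window count is a hypothetical upper bound, not a reported statistic.

```python
CONTEXT_TOKENS = 65_536

# Reported token budgets in billions, assumed to map one-to-one onto the three phases.
PHASE_TOKENS_B = {
    "formal Hebrew": 75.5,
    "colloquial data": 3.36,
    "long-context extension": 20.4,
}


def budget_summary(phases: dict) -> tuple:
    """Return the total token count (billions) and each phase's share of it."""
    total = sum(phases.values())
    shares = {name: tokens / total for name, tokens in phases.items()}
    return total, shares


total_b, shares = budget_summary(PHASE_TOKENS_B)
print(f"total: {total_b:.2f}B tokens")  # total: 99.26B tokens
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# formal Hebrew: 76.1%, colloquial data: 3.4%, long-context extension: 20.6%

# "64k" here means 64 * 1024 = 2**16 tokens.
assert CONTEXT_TOKENS == 64 * 1024 == 2**16

# Hypothetical upper bound: how many full-length windows the long-context
# budget could fill if every sequence used the entire 65,536-token context.
long_ctx_tokens = round(PHASE_TOKENS_B["long-context extension"] * 1e9)
max_full_windows = long_ctx_tokens // CONTEXT_TOKENS
print(max_full_windows)  # 311279
```

On these figures, roughly three quarters of the training tokens are formal Hebrew, and the long-context extension accounts for about a fifth of the total.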
Intended Use Cases
- Advanced Hebrew document analysis.
- Long-context summarization for legal and technical texts.
- Complex bilingual reasoning tasks.