HebArabNlpProject/Hebatron_base_long
HEBATRON is a 31.6-billion-parameter language model specialized for Hebrew, developed by PwC Israel, MAFAT, and AWS. It uses a hybrid Mamba2 and Mixture-of-Experts (MoE) architecture that scales linearly with sequence length and supports a context window of up to 65,536 tokens. Optimized for native-level reasoning in both Hebrew and English, it targets advanced Hebrew document analysis and complex bilingual reasoning.
HEBATRON: Hebrew-Specialized Mamba2-MoE
HEBATRON is a state-of-the-art language model designed specifically for Hebrew, developed through a collaboration between PwC Israel, MAFAT, and AWS. It uses a hybrid architecture that combines Mamba2 and Mixture-of-Experts (MoE) layers, and it is a localized, enhanced version of Nemotron-3-Nano-30B.
Key Capabilities
- Hybrid Architecture: Combines Mamba2 (SSM) and Sparse MoE for efficient processing.
- Bilingual Proficiency: Optimized for native-level reasoning in both Hebrew and English.
- Long Context Window: Supports a 65,536 (64k) token context window, ideal for extensive documents.
- Specialized Training: Utilizes a three-phase curriculum learning strategy, including formal, colloquial, and long-context data, to handle Hebrew's structural and morphological complexities.
- Strong Performance: Achieves 91.2% on Hebrew SNLI, 83.3% on GSM8K (Math) in native Hebrew, and 91.6% on English Psychometric Psi.
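To make the 65,536-token window concrete, here is a minimal sketch of budgeting a prompt against it and splitting an over-long document into overlapping windows. It assumes the prompt and the generated tokens share the same window, which is the usual behavior for causal language models but is not stated in this card. The helper names `prompt_budget` and `split_into_windows` are hypothetical, and the token IDs can come from any tokenizer.

```python
from typing import Iterator, List, Sequence

# Context window stated in the model card (64k tokens).
CONTEXT_WINDOW = 65_536


def prompt_budget(max_new_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many prompt tokens fit once room is reserved for generation.

    Assumption: prompt and generated tokens share one window.
    """
    if not 0 <= max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be in [0, context_window)")
    return context_window - max_new_tokens


def split_into_windows(
    token_ids: Sequence[int], budget: int, overlap: int = 0
) -> Iterator[List[int]]:
    """Yield consecutive windows of at most `budget` tokens.

    Adjacent windows share `overlap` tokens so that context is not
    cut abruptly at a window boundary.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be in [0, budget)")
    step = budget - overlap
    for start in range(0, len(token_ids), step):
        yield list(token_ids[start:start + budget])
        # Stop once the current window reaches the end of the input.
        if start + budget >= len(token_ids):
            break
```

For example, reserving 1,024 tokens for the answer leaves `prompt_budget(1024)` tokens for the document. Anything longer can be passed through `split_into_windows` and processed window by window.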
Intended Use Cases
- Advanced Hebrew document analysis.
- Long-context summarization for legal and technical texts.
- Complex bilingual reasoning tasks.
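As an illustration of the summarization use case, the sketch below loads the checkpoint with the Hugging Face `transformers` library and generates a continuation of a document. Several things here are assumptions rather than facts from this card: that the checkpoint loads through `AutoModelForCausalLM`, that it needs `trust_remote_code=True` for its hybrid layers, and that a plain "document, then a Hebrew 'Summary:' cue" prompt suits a base model. The helpers `build_prompt`, `load_hebatron`, and `summarize` are hypothetical names. `device_map="auto"` additionally requires the `accelerate` package.

```python
MODEL_ID = "HebArabNlpProject/Hebatron_base_long"


def build_prompt(document: str) -> str:
    """Build a completion-style prompt (assumed format, not from the card).

    The cue "סיכום:" is Hebrew for "Summary:".
    """
    return f"{document.strip()}\n\nסיכום:"


def load_hebatron(model_id: str = MODEL_ID):
    """Load tokenizer and model; imports lazily so this file has no hard dependency."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code is an assumption: custom hybrid architectures often need it.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        device_map="auto",
        trust_remote_code=True,
    )
    return tokenizer, model


def summarize(document: str, tokenizer, model, max_new_tokens: int = 512) -> str:
    """Generate a continuation after the summary cue and return only the new text."""
    inputs = tokenizer(build_prompt(document), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the generated summary is decoded.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_hebatron()` followed by `summarize(text, tokenizer, model)`. A 31.6-billion-parameter model needs substantial GPU memory, so expect to shard it across devices or use quantization.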