Fanar-2-27B-Instruct: Advanced Arabic-English LLM
Fanar-2-27B-Instruct is a 27-billion-parameter, instruction-tuned large language model developed by the Qatar Computing Research Institute (QCRI) at HBKU. Released as part of Fanar 2.0, it builds on the google/gemma-3-27b-pt base model, which was continually pretrained on approximately 166 billion Arabic and English tokens using a three-recipe training approach with model merging. It supports Modern Standard Arabic (MSA) and diverse Arabic dialects, and is aligned with Islamic values and Arabic culture.
Key Capabilities
- Native Arabic Reasoning Traces: Generates multi-step reasoning natively in Arabic using <think>...</think> blocks, trained on ~250K Arabic reasoning examples.
- Tool Calling: Supports generic tool use and integrates with 10 internal Fanar tools for enhanced functionality.
- Advanced Hallucination Mitigation: Reduces hallucinations through knowledge probing, 5-step structured verification traces, and calibrated abstention responses, explicitly stating "I don't know" when uncertain.
- Quranic Verse Encapsulation: Automatically wraps spontaneous Quranic verse references in validation markers for downstream verification.
- Extended Context Length: Offers a 32,768-token context window, 8x longer than that of its predecessor, Fanar 1.0.
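Because reasoning traces arrive inside <think>...</think> blocks, an application will usually want to separate them from the final answer before showing a response to a user. The following is a minimal sketch of that post-processing step, not part of the Fanar release: the function name split_reasoning and the sample string are hypothetical, and it assumes the blocks appear as literal, well-formed <think>...</think> pairs in the decoded output text.

```python
import re

# Assumption: reasoning is emitted as literal, well-formed <think>...</think>
# pairs in the decoded text. DOTALL lets a trace span multiple lines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[list[str], str]:
    """Return (reasoning_traces, answer) from raw model output.

    Hypothetical helper for illustration only.
    """
    # Collect the content of every think block, trimmed of whitespace.
    traces = [m.strip() for m in THINK_BLOCK.findall(text)]
    # Remove the blocks themselves to leave only the user-facing answer.
    answer = THINK_BLOCK.sub("", text).strip()
    return traces, answer


if __name__ == "__main__":
    # Illustrative string, not real model output.
    sample = "<think>أولاً نجمع ٢ و ٢</think>\nالجواب هو ٤"
    traces, answer = split_reasoning(sample)
    print(traces)
    print(answer)
```

Keeping the traces as a separate list makes it straightforward to log them for debugging while displaying only the answer; text with no think block passes through unchanged as the answer.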
Good for
- Applications requiring high-performance Arabic and English language understanding and generation.
- Use cases demanding culturally aligned responses, particularly within Islamic and Arabic contexts.
- Tasks benefiting from advanced reasoning capabilities and tool integration.
- Scenarios where hallucination mitigation and factual accuracy are critical.
- Developers seeking a robust, bilingual LLM with a focus on Arabic linguistic richness and cultural nuance.