jmcinern/Qomhra-AWQ
jmcinern/Qomhra-AWQ is an 8-billion parameter, activation-aware quantized version of Qomhrá, a bilingual Large Language Model (LLM) developed by researchers at Trinity College Dublin, University College Cork, and Queen's University Belfast. Adapted from Qwen3-8B, it is specifically designed to support the low-resource Irish language (Gaeilge) while maintaining strong English capabilities, offering a 32768 token context length. This model excels in Irish language understanding and generation, outperforming existing open-source baselines in benchmarks like Cloze-gle, SIB-gle, and IQA-gle/eng.
Loading preview...
Qomhrá-AWQ: A Bilingual Irish & English LLM
Qomhrá-AWQ is an 8-billion parameter, activation-aware quantized model based on Qomhrá, developed by researchers at Trinity College Dublin, University College Cork, and Queen's University Belfast. It is adapted from Qwen3-8B and specifically engineered to support the low-resource Irish language (Gaeilge) alongside English, aiming to provide an open-weight alternative for the Irish language community.
Key Capabilities
- Bilingual Proficiency: Optimized for both Irish and English, maintaining strong English capabilities through a high mixture of English data during continued pre-training.
- Irish Language Excellence: Outperforms existing open-source baselines in Irish understanding and generation benchmarks, including grammatical gender (Cloze-gle), topic modeling (SIB-gle), and question answering (IQA-gle).
- Robust Training: Developed using a two-stage pipeline: Bilingual Continued Pre-Training (CPT) on a 3.265 billion character corpus (75% Irish, 25% English) and Instruction Tuning with a 30k sample parallel English-Irish dataset.
Good For
- Applications requiring strong performance in Irish language processing.
- Use cases demanding bilingual (Irish-English) text generation and understanding.
- Developers seeking an open-source LLM for low-resource language support.