hypaai/Hypa-Llama3.1-8b-SFT
Hypa-Llama3.1 8B by Hypa Intelligence is a LoRA-merged supervised fine-tune of Meta's Llama 3.1 8B, specifically designed for multilingual and tool-aware applications. This 8.7 billion parameter model excels in translation, language detection, and dictionary-style explanations across 17 languages, including 14 low-resource Nigerian languages. It features a 2,048-token training context and supports explicit reasoning channels and structured JSON output, making it ideal for robust multilingual instruction-following and translation tasks.
Loading preview...
Hypa-Llama3.1 8B: Multilingual & Tool-Aware Fine-Tune
Hypa-Llama3.1 8B is an 8.7 billion parameter model from Hypa Intelligence, built upon Meta's Llama 3.1 8B. It's a LoRA-merged supervised fine-tune, inheriting capabilities from prior Hypa-Llama checkpoints and layering new prompt families. The model is distinguished by its focus on multilingual support for 17 languages, including English, French, Spanish, and 14 low-resource Nigerian languages (e.g., Annang, Ebira, Idoma, Igbo, Yoruba), many of which are underrepresented in large-scale fine-tuning corpora.
Key Capabilities
- Multilingual Translation: Specializes in translation between English/French/Spanish and the 14 covered low-resource languages.
- Language Detection: Accurately identifies all 17 supported languages.
- Dictionary-style Explanations: Provides lexical lookups and explanations, supporting both Markdown and strict JSON output modes for programmatic use.
- Tool-Awareness: Incorporates tool-calling-style prompting, inheriting Llama 3.1's native structure.
- Reasoning Channel: Features an explicit
<|think>reasoning channel for translation correction and breakdown, emitting a<think>...</think>block before the final answer. - Instruction Following: Excels at multilingual instruction-following for dialogue tasks.
Training and Performance
The model was trained on 17.0 million examples across multilingual instruction sub-datasets, using LoRA (r=256, α=256) via Unsloth and QLoRA. It demonstrated clean training dynamics with a final training loss of 0.213 and evaluation loss of 0.330. Qualitative observations show meaningful improvements over the base Llama 3.1 8B-Instruct, particularly for the smallest languages where the base model was largely unusable. The training context window was 2,048 tokens, though the config advertises 128K.
Good For
- Applications requiring high-quality translation for low-resource languages.
- Developing multilingual chatbots or agents that need to understand and generate content in diverse languages.
- Tasks involving structured data output (e.g., JSON) for dictionary lookups or programmatic interactions.
- As a starting point for further fine-tuning on specialized tasks within the supported languages, or for adapter stacking.
- Replacing
meta-llama/Llama-3.1-8B-Instructin pipelines needing improved low-resource language quality.