hypaai/Hypa-Llama3.1-8b-SFT

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:May 6, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

Hypa-Llama3.1 8B by Hypa Intelligence is a LoRA-merged supervised fine-tune of Meta's Llama 3.1 8B, specifically designed for multilingual and tool-aware applications. This 8.7 billion parameter model excels in translation, language detection, and dictionary-style explanations across 17 languages, including 14 low-resource Nigerian languages. It features a 2,048-token training context and supports explicit reasoning channels and structured JSON output, making it ideal for robust multilingual instruction-following and translation tasks.

Loading preview...

Hypa-Llama3.1 8B: Multilingual & Tool-Aware Fine-Tune

Hypa-Llama3.1 8B is an 8.7 billion parameter model from Hypa Intelligence, built upon Meta's Llama 3.1 8B. It's a LoRA-merged supervised fine-tune, inheriting capabilities from prior Hypa-Llama checkpoints and layering new prompt families. The model is distinguished by its focus on multilingual support for 17 languages, including English, French, Spanish, and 14 low-resource Nigerian languages (e.g., Annang, Ebira, Idoma, Igbo, Yoruba), many of which are underrepresented in large-scale fine-tuning corpora.

Key Capabilities

  • Multilingual Translation: Specializes in translation between English/French/Spanish and the 14 covered low-resource languages.
  • Language Detection: Accurately identifies all 17 supported languages.
  • Dictionary-style Explanations: Provides lexical lookups and explanations, supporting both Markdown and strict JSON output modes for programmatic use.
  • Tool-Awareness: Incorporates tool-calling-style prompting, inheriting Llama 3.1's native structure.
  • Reasoning Channel: Features an explicit <|think> reasoning channel for translation correction and breakdown, emitting a <think>...</think> block before the final answer.
  • Instruction Following: Excels at multilingual instruction-following for dialogue tasks.

Training and Performance

The model was trained on 17.0 million examples across multilingual instruction sub-datasets, using LoRA (r=256, α=256) via Unsloth and QLoRA. It demonstrated clean training dynamics with a final training loss of 0.213 and evaluation loss of 0.330. Qualitative observations show meaningful improvements over the base Llama 3.1 8B-Instruct, particularly for the smallest languages where the base model was largely unusable. The training context window was 2,048 tokens, though the config advertises 128K.

Good For

  • Applications requiring high-quality translation for low-resource languages.
  • Developing multilingual chatbots or agents that need to understand and generate content in diverse languages.
  • Tasks involving structured data output (e.g., JSON) for dictionary lookups or programmatic interactions.
  • As a starting point for further fine-tuning on specialized tasks within the supported languages, or for adapter stacking.
  • Replacing meta-llama/Llama-3.1-8B-Instruct in pipelines needing improved low-resource language quality.