azherali/Riazi-8B

Text Generation

  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Jun 9, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights
  • Cold

azherali/Riazi-8B is an 8-billion-parameter language model developed by azherali and fine-tuned with Supervised Fine-Tuning (SFT). It targets general language tasks, including prompts in languages such as Urdu. The model supports context lengths of up to 32768 tokens and can be loaded with 4-bit or 8-bit quantization to reduce memory usage.


Model Overview

azherali/Riazi-8B is an 8-billion-parameter language model developed by azherali and fine-tuned with Supervised Fine-Tuning (SFT). It is built for a variety of language tasks, and has been shown processing and answering prompts in languages such as Urdu. The default configuration uses a context length of 2048 tokens; this can be extended up to 32768 tokens through RoPE scaling, which is handled internally.
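
As a rough illustration of the context-length figures above, the ratio between a requested context length and the 2048-token default can be worked out directly. This is only a sketch: the helper name `rope_scaling_factor` is hypothetical, and treating the ratio as a simple linear scaling factor is an assumption, since the source does not say which RoPE scaling method is applied internally.

```python
DEFAULT_CTX = 2048   # default context length stated in the overview
MAX_CTX = 32768      # maximum context length stated in the overview


def rope_scaling_factor(target_ctx: int, base_ctx: int = DEFAULT_CTX) -> float:
    """Return target_ctx / base_ctx, or 1.0 when no extension is needed.

    Assumption: a plain linear ratio, used here only for illustration.
    """
    if target_ctx < 1 or target_ctx > MAX_CTX:
        raise ValueError(f"target_ctx must be between 1 and {MAX_CTX}")
    if target_ctx <= base_ctx:
        return 1.0
    return target_ctx / base_ctx


print(rope_scaling_factor(32768))  # 16.0
```

Under this assumption, reaching the full 32768-token window corresponds to a 16x extension over the default.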

Key Capabilities

  • Multilingual Processing: Demonstrated ability to understand and generate text in languages like Urdu.
  • Efficient Inference: Supports 4-bit and 8-bit quantization for reduced memory footprint, making it suitable for environments with limited resources.
  • Optimized Performance: Runs with unsloth's FastLanguageModel, whose native inference mode is reported to be about 2x faster.
  • Flexible Context Window: Internally supports RoPE Scaling, allowing for context lengths up to 32768 tokens.
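
To put the quantization point in perspective, the weight memory of an 8B-parameter model can be estimated from the bit width alone. The sketch below assumes exactly 8 billion parameters (the card only gives the nominal "8B") and counts weights only; the KV cache, activations, and any quantization metadata are ignored, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
NOMINAL_PARAMS = 8_000_000_000  # assumption: "8B" taken as exactly 8e9 parameters


def weight_memory_gb(bits_per_weight: int, n_params: int = NOMINAL_PARAMS) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    total_bytes = n_params * bits_per_weight // 8
    return total_bytes / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(bits):.1f} GB")
```

By this estimate, 8-bit weights need roughly half the memory of 16-bit weights, and 4-bit weights roughly a quarter.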

Training Details

The model was trained using Supervised Fine-Tuning (SFT). The training run used the following framework versions:

  • TRL: 0.22.2
  • Transformers: 4.56.2
  • PyTorch: 2.12.0+rocm7.2
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2
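
For anyone trying to match this environment, a small check against the listed versions may help. This is a sketch using only the standard library. The distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the pip names of the frameworks above, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in the training details; keys are assumed pip names.
PINNED = {
    "trl": "0.22.2",
    "transformers": "4.56.2",
    "torch": "2.12.0+rocm7.2",
    "datasets": "4.3.0",
    "tokenizers": "0.22.2",
}


def compare_versions(lookup=version):
    """Map each package to (wanted, found, matches); found is None if missing."""
    report = {}
    for name, wanted in PINNED.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, found {found} ({status})")
```

A mismatch is not necessarily a problem for inference; the list only records what was used during training.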

Good For

  • Applications requiring multilingual text generation and understanding, particularly for languages like Urdu.
  • Deployment in resource-constrained environments due to support for 4-bit and 8-bit quantization.
  • Tasks benefiting from fast inference speeds provided by unsloth optimizations.
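
A minimal loading sketch for the 4-bit, unsloth-based setup described above might look like the following. It assumes the `unsloth` package is installed, a supported GPU is available, and the repository weights load through `FastLanguageModel.from_pretrained`; none of this is confirmed by the card beyond the mention of unsloth. The helpers `build_load_kwargs` and `load_for_inference` are hypothetical names introduced here.

```python
MODEL_ID = "azherali/Riazi-8B"
MAX_CTX = 32768  # upper bound stated on this card


def build_load_kwargs(max_seq_length: int = 2048, load_in_4bit: bool = True) -> dict:
    """Build keyword arguments for FastLanguageModel.from_pretrained."""
    if not 1 <= max_seq_length <= MAX_CTX:
        raise ValueError(f"max_seq_length must be between 1 and {MAX_CTX}")
    return {
        "model_name": MODEL_ID,
        "max_seq_length": max_seq_length,
        "dtype": None,  # let unsloth choose a dtype for the hardware
        "load_in_4bit": load_in_4bit,
    }


def load_for_inference(**overrides):
    """Load the model and tokenizer, then switch to unsloth's inference mode."""
    from unsloth import FastLanguageModel  # imported lazily: needs unsloth and a GPU

    model, tokenizer = FastLanguageModel.from_pretrained(**build_load_kwargs(**overrides))
    FastLanguageModel.for_inference(model)
    return model, tokenizer
```

Calling `load_for_inference(max_seq_length=8192)` would request a longer window while keeping 4-bit loading. Keeping the argument construction separate from the actual load makes the settings easy to inspect without downloading any weights.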