Abhinav-hf/qwen-grpo-sft-trained-16bit

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:Apr 25, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

Abhinav-hf/qwen-grpo-sft-trained-16bit is a 3.1 billion parameter Qwen2.5-based causal language model developed by Abhinav-hf. This model was fine-tuned from unsloth/Qwen2.5-3B-Instruct using Unsloth and Huggingface's TRL library, enabling 2x faster training. It is designed for general instruction-following tasks, leveraging its efficient training methodology.

Loading preview...

Model Overview

Abhinav-hf/qwen-grpo-sft-trained-16bit is a 3.1 billion parameter language model developed by Abhinav-hf. It is a fine-tuned variant of the unsloth/Qwen2.5-3B-Instruct model, leveraging the Qwen2.5 architecture.

Key Characteristics

  • Efficient Training: This model was fine-tuned using Unsloth and Huggingface's TRL library, which facilitated a 2x faster training process compared to standard methods.
  • Base Model: Built upon the robust Qwen2.5-3B-Instruct foundation, inheriting its general instruction-following capabilities.
  • Parameter Count: Features 3.1 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a context length of 32768 tokens, allowing for processing longer inputs and generating more coherent responses.

Use Cases

This model is suitable for a variety of general-purpose instruction-following tasks where efficient performance from a 3B parameter model is desired. Its optimized training process suggests potential for applications requiring rapid iteration or deployment.