ozayezerceli/Qwen3-4B-Inst-CoT-GRPO

Hugging Face | Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Dec 23, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

ozayezerceli/Qwen3-4B-Inst-CoT-GRPO is a 4-billion-parameter, Qwen3-based, instruction-tuned causal language model developed by ozayezerceli, with a 40,960-token context length. It is a fine-tuned version of ozayezerceli/Qwen3-4B-Inst-CoTsft, trained with Unsloth and Hugging Face's TRL library. The model targets general instruction-following tasks, combining the Qwen3 architecture with an efficient training pipeline.
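For orientation, here is a minimal inference sketch using the transformers library. The repository id comes from this card; the prompt and generation settings are illustrative placeholders, not recommendations from the author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ozayezerceli/Qwen3-4B-Inst-CoT-GRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the quantization listed above
    device_map="auto",
)

# Example prompt; any instruction-style message works.
messages = [{"role": "user", "content": "Summarize GRPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```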


Model Overview

The ozayezerceli/Qwen3-4B-Inst-CoT-GRPO is a 4-billion-parameter instruction-tuned language model based on the Qwen3 architecture, developed by ozayezerceli. Its 40,960-token context length makes it suitable for processing long inputs and generating coherent, extended responses.
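If you want to verify the context window programmatically, the standard transformers config attribute can be read directly; the value in the comment is an expectation based on the figure above, not something independently confirmed here.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ozayezerceli/Qwen3-4B-Inst-CoT-GRPO")
print(config.max_position_embeddings)  # expected: 40960, per this card
```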

Key Capabilities

  • Instruction Following: The model is fine-tuned specifically for instruction-following, so it is built to understand and carry out user commands reliably.
  • Efficient Training: It was trained with Unsloth and Hugging Face's TRL library, which points to a faster, more memory-efficient training process than standard fine-tuning; a hedged training sketch follows this list.
  • Qwen3 Foundation: Built on the Qwen3 base model, it inherits the foundational capabilities and architectural strengths of the Qwen series.
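The model name and the TRL mention suggest a GRPO (Group Relative Policy Optimization) post-training stage on top of the SFT checkpoint. Below is a hedged sketch of what such a run looks like with TRL's GRPOTrainer; the dataset and reward function are toy placeholders, not the author's actual training setup.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any dataset with a "prompt" column works; trl-lib/tldr is the TRL quickstart example.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters.
    return [-abs(200 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="Qwen3-4B-Inst-CoT-GRPO-repro")
trainer = GRPOTrainer(
    model="ozayezerceli/Qwen3-4B-Inst-CoTsft",  # the SFT checkpoint named on this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```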

Good For

  • General-purpose AI applications: Its instruction-tuned nature makes it versatile across tasks requiring direct command execution.
  • Applications requiring long context: The 40,960-token context window is beneficial for extensive text analysis, summarization, and long-form generation.
  • Developers seeking efficient models: The use of Unsloth for training implies a focus on performance and resource efficiency, which carries over to deployment; see the loading sketch below.
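As an illustration of the efficiency angle, Unsloth's FastLanguageModel can load the checkpoint for low-memory inference. The 4-bit flag below is an optional deployment choice, not something this card specifies.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ozayezerceli/Qwen3-4B-Inst-CoT-GRPO",
    max_seq_length=40960,  # the context length stated above
    load_in_4bit=True,     # optional: trades some accuracy for much lower VRAM
)
FastLanguageModel.for_inference(model)  # switches on Unsloth's faster decoding path
```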