Jackrong/DASD-4B-Thinking-2507-GRPO-v2

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Feb 10, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Jackrong/DASD-4B-Thinking-2507-GRPO-v2 is a 4-billion-parameter, Qwen3-based causal language model developed by Jackrong. It was fine-tuned using Unsloth and Hugging Face's TRL library, enabling roughly 2x faster training, and is designed for general language tasks, leveraging the Qwen3 architecture for efficient processing.


Overview

Jackrong/DASD-4B-Thinking-2507-GRPO-v2 is a 4 billion parameter language model based on the Qwen3 architecture. Developed by Jackrong, this model was fine-tuned from unsloth/Qwen3-4B-Thinking-2507.

Key Characteristics

  • Architecture: Qwen3-based, a robust foundation for various NLP tasks.
  • Training Efficiency: Fine-tuned using Unsloth and Hugging Face's TRL library, which facilitated a 2x faster training process (see the fine-tuning sketch after this list).
  • Parameters: Features 4 billion parameters, balancing performance with computational efficiency.
  • Context Length: Supports a context window of 40,960 tokens, allowing it to process longer inputs.
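
The "GRPO" in the model name indicates Group Relative Policy Optimization fine-tuning via TRL. Below is a minimal sketch of that workflow using TRL's GRPOTrainer; the reward function, dataset, and hyperparameters are illustrative placeholders, not the author's actual configuration.

```python
# Minimal GRPO fine-tuning sketch with Hugging Face TRL.
# Assumptions: the reward function and dataset below are hypothetical
# stand-ins; the base checkpoint is taken from this model card.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # Placeholder reward: prefer completions near 200 characters.
    return [-abs(200 - len(c)) / 200.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # example prompt dataset

config = GRPOConfig(
    output_dir="DASD-4B-Thinking-2507-GRPO-v2",
    per_device_train_batch_size=4,
    num_generations=4,          # completions sampled per prompt (the "group")
    max_completion_length=512,
    bf16=True,                  # matches the BF16 precision listed above
)

trainer = GRPOTrainer(
    model="unsloth/Qwen3-4B-Thinking-2507",  # base checkpoint per this card
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

GRPO scores several sampled completions per prompt against the reward function and updates the policy toward the higher-scoring members of each group, which avoids training a separate value model.
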

Use Cases

This model is suitable for applications that need a capable mid-sized language model with an efficient fine-tuning lineage. Its Qwen3 base and 4B parameters make it a strong candidate for tasks such as text generation, summarization, and question answering, particularly where faster fine-tuning is an advantage. A minimal loading example follows.
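
The sketch below shows one way to load the model for inference with Transformers. The prompt and generation settings are illustrative only; consult the base Qwen3 model card for recommended sampling parameters.

```python
# Minimal inference sketch, assuming standard Transformers chat-template usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jackrong/DASD-4B-Thinking-2507-GRPO-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the precision listed above
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the benefits of GRPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# As a thinking model, the output will include a <think>...</think> reasoning
# block before the final answer.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
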