farffadet/syllogym-judge-qwen3-4b-grpo-v2

Text generation · 4B parameters · BF16 · 32k context length · Published: Mar 22, 2026 · License: apache-2.0 · Transformer architecture · Open weights

farffadet/syllogym-judge-qwen3-4b-grpo-v2 is a 4-billion-parameter Qwen3 model developed by farffadet, fine-tuned from unsloth/Qwen3-4B-unsloth-bnb-4bit. It was trained with Unsloth and Hugging Face's TRL library for faster fine-tuning, and is intended for general language tasks, with a 32,768-token context length.


Overview

The farffadet/syllogym-judge-qwen3-4b-grpo-v2 is a 4 billion parameter language model based on the Qwen3 architecture. Developed by farffadet, this model was fine-tuned from the unsloth/Qwen3-4B-unsloth-bnb-4bit base model.

Key Characteristics

  • Base Model: Qwen3-4B
  • Parameter Count: 4 billion parameters
  • Context Length: 32768 tokens
  • Training Method: Fine-tuned with Unsloth and Hugging Face's TRL library; Unsloth's acceleration is reported to yield roughly 2x faster training than standard methods.
  • License: Apache-2.0
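The card names Unsloth and TRL as the training stack, and the model name suggests a GRPO (Group Relative Policy Optimization) fine-tune. A minimal sketch, assuming TRL's `GRPOTrainer` was used, of how such a run might be wired up; the reward function, dataset, and hyperparameters are illustrative assumptions, not the author's actual training code.

```python
# Hedged sketch: a GRPO fine-tune with TRL (assumed setup, not the author's
# actual training code). Real use requires `pip install trl`.

def validity_reward(completions, **kwargs):
    """Toy reward for a syllogism judge: 1.0 when the completion opens with
    the verdict 'valid', else 0.0. The real reward is not documented."""
    return [1.0 if c.strip().lower().startswith("valid") else 0.0
            for c in completions]

def build_trainer(train_dataset):
    """Construct a GRPOTrainer over the base model named in the card.
    TRL is imported lazily so the reward helper above stays dependency-free."""
    from trl import GRPOConfig, GRPOTrainer  # recent TRL ships GRPOTrainer

    config = GRPOConfig(
        output_dir="syllogym-judge-qwen3-4b-grpo-v2",
        num_generations=4,         # completions sampled per prompt (assumption)
        max_completion_length=256,  # assumption
    )
    return GRPOTrainer(
        model="unsloth/Qwen3-4B-unsloth-bnb-4bit",  # base model from the card
        reward_funcs=validity_reward,
        args=config,
        train_dataset=train_dataset,  # the card does not name the dataset
    )
```

Calling `build_trainer(dataset).train()` would then launch the GRPO run.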

Intended Use

This model is suitable for a variety of general language understanding and generation tasks, benefiting from its Qwen3 foundation and efficient fine-tuning process. Its 32K context window allows for processing longer inputs and generating more coherent, extended outputs.
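The card does not ship usage code, so here is a minimal inference sketch with Hugging Face `transformers`; the generation settings are assumptions, and the 4B checkpoint is only downloaded when `generate` is actually called.

```python
# Hedged sketch: loading and querying the model with transformers.
# The repo id comes from the card; generation settings are illustrative.

MODEL_ID = "farffadet/syllogym-judge-qwen3-4b-grpo-v2"

def build_messages(user_prompt: str) -> list[dict]:
    """Minimal chat-message list for the tokenizer's chat template."""
    return [{"role": "user", "content": user_prompt}]

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Download the checkpoint (on first call) and generate a reply.
    transformers is imported lazily so the helper above has no dependencies."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate("All A are B; all B are C; is 'all A are C' valid?")` would return the model's judgment as a string.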