LlameUser/qwen-3-4b-thinking-r1-st
Text generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Architecture: Transformer · Concurrency cost: 1

LlameUser/qwen-3-4b-thinking-r1-st is a fine-tuned language model based on Qwen/Qwen3-4B-Thinking-2507, developed by LlameUser. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to strengthen mathematical reasoning. Building on the Qwen3-4B architecture, the model is optimized for tasks that require advanced logical and mathematical problem-solving.
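The core idea behind GRPO is to replace a learned value baseline with a group-relative one: several completions are sampled for the same prompt, and each completion's advantage is its reward normalized against the mean and standard deviation of its group. A minimal stdlib-only sketch of that advantage computation (the reward values and `eps` stabilizer here are illustrative, not taken from this model's training setup):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Compute GRPO-style advantages for one group of sampled completions.

    Instead of a critic network, each reward is normalized against the
    group's own mean and standard deviation; `eps` guards against
    division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four completions to one math prompt, rewarded 1.0 for a
# correct final answer and 0.0 otherwise. Correct completions get a
# positive advantage (roughly +1), incorrect ones a negative one.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

These advantages then weight the policy-gradient update for each completion's tokens, so completions that beat their group average are reinforced.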
