y-ohtani/GRPO-TCR-Qwen3-4B-test
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Feb 28, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

y-ohtani/GRPO-TCR-Qwen3-4B-test is a 4-billion-parameter Qwen3-based model developed by y-ohtani and fine-tuned with Group Relative Policy Optimization with Tool Call Reward (GRPO-TCR). It is designed for deliberative agentic reasoning: across multiple turns, it decides selectively when to call a code_interpreter tool to solve math and coding problems. Training rewards correct answers and attempts to use the tool, and the model emphasizes concise responses, with the aim of preventing verbose self-reasoning.
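
To make the training signal described above more concrete, here is a minimal sketch of how a reward that combines answer correctness with a tool-call attempt could feed into group-relative advantages. It is an illustration under assumptions, not the author's training code: the function names, the reward weights (1.0 and 0.5), and the use of the population standard deviation are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def tcr_reward(is_correct: bool, attempted_tool_call: bool,
               correct_weight: float = 1.0, tool_weight: float = 0.5) -> float:
    """Hypothetical reward: correctness plus a bonus for attempting a tool call.

    The weights are illustrative assumptions, not values from the model card.
    """
    return correct_weight * float(is_correct) + tool_weight * float(attempted_tool_call)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples drawn for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a sample is
    scored relative to its group rather than by a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four sampled responses to the same prompt:
# (answer correct?, attempted a code_interpreter call?)
group = [(True, True), (True, False), (False, True), (False, False)]
rewards = [tcr_reward(correct, tool) for correct, tool in group]
advantages = group_relative_advantages(rewards)
```

In this sketch, a response that is correct and attempts a tool call receives the largest advantage in its group, while a group whose responses all earn the same reward yields zero advantage for every sample.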
