smjain/qwen25-coder-bash-agent-grpo
TEXT GENERATION | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Mar 26, 2026 | Architecture: Transformer | Warm

smjain/qwen25-coder-bash-agent-grpo is a 0.5-billion-parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-Coder-0.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced for mathematical reasoning. The model is optimized for agentic tasks, particularly those involving code and bash interactions, building on its Qwen2.5-Coder base.
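As a rough illustration of the core idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for the same prompt, each is scored, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions made for illustration; the reward function and training configuration actually used for this model are not stated here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration only; not taken from this model's
    training code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled bash commands (1.0 = task succeeded).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.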
