bigatuna/Qwen3-0.6B-Sushi-Coder

Text generation · Model size: 0.8B · Quantization: BF16 · Context length: 32k · Published: Dec 31, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

The bigatuna/Qwen3-0.6B-Sushi-Coder is a 0.8 billion parameter causal language model, fine-tuned from Qwen3-0.6B and optimized for Python code generation. It was trained in two stages, combining GRPO and supervised fine-tuning on code datasets. The model achieves 29.3% pass@1 on HumanEval, a significant improvement over its base model, and is best suited for applications requiring efficient and accurate Python code generation within its 40,960-token context length.


Qwen3-0.6B-Sushi-Coder: Python Code Generation Model

This model, developed by bigatuna, is a 0.8 billion parameter language model derived from Qwen3-0.6B and fine-tuned specifically for generating Python code. It features a 40,960-token context length, making it suitable for moderately sized code snippets and related prompts.

Key Capabilities & Training

  • Optimized for Python Code Generation: The model's primary strength lies in its ability to generate Python code, achieved through a specialized two-stage training process.
  • Advanced Training Methodology: Training involved GRPO (Group Relative Policy Optimization) using TRL with a reward model based on test execution and formatting, followed by Supervised Fine-Tuning (SFT) on datasets like microsoft/rStar-Coder and open-r1/codeforces-cots.
  • Improved Performance: It achieves a 29.3% pass@1 on the HumanEval benchmark, marking a significant improvement over the base Qwen/Qwen3-0.6B model's 20.1%.
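For context, the pass@1 figure above is typically computed with the standard unbiased pass@k estimator used for HumanEval. A minimal sketch of that estimator (the sample counts below are illustrative, not taken from this model's actual evaluation):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn from n generations of which c pass the unit tests,
    is correct."""
    if n - c < k:
        return 1.0  # every draw of k samples must include a passing one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: with 200 samples per task and 59 passing,
# the pass@1 estimate reduces to the empirical pass rate 59/200.
print(pass_at_k(200, 59, 1))  # 0.295
```

For k=1 the estimator collapses to the fraction of passing samples; for larger k it corrects for drawing without replacement.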

Use Cases & Limitations

  • Good for: Rapid prototyping, code completion, and generating Python functions for specific tasks.
  • Limitations: While highly effective for Python, its performance may be reduced on other programming languages. Due to its compact size, it may struggle with highly complex reasoning tasks or generate plausible but incorrect code for intricate edge cases. Users are advised to avoid greedy decoding (e.g., temperature=0), which can cause repetitive output.
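Since greedy decoding is discouraged, sampling should be enabled at generation time. A minimal sketch of generation settings in the style used by the Hugging Face transformers library (the specific temperature and top-p values are illustrative assumptions, not official recommendations for this model):

```python
# Hypothetical sampling settings to avoid the repetition issues seen
# with greedy decoding; tune these values for your workload.
generation_kwargs = {
    "do_sample": True,       # enable sampling instead of greedy decoding
    "temperature": 0.7,      # illustrative value, not an official recommendation
    "top_p": 0.8,            # nucleus sampling cutoff (illustrative)
    "max_new_tokens": 512,   # cap on generated code length
}

# With transformers, these would be passed to generation, e.g.:
#   outputs = model.generate(**inputs, **generation_kwargs)
print(generation_kwargs["do_sample"])  # True
```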