bigatuna/Qwen3-0.6B-Sushi-Coder

Text generation · Model size: 0.8B · Quantization: BF16 · Context length: 32k · Published: Dec 31, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

The bigatuna/Qwen3-0.6B-Sushi-Coder is a 0.8 billion parameter causal language model, fine-tuned from Qwen3-0.6B and optimized for Python code generation. It was trained in two stages, combining GRPO and supervised fine-tuning on code datasets. The model achieves 29.3% pass@1 on HumanEval, a significant improvement over its base model, and is best suited for applications requiring efficient and accurate Python code generation within its 40,960-token context length.


Qwen3-0.6B-Sushi-Coder: Python Code Generation Model

This model, developed by bigatuna, is a 0.8 billion parameter language model derived from Qwen3-0.6B and fine-tuned specifically for generating Python code. It features a 40,960-token context length, making it suitable for moderately sized code snippets and related prompts.

Key Capabilities & Training

  • Optimized for Python Code Generation: The model's primary strength lies in its ability to generate Python code, achieved through a specialized two-stage training process.
  • Advanced Training Methodology: Training involved GRPO (Group Relative Policy Optimization) using TRL with a reward model based on test execution and formatting, followed by Supervised Fine-Tuning (SFT) on datasets like microsoft/rStar-Coder and open-r1/codeforces-cots.
  • Improved Performance: It achieves a 29.3% pass@1 on the HumanEval benchmark, marking a significant improvement over the base Qwen/Qwen3-0.6B model's 20.1%.
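For context, the pass@1 figure above is typically computed with the standard unbiased pass@k estimator used for HumanEval. A minimal sketch of that estimator (the sample counts below are illustrative, not taken from this model's actual evaluation):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn from n generations of which c pass the unit tests,
    is correct."""
    if n - c < k:
        return 1.0  # every draw of k samples must include a passing one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: with 200 samples per task and 59 passing,
# the pass@1 estimate reduces to the empirical pass rate 59/200.
print(pass_at_k(200, 59, 1))  # 0.295
```

For k=1 the estimator collapses to the fraction of passing samples; for larger k it corrects for drawing without replacement.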

Use Cases & Limitations

  • Good for: Rapid prototyping, code completion, and generating Python functions for specific tasks.
  • Limitations: While highly effective for Python, its performance may be reduced on other programming languages. Due to its compact size, it may struggle with highly complex reasoning tasks or generate plausible but incorrect code for intricate edge cases. Users are advised to avoid greedy decoding (e.g., temperature=0), which can cause repetitive output.
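Since greedy decoding is discouraged, sampling should be enabled at generation time. A minimal sketch of generation settings in the style used by the Hugging Face transformers library (the specific temperature and top-p values are illustrative assumptions, not official recommendations for this model):

```python
# Hypothetical sampling settings to avoid the repetition issues seen
# with greedy decoding; tune these values for your workload.
generation_kwargs = {
    "do_sample": True,       # enable sampling instead of greedy decoding
    "temperature": 0.7,      # illustrative value, not an official recommendation
    "top_p": 0.8,            # nucleus sampling cutoff (illustrative)
    "max_new_tokens": 512,   # cap on generated code length
}

# With transformers, these would be passed to generation, e.g.:
#   outputs = model.generate(**inputs, **generation_kwargs)
print(generation_kwargs["do_sample"])  # True
```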