The bigatuna/Qwen3-1.7B-Sushi-Coder is a 1.7 billion parameter Qwen3-based causal language model developed by bigatuna. It is fine-tuned for code generation, with a particular focus on competitive programming tasks. The model supports a 2048-token context length and is optimized for producing high-quality solutions to complex coding challenges and assisting with programming tasks.
Model Overview
The bigatuna/Qwen3-1.7B-Sushi-Coder is a 1.7 billion parameter model built upon the Qwen3-1.7B base architecture. It has been fine-tuned using Supervised Fine-Tuning (SFT) with LoRA to enhance its code-generation capabilities, particularly for competitive programming scenarios.
Key Capabilities
- Optimized Code Generation: Designed to produce high-quality code, making it suitable for programming challenges and development tasks.
- Competitive Programming Focus: Fine-tuned on datasets like ericholam/codeforces-sft-dataset-beta (1,408 examples) to excel in competitive programming problem-solving.
- Enhanced Reasoning: Incorporates high-quality reasoning examples from TeichAI/claude-4.5-opus-high-reasoning-250x to improve logical problem-solving.
- Efficient Training: Utilized LoRA (r=8, alpha=16) on attention and MLP layers, Liger Kernel for memory efficiency, and FlashAttention-2 with packing during its 1000-step training process.
- Context Length: Supports a context window of 2048 tokens.
Recommended Usage
This model is ideal for developers and competitive programmers seeking assistance with code generation, especially for problems requiring logical reasoning and efficient solutions. For optimal results, use a temperature between 0.6 and 0.7, top_p of 0.95, and top_k of 20, and avoid greedy decoding.
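The recommended settings above can be wired into a standard Hugging Face transformers generation call. This is a usage sketch, not an official snippet from the model card; the prompt and max_new_tokens value are illustrative:

```python
# Sketch of inference with the recommended sampling parameters.
MODEL_ID = "bigatuna/Qwen3-1.7B-Sushi-Coder"

# Sampling parameters recommended by the model card.
sampling_kwargs = dict(
    do_sample=True,   # the card advises against greedy decoding
    temperature=0.6,  # card recommends 0.6-0.7
    top_p=0.95,
    top_k=20,
    max_new_tokens=512,  # illustrative choice, not from the card
)

def generate(prompt: str) -> str:
    # Imported lazily so the sketch can be read/tested without the
    # (large) model download actually happening.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, **sampling_kwargs)
    # Strip the prompt tokens and decode only the completion.
    completion = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)

# Example (downloads the model weights on first use):
# print(generate("Write a Python function that checks if a string is a palindrome."))
```

Keeping the prompt within the 2048-token context window leaves room for the generated completion.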