The bigatuna/Qwen3-1.7B-Sushi-Coder is a 1.7 billion parameter Qwen3-based causal language model developed by bigatuna. It is fine-tuned for code generation, with a particular focus on competitive programming tasks. The model supports a 2048-token context length and is optimized for producing high-quality solutions to complex coding challenges and assisting with programming tasks.
Model Overview
The bigatuna/Qwen3-1.7B-Sushi-Coder is a 1.7 billion parameter model built upon the Qwen3-1.7B base architecture. It has been fine-tuned using Supervised Fine-Tuning (SFT) with LoRA to enhance its code-generation capabilities, particularly for competitive programming scenarios.
Key Capabilities
- Optimized Code Generation: Designed to produce high-quality code, making it suitable for programming challenges and development tasks.
- Competitive Programming Focus: Fine-tuned on datasets like ericholam/codeforces-sft-dataset-beta (1,408 examples) to excel in competitive programming problem-solving.
- Enhanced Reasoning: Incorporates high-quality reasoning examples from TeichAI/claude-4.5-opus-high-reasoning-250x to improve logical problem-solving.
- Efficient Training: Utilized LoRA (r=8, alpha=16) on attention and MLP layers, Liger Kernel for memory efficiency, and FlashAttention-2 with packing during its 1000-step training process.
- Context Length: Supports a context window of 2048 tokens.
Recommended Usage
This model is ideal for developers and competitive programmers seeking assistance with code generation, especially for problems requiring logical reasoning and efficient solutions. For optimal results, use a temperature between 0.6 and 0.7, top_p of 0.95, and top_k of 20, and avoid greedy decoding.
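The recommended settings above can be wired into a standard Hugging Face transformers generation call. This is a usage sketch, not an official snippet from the model card; the prompt and max_new_tokens value are illustrative:

```python
# Sketch of inference with the recommended sampling parameters.
MODEL_ID = "bigatuna/Qwen3-1.7B-Sushi-Coder"

# Sampling parameters recommended by the model card.
sampling_kwargs = dict(
    do_sample=True,   # the card advises against greedy decoding
    temperature=0.6,  # card recommends 0.6-0.7
    top_p=0.95,
    top_k=20,
    max_new_tokens=512,  # illustrative choice, not from the card
)

def generate(prompt: str) -> str:
    # Imported lazily so the sketch can be read/tested without the
    # (large) model download actually happening.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, **sampling_kwargs)
    # Strip the prompt tokens and decode only the completion.
    completion = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)

# Example (downloads the model weights on first use):
# print(generate("Write a Python function that checks if a string is a palindrome."))
```

Keeping the prompt within the 2048-token context window leaves room for the generated completion.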