Name: modrill/qwen3-4b-nothink-baseline-lora-sft API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: modrill

Qwen3-4B Code SFT - No-Think Baseline

This model, modrill/qwen3-4b-nothink-baseline-lora-sft, is a 4 billion parameter language model built upon the Qwen/Qwen3-4B-Base architecture. It has undergone supervised fine-tuning (SFT) using LoRA (rank 64, alpha 128), with the adapters merged directly into the full model weights for seamless inference without requiring separate adapter loading.

Key Characteristics

Base Model: Qwen/Qwen3-4B-Base.
Fine-tuning Method: LoRA-based SFT, not full-parameter fine-tuning.
"No-Think" Mode: Configured with enable_thinking=false, indicating it's optimized for direct response generation rather than multi-step reasoning.
Training Cutoff Length: Trained with a maximum sequence length of 8192 tokens.
Context Length: Supports a context length of up to 32768 tokens.

Usage Considerations

This model is particularly suited for applications where a straightforward, immediate output is desired, bypassing internal "thinking" processes. Users should set enable_thinking=false in their chat template during inference. The recommended max_tokens for generation is 8192. The model is licensed under Apache 2.0, consistent with its base model.

Overview

Qwen3-4B Code SFT - No-Think Baseline

Key Characteristics

Usage Considerations

Full Model Card (README)