modrill/qwen3-4b-nothink-baseline-lora-sft
The modrill/qwen3-4b-nothink-baseline-lora-sft model is a 4 billion parameter language model based on Qwen/Qwen3-4B-Base, fine-tuned using LoRA for direct inference. This model is specifically configured for a "no-think" mode, making it suitable for applications requiring direct, non-reasoning responses. It is designed for text generation tasks, particularly those where a streamlined, immediate output is preferred over complex reasoning processes.
Loading preview...
Qwen3-4B Code SFT - No-Think Baseline
This model, modrill/qwen3-4b-nothink-baseline-lora-sft, is a 4 billion parameter language model built upon the Qwen/Qwen3-4B-Base architecture. It has undergone supervised fine-tuning (SFT) using LoRA (rank 64, alpha 128), with the adapters merged directly into the full model weights for seamless inference without requiring separate adapter loading.
Key Characteristics
- Base Model: Qwen/Qwen3-4B-Base.
- Fine-tuning Method: LoRA-based SFT, not full-parameter fine-tuning.
- "No-Think" Mode: Configured with
enable_thinking=false, indicating it's optimized for direct response generation rather than multi-step reasoning. - Training Cutoff Length: Trained with a maximum sequence length of 8192 tokens.
- Context Length: Supports a context length of up to 32768 tokens.
Usage Considerations
This model is particularly suited for applications where a straightforward, immediate output is desired, bypassing internal "thinking" processes. Users should set enable_thinking=false in their chat template during inference. The recommended max_tokens for generation is 8192. The model is licensed under Apache 2.0, consistent with its base model.