modrill/qwen3-4b-nothink-baseline-lora-sft

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 7, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The modrill/qwen3-4b-nothink-baseline-lora-sft model is a 4 billion parameter language model based on Qwen/Qwen3-4B-Base, fine-tuned using LoRA for direct inference. This model is specifically configured for a "no-think" mode, making it suitable for applications requiring direct, non-reasoning responses. It is designed for text generation tasks, particularly those where a streamlined, immediate output is preferred over complex reasoning processes.

Loading preview...

Qwen3-4B Code SFT - No-Think Baseline

This model, modrill/qwen3-4b-nothink-baseline-lora-sft, is a 4 billion parameter language model built upon the Qwen/Qwen3-4B-Base architecture. It has undergone supervised fine-tuning (SFT) using LoRA (rank 64, alpha 128), with the adapters merged directly into the full model weights for seamless inference without requiring separate adapter loading.

Key Characteristics

  • Base Model: Qwen/Qwen3-4B-Base.
  • Fine-tuning Method: LoRA-based SFT, not full-parameter fine-tuning.
  • "No-Think" Mode: Configured with enable_thinking=false, indicating it's optimized for direct response generation rather than multi-step reasoning.
  • Training Cutoff Length: Trained with a maximum sequence length of 8192 tokens.
  • Context Length: Supports a context length of up to 32768 tokens.

Usage Considerations

This model is particularly suited for applications where a straightforward, immediate output is desired, bypassing internal "thinking" processes. Users should set enable_thinking=false in their chat template during inference. The recommended max_tokens for generation is 8192. The model is licensed under Apache 2.0, consistent with its base model.