takami2022/qwen3-4b-sft-merged-v2v5ver1
Text Generation · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

takami2022/qwen3-4b-sft-merged-v2v5ver1 is a 4-billion-parameter, Qwen3-based, instruction-tuned causal language model. Developed by takami2022, it was fine-tuned with QLoRA (4-bit) and then merged into a 16-bit model, so it is fully self-contained and requires no adapter loading. It is intended primarily as a strong base for further DPO (Direct Preference Optimization) training and was trained on a merged structured-data dataset.


Model Overview

This model, takami2022/qwen3-4b-sft-merged-v2v5ver1, is a 4 billion parameter instruction-tuned language model based on the Qwen3 architecture. It was developed by takami2022 through a fine-tuning process using QLoRA (4-bit) with Unsloth, and the resulting LoRA adapter was then merged into the base model weights to create a fully self-contained 16-bit model.
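Because the adapter is already merged, the checkpoint loads like any standard Hugging Face causal LM, with no PEFT step. A minimal sketch (the generation settings are illustrative, not from the model card); Qwen3 models use the ChatML prompt format, which the tokenizer's chat template applies automatically:

```python
# Sketch: loading the merged model with transformers (no adapter loading needed).
MODEL_ID = "takami2022/qwen3-4b-sft-merged-v2v5ver1"

def build_chatml_prompt(user_message: str) -> str:
    """Build a ChatML-style prompt as used by Qwen models.
    (Normally tokenizer.apply_chat_template does this for you.)"""
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept local so the prompt helper works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = [{"role": "user", "content": user_message}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```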

Key Characteristics

  • Base Model: Utilizes Qwen/Qwen3-4B-Instruct-2507 as its foundation.
  • Training Method: Fine-tuned with QLoRA (4-bit) and subsequently merged to 16-bit, eliminating the need for external adapter loading.
  • Dataset: Trained on takami2022/structured_data_merged_v2v5_0222.
  • Training Configuration: Used a maximum sequence length of 1024, trained for 3 epochs at a learning rate of 1e-6, with LoRA parameters r=64 and alpha=128 and CoT (Chain-of-Thought) masking enabled.
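With r=64 and alpha=128, the standard LoRA update W' = W + (alpha/r)·BA is applied with a scaling factor of 2, and merging folds that update into the base weights permanently. A toy illustration of the arithmetic (stand-in 2x2 matrices, not the actual model weights):

```python
# Toy illustration of the LoRA merge W' = W + (alpha / r) * (B @ A),
# using the card's hyperparameters r=64, alpha=128 (scaling factor 2.0).
r, alpha = 64, 128
scaling = alpha / r  # 2.0

# Tiny stand-in matrices (real shapes: B is d_out x r, A is r x d_in).
A = [[1.0, 0.0], [0.0, 1.0]]   # low-rank factor A
B = [[0.5, 0.0], [0.0, 0.25]]  # low-rank factor B
W = [[1.0, 1.0], [1.0, 1.0]]   # frozen base weight

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

delta = matmul(B, A)
W_merged = [[W[i][j] + scaling * delta[i][j] for j in range(2)]
            for i in range(2)]
print(scaling, W_merged)  # -> 2.0 [[2.0, 1.0], [1.0, 1.5]]
```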

Intended Use

This model is primarily intended as a strong starting point for subsequent DPO (Direct Preference Optimization) training. Its self-contained nature simplifies deployment for further fine-tuning stages.
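Since the card positions this checkpoint as a DPO starting point, here is a hedged sketch of how such a stage might be wired with the TRL library. The dataset name, beta, batch size, and output path are placeholders (not from the model card), and argument names vary across trl versions:

```python
# Sketch only: a follow-up DPO stage on top of the merged checkpoint via trl.
# Dataset name, beta, and output_dir are illustrative placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "takami2022/qwen3-4b-sft-merged-v2v5ver1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A preference dataset with "prompt"/"chosen"/"rejected" columns (placeholder name).
train_dataset = load_dataset("your-org/your-preference-dataset", split="train")

args = DPOConfig(
    output_dir="qwen3-4b-dpo",      # placeholder
    beta=0.1,                       # DPO temperature; illustrative default
    per_device_train_batch_size=2,
    learning_rate=5e-7,
    num_train_epochs=1,
)
trainer = DPOTrainer(
    model=model,
    ref_model=None,                 # trl builds a frozen reference copy when None
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,     # named `tokenizer` in older trl versions
)
trainer.train()
```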