minsu0567/Uni-IAD-R2-Qwen3.5-si

Vision | Concurrency Cost: 1 | Model Size: 4.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 9, 2026 | License: other | Architecture: Transformer | Cold

minsu0567/Uni-IAD-R2-Qwen3.5-si is a 4.5-billion-parameter language model fine-tuned from unsloth/Qwen3.5-4B on the PA_SFT_2_reordered_si dataset. It supports a context length of 32,768 tokens, which makes it suitable for longer input sequences. Fine-tuning ran for 1 epoch with a learning rate of 1e-05 and a cosine learning rate scheduler.


Model Overview

minsu0567/Uni-IAD-R2-Qwen3.5-si is a 4.5-billion-parameter language model derived from the unsloth/Qwen3.5-4B base model and fine-tuned on the PA_SFT_2_reordered_si dataset. Its behavior is therefore expected to reflect the tasks and data distribution of that dataset, which is not described further here. The model supports a context window of 32,768 tokens, so it can handle long input sequences.
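As a rough sizing aid, the weight footprint can be estimated from the parameter count and the BF16 precision listed in the metadata (2 bytes per parameter). The sketch below is a back-of-the-envelope calculation only: the function name is hypothetical, 4.5B is the rounded parameter count from this card, and activations, KV cache, and framework overhead are ignored.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    bytes_per_param defaults to 2 because BF16 stores each value in 16 bits.
    """
    return num_params * bytes_per_param / 2**30


# 4.5e9 is the rounded parameter count stated on this card.
bf16_gib = weight_memory_gib(4.5e9)
print(f"Approximate BF16 weight size: {bf16_gib:.1f} GiB")
```

Actual memory use at inference time will be higher, and it grows with the number of tokens held in the context window.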

Training Details

The fine-tuning process involved the following key hyperparameters:

  • Learning Rate: 1e-05
  • Optimizer: ADAMW_BNB
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation Steps: 2
  • Epochs: 1.0
  • LR Scheduler: Cosine with 100 warmup steps

These settings amount to a single pass over the dataset at a low learning rate. The model was trained using Transformers 5.10.2, PyTorch 2.11.0+cu128, Datasets 5.0.0, and Tokenizers 0.22.2.
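To make the hyperparameters above more concrete, here is a minimal sketch of the effective batch size and of a cosine schedule with 100 warmup steps. It assumes linear warmup from zero and cosine decay to zero, which is a common shape for this kind of scheduler but is not confirmed by this card. The function name and the total step count are hypothetical, since the card states neither the dataset size nor the number of devices.

```python
import math

PER_DEVICE_TRAIN_BATCH = 1  # from the card
GRAD_ACCUM_STEPS = 2        # from the card
# Samples contributing to each optimizer update, per device.
effective_batch_per_device = PER_DEVICE_TRAIN_BATCH * GRAD_ACCUM_STEPS


def cosine_lr_with_warmup(step: int, total_steps: int,
                          base_lr: float = 1e-5,
                          warmup_steps: int = 100) -> float:
    """Learning rate at a given optimizer step.

    Assumed shape: linear warmup from 0 to base_lr, then cosine decay to 0.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; the real value depends on the dataset size
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 1e-05 at step 100 and then falls smoothly for the rest of the epoch.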

Potential Use Cases

Given its fine-tuning on the PA_SFT_2_reordered_si dataset, this model is likely best suited for:

  • Applications requiring understanding or generation of content similar to the training data.
  • Tasks benefiting from a 4.5B parameter model with a large 32K context window.
  • Further research or fine-tuning on related datasets.
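When working near the 32,768-token context window, it can help to check the token budget before sending a request. The sketch below is illustrative only: the helper names are hypothetical, and the token counts are assumed to come from whatever tokenizer is used with the model.

```python
CONTEXT_LENGTH = 32768  # context window stated on this card


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


def max_new_tokens_allowed(prompt_tokens: int,
                           context_length: int = CONTEXT_LENGTH) -> int:
    """Largest generation length that still fits after the prompt."""
    return max(0, context_length - prompt_tokens)


print(fits_in_context(30000, 2000))
print(max_new_tokens_allowed(30000))
```

A caller could use the second helper to clamp a requested generation length instead of rejecting the request outright.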