russwest404/Qwen3-4B-ReTool-SFT

Hosted on Hugging Face · Text generation · Model size: 4B · Precision: BF16 · Context length: 32k · Published: May 2, 2025 · License: other · Architecture: Transformer

The russwest404/Qwen3-4B-ReTool-SFT model is a fine-tuned version of Qwen/Qwen3-4B, trained on the retool dataset via supervised fine-tuning. It achieves a loss of 0.3798 on the evaluation set and is intended for applications that benefit from its specialization on this dataset, offering a tailored alternative to the general-purpose base model for those use cases.


Model Overview

The russwest404/Qwen3-4B-ReTool-SFT is a specialized language model derived from the Qwen3-4B base architecture. It has undergone supervised fine-tuning (SFT) using the retool dataset, indicating an optimization for tasks and data patterns present within this specific dataset.

Key Characteristics

  • Base Model: Qwen/Qwen3-4B, a 4 billion parameter model from the Qwen family.
  • Fine-tuning: Specifically fine-tuned on the retool dataset.
  • Performance Metric: Achieved a loss of 0.3798 on the evaluation set, with training loss decreasing steadily over the course of fine-tuning.

Training Details

The model was trained with a learning rate of 1e-05 using the adamw_torch optimizer. A per-device train_batch_size of 2, gradient_accumulation_steps of 4, and 8 devices yield a total_train_batch_size of 64 (2 × 4 × 8). Training ran for 2 epochs with a cosine learning rate scheduler and a warmup ratio of 0.1.
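The hyperparameters above can be sanity-checked in a few lines of plain Python: the effective batch size follows from multiplying the three parallelism factors, and the scheduler can be sketched as a linear warmup over the first 10% of steps followed by cosine decay. This is an illustrative reconstruction of the schedule described in the card, not the exact implementation used in training:

```python
import math

# Hyperparameters as reported in the model card
base_lr = 1e-5
per_device_batch = 2   # train_batch_size
grad_accum = 4         # gradient_accumulation_steps
num_devices = 8
warmup_ratio = 0.1

# Effective (total) batch size: 2 * 4 * 8 = 64
total_batch = per_device_batch * grad_accum * num_devices

def lr_at(step, total_steps):
    """Cosine schedule with linear warmup over the first warmup_ratio
    fraction of steps (a sketch of the scheduler named in the card;
    the original training code may differ in detail)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay
```

The learning rate rises linearly to 1e-05 by the end of warmup, then follows a half-cosine down to zero at the final step.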

Intended Use Cases

The model card does not document detailed intended uses or limitations. The fine-tuning on the retool dataset, however, suggests that its primary utility lies in applications aligned with the characteristics and content of that dataset. Developers should consider this model for tasks where its specialized training on retool data would offer an advantage over general-purpose models.
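A minimal way to try the model is with the Hugging Face transformers library. The snippet below is a generic causal-LM loading and generation sketch, assuming a transformers version with Qwen3 support; the prompt and generation settings are illustrative and not taken from the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "russwest404/Qwen3-4B-ReTool-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # matches the BF16 precision listed above
    device_map="auto",
)

# Illustrative prompt; the tokenizer's chat template formats it for the model
messages = [{"role": "user", "content": "Compute 3 * (4 + 5)."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Running this downloads the model weights from the Hugging Face Hub on first use.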