JetBrains-Research/sft-router-qwen3-4b-swe-bench

Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Mar 19, 2026 · License: other · Architecture: Transformer

The JetBrains-Research/sft-router-qwen3-4b-swe-bench model is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B. On its evaluation set it achieves a loss of 0.0374 and an accuracy of 0.9826, making it suited to applications that require high accuracy on its specialized task.
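The model card does not include usage code, but since the model is a fine-tune of Qwen/Qwen3-4B, standard Hugging Face `transformers` causal-LM loading should apply. A minimal sketch (the `load_model` helper is illustrative, not part of the card; `bfloat16` matches the BF16 quantization listed above):

```python
MODEL_ID = "JetBrains-Research/sft-router-qwen3-4b-swe-bench"

def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model via transformers (assumed standard usage)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # bfloat16 matches the BF16 precision reported on the model card.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    return tokenizer, model
```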


Model Overview

This model, sft-router-qwen3-4b-swe-bench, is a specialized fine-tuned version of the Qwen/Qwen3-4B base model, developed by JetBrains-Research. It features 4 billion parameters and was trained on the sft_router_train dataset.

Performance Highlights

During its evaluation, the model demonstrated strong performance:

  • Loss: 0.0374
  • Accuracy: 0.9826

Training Details

The model was trained with the following hyperparameters:

  • Learning Rate: 5e-06
  • Batch Size: 4 (train), 4 (eval)
  • Gradient Accumulation Steps: 2
  • Optimizer: ADAMW_TORCH
  • LR Scheduler: Cosine with 100 warmup steps
  • Epochs: 2.0
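The hyperparameters above can be collected into a single configuration sketch. This is not the authors' published training script; the dictionary keys follow common `transformers.TrainingArguments` naming, but only the values come from the card. Note that with gradient accumulation, the effective per-device batch size is the train batch size times the accumulation steps:

```python
# Hyperparameters as reported on the model card (dataset paths and total
# step counts are not published, so this is a sketch, not the exact config).
hparams = {
    "learning_rate": 5e-06,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 2,
    "optim": "adamw_torch",
    "lr_scheduler_type": "cosine",
    "warmup_steps": 100,
    "num_train_epochs": 2.0,
}

# Effective train batch size per optimizer step (per device):
effective_batch = (
    hparams["per_device_train_batch_size"] * hparams["gradient_accumulation_steps"]
)  # 4 * 2 = 8
```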

This fine-tuning process adapts the Qwen3-4B architecture to a particular task, as reflected in its high accuracy on the evaluation set. Further details on intended uses, limitations, and the specific training and evaluation data are not provided in the current model card.
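The cosine schedule with 100 warmup steps mentioned in the hyperparameters can be sketched as a pure function of the training step: a linear ramp from 0 to the base learning rate over the warmup steps, then a cosine decay toward 0. The total step count is not published, so it appears here as a free parameter:

```python
import math

def lr_at(step: int, total_steps: int, base_lr: float = 5e-06, warmup: int = 100) -> float:
    """Linear warmup to base_lr over `warmup` steps, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate is 0 at step 0, peaks at 5e-06 at step 100, and decays to 0 by step 1000.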