JetBrains-Research/sft-router-qwen3-4b-swe-bench
The JetBrains-Research/sft-router-qwen3-4b-swe-bench model is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B. On its evaluation set it achieves a loss of 0.0374 and an accuracy of 0.9826. The model is intended for applications that require high accuracy on specialized datasets, building on its fine-tuned Qwen3-4B architecture.
Model Overview
This model, sft-router-qwen3-4b-swe-bench, is a specialized fine-tuned version of the Qwen/Qwen3-4B base model, developed by JetBrains-Research. It features 4 billion parameters and was trained on the sft_router_train dataset.
Performance Highlights
During its evaluation, the model demonstrated strong performance:
- Loss: 0.0374
- Accuracy: 0.9826
Training Details
The model was trained using specific hyperparameters to achieve its performance:
- Learning Rate: 5e-06
- Batch Size: 4 per device (train and eval)
- Gradient Accumulation Steps: 2 (effective batch size of 8)
- Optimizer: AdamW (`adamw_torch`)
- LR Scheduler: Cosine with 100 warmup steps
- Epochs: 2.0
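The learning-rate schedule above (cosine decay after 100 warmup steps, peaking at 5e-06) can be sketched in plain Python. This is a minimal illustration, not the exact trainer implementation; the linear warmup shape and the total step count passed in are assumptions, since the card does not state the total number of training steps:

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-06, warmup_steps=100):
    """Cosine schedule with warmup, mirroring the card's hyperparameters."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the base learning rate.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device batch of 4 with 2 gradient-accumulation steps.
effective_batch_size = 4 * 2  # = 8
```

The learning rate peaks at 5e-06 exactly when warmup ends and then decays smoothly toward zero by the final step.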
This fine-tuning adapts the Qwen3-4B architecture to its target task, as reflected in the high evaluation accuracy. Intended uses, limitations, and specifics of the training and evaluation data are not documented in the current model card.
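Since the card provides no usage snippet, here is a minimal loading sketch. It assumes the model exposes the standard Hugging Face `transformers` causal-LM interface (as its Qwen/Qwen3-4B base does); downloading the weights requires network access, and the prompt below is purely illustrative:

```python
MODEL_ID = "JetBrains-Research/sft-router-qwen3-4b-swe-bench"

def load_router(model_id=MODEL_ID):
    """Load tokenizer and model; assumes the standard transformers causal-LM API."""
    # Lazy import: requires the `transformers` package (and `torch`) to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_router()
    # Illustrative prompt only; the card does not document the expected input format.
    inputs = tokenizer("Example routing prompt", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```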