JetBrains-Research/sft-router-qwen3-4b-swe-bench

Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Mar 19, 2026 · License: other · Architecture: Transformer

The JetBrains-Research/sft-router-qwen3-4b-swe-bench model is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B. On its evaluation set it achieves a loss of 0.0374 and an accuracy of 0.9826, making it suited to applications that require high accuracy on its specialized task.
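The model card does not include usage code, but since the model is a fine-tune of Qwen/Qwen3-4B, standard Hugging Face `transformers` causal-LM loading should apply. A minimal sketch (the `load_model` helper is illustrative, not part of the card; `bfloat16` matches the BF16 quantization listed above):

```python
MODEL_ID = "JetBrains-Research/sft-router-qwen3-4b-swe-bench"

def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model via transformers (assumed standard usage)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # bfloat16 matches the BF16 precision reported on the model card.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    return tokenizer, model
```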


Model Overview

This model, sft-router-qwen3-4b-swe-bench, is a specialized fine-tuned version of the Qwen/Qwen3-4B base model, developed by JetBrains-Research. It features 4 billion parameters and was trained on the sft_router_train dataset.

Performance Highlights

During its evaluation, the model demonstrated strong performance:

  • Loss: 0.0374
  • Accuracy: 0.9826

Training Details

The model was trained with the following hyperparameters:

  • Learning Rate: 5e-06
  • Batch Size: 4 (train), 4 (eval)
  • Gradient Accumulation Steps: 2
  • Optimizer: ADAMW_TORCH
  • LR Scheduler: Cosine with 100 warmup steps
  • Epochs: 2.0
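The hyperparameters above can be collected into a single configuration sketch. This is not the authors' published training script; the dictionary keys follow common `transformers.TrainingArguments` naming, but only the values come from the card. Note that with gradient accumulation, the effective per-device batch size is the train batch size times the accumulation steps:

```python
# Hyperparameters as reported on the model card (dataset paths and total
# step counts are not published, so this is a sketch, not the exact config).
hparams = {
    "learning_rate": 5e-06,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 2,
    "optim": "adamw_torch",
    "lr_scheduler_type": "cosine",
    "warmup_steps": 100,
    "num_train_epochs": 2.0,
}

# Effective train batch size per optimizer step (per device):
effective_batch = (
    hparams["per_device_train_batch_size"] * hparams["gradient_accumulation_steps"]
)  # 4 * 2 = 8
```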

This fine-tuning process adapts the Qwen3-4B architecture to a particular task, as reflected in its high accuracy on the evaluation set. Further details on intended uses, limitations, and the specific training and evaluation data are not provided in the current model card.
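The cosine schedule with 100 warmup steps mentioned in the hyperparameters can be sketched as a pure function of the training step: a linear ramp from 0 to the base learning rate over the warmup steps, then a cosine decay toward 0. The total step count is not published, so it appears here as a free parameter:

```python
import math

def lr_at(step: int, total_steps: int, base_lr: float = 5e-06, warmup: int = 100) -> float:
    """Linear warmup to base_lr over `warmup` steps, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate is 0 at step 0, peaks at 5e-06 at step 100, and decays to 0 by step 1000.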