The huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-4B-Instruct-2507 model is a 4 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 on the appworld_distillation_sft_v2 dataset. It supports a 40,960-token context length and was trained with a learning rate of 5e-06 over 30 epochs. The model is adapted through supervised fine-tuning for the tasks covered by the appworld_distillation_sft_v2 dataset, reaching a final validation loss of 0.7486.
Model Overview
This model, huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-4B-Instruct-2507, is a 4 billion parameter instruction-tuned language model. It is a supervised fine-tuned (SFT) version of the base model Qwen/Qwen3-4B-Instruct-2507, specifically adapted using the appworld_distillation_sft_v2 dataset.
Key Characteristics
- Base Model: Qwen3-4B-Instruct-2507
- Parameter Count: 4 billion
- Context Length: 40960 tokens
- Fine-tuning Dataset: appworld_distillation_sft_v2
- Training Epochs: 30
- Final Validation Loss: 0.7486
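Below is a minimal sketch of loading the model and running a single chat-style generation with the Hugging Face Transformers library. It assumes the checkpoint follows the standard Qwen3 causal-LM layout and chat template inherited from the base model; the prompt is only illustrative.

```python
# Minimal loading/generation sketch; assumes a standard Qwen3 causal-LM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huseyinatahaninan/appworld_distillation_sft_v2-SFT-Qwen3-4B-Instruct-2507"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native dtype
    device_map="auto",    # place weights on available GPUs/CPU (requires accelerate)
)

# Build a chat-formatted prompt using the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what this model was fine-tuned for."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```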
Training Details
The model was trained with a learning rate of 5e-06, a per-device train batch size of 4, and a per-device eval batch size of 1 across 8 GPUs, giving a total train batch size of 32. Optimization used adamw_torch with a cosine learning rate scheduler and a warmup ratio of 0.1. Training ran for 60 optimization steps over 30 epochs, with training loss decreasing consistently and a final validation loss of 0.7486.
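The reported hyperparameters can be expressed as a Hugging Face TrainingArguments configuration. This is a hedged sketch, not the exact training script: the output_dir is hypothetical, and dataset preparation and Trainer/SFT wiring are omitted.

```python
# Sketch of the reported hyperparameters as TrainingArguments; output_dir is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="appworld_distillation_sft_v2-SFT-Qwen3-4B-Instruct-2507",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    num_train_epochs=30,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    # With 8 GPUs and a per-device batch size of 4, the effective train batch size is 32.
)
```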
Good for
- Applications requiring a compact, instruction-tuned model based on the Qwen3-4B architecture.
- Tasks that benefit from specific knowledge or patterns learned from the appworld_distillation_sft_v2 dataset.
- Scenarios where a balance between model size and performance on fine-tuned tasks is critical.