huseyinatahaninan/appworld_distillation_sft-SFT-Qwen3-4B-Instruct-2507
This model is a 4-billion-parameter Qwen3-Instruct variant with a 40,960-token context length, fine-tuned by huseyinatahaninan. It was adapted from Qwen/Qwen3-4B-Instruct-2507 through supervised fine-tuning (SFT) on the appworld_distillation_sft dataset and reached a final validation loss of 0.2588, reflecting its specialization toward that dataset's tasks.
Model Overview
This model, appworld_distillation_sft-SFT-Qwen3-4B-Instruct-2507, is a 4 billion parameter instruction-tuned variant of the Qwen3 architecture, developed by huseyinatahaninan. It is a supervised fine-tuned (SFT) version of the base model Qwen/Qwen3-4B-Instruct-2507, specifically trained on the appworld_distillation_sft dataset.
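The model should work with the standard transformers text-generation workflow. The snippet below is a minimal sketch, assuming a recent transformers release with Qwen3 support (plus accelerate for device placement); the example prompt is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huseyinatahaninan/appworld_distillation_sft-SFT-Qwen3-4B-Instruct-2507"

# Load the tokenizer and model; device_map="auto" (requires accelerate)
# spreads the weights across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Qwen3-Instruct models are chat models, so build the prompt via the
# chat template rather than raw text.
messages = [{"role": "user", "content": "Summarize what supervised fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```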
Training Details
The model was trained for 10 epochs with a learning rate of 5e-06 and a total batch size of 32 across 8 GPUs, using the adamw_torch optimizer and a cosine learning rate scheduler with a 0.1 warmup ratio. Validation loss decreased steadily over training, reaching a final value of 0.2588.
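The configuration below is a minimal sketch of how these reported hyperparameters map onto Hugging Face TrainingArguments, assuming a recent transformers release. The per-device batch size of 4 (4 × 8 GPUs = 32 total), the output path, and the evaluation cadence are assumptions for illustration, not values published for the original run.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the reported setup; the per-device/GPU split
# and output_dir are assumptions, not published values.
training_args = TrainingArguments(
    output_dir="./appworld_distillation_sft-checkpoints",  # hypothetical path
    num_train_epochs=10,
    learning_rate=5e-6,
    per_device_train_batch_size=4,  # 4 per GPU x 8 GPUs = total batch size 32
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    eval_strategy="epoch",          # evaluation cadence is an assumption
    logging_steps=10,
)
```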
Key Characteristics
- Base Model: Qwen3-4B-Instruct-2507
- Parameter Count: 4 billion
- Context Length: 40960 tokens
- Fine-tuning Dataset: appworld_distillation_sft
- Evaluation Loss: 0.2588
Potential Use Cases
Given its fine-tuning on the appworld_distillation_sft dataset, this model is likely strongest on tasks that match that dataset's domain and data distribution. Developers should evaluate it against their own workloads before relying on it outside that distribution, since the specialized training may not transfer to general-purpose use.