huseyinatahaninan/appworld_distillation_sft-SFT-Qwen3-4B-Instruct-2507
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Dec 5, 2025License:apache-2.0Architecture:Transformer Open Weights Warm

This model is a 4 billion parameter Qwen3-Instruct variant, fine-tuned by huseyinatahaninan, with a 40960 token context length. It is specifically adapted from Qwen/Qwen3-4B-Instruct-2507 through supervised fine-tuning on the appworld_distillation_sft dataset. The model demonstrates a final validation loss of 0.2588, indicating its specialization for tasks related to the distillation dataset.

Loading preview...