osieosie/tmax-qwen3-4b-sft-20260316-100k-asst-loss
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

The osieosie/tmax-qwen3-4b-sft-20260316-100k-asst-loss model is a 4-billion-parameter, instruction-tuned variant of the Qwen3 architecture, trained with supervised fine-tuning (SFT) using TRL. It targets general text generation and supports a 32,768-token context length for processing longer inputs. Its training emphasizes assistant-style conversational behavior, making it suitable for interactive applications that require coherent, contextually relevant responses.
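A minimal usage sketch with Hugging Face Transformers is shown below. This assumes the repository ships a standard Qwen3-style tokenizer with a chat template; the prompt text and generation settings are illustrative, not prescribed by the model card.

```python
# Hypothetical inference sketch for this model, assuming standard
# Transformers support for Qwen3 checkpoints and a bundled chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "osieosie/tmax-qwen3-4b-sft-20260316-100k-asst-loss"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # loads in BF16 as published
    device_map="auto",    # place weights on available GPU(s) or CPU
)

# Format a single-turn conversation with the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Summarize the benefits of supervised fine-tuning."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced assistant tokens.
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the context length is 32k tokens, long documents can be passed directly in the user turn, though memory use grows with input length.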
