alwaysgood/qwen3-st1

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Apr 21, 2026 | Architecture: Transformer

alwaysgood/qwen3-st1 is a 4-billion-parameter language model fine-tuned from unsloth/Qwen3-4B-Base with the TRL framework. It has a context length of 32768 tokens and was trained with Supervised Fine-Tuning (SFT). It is designed for general text generation tasks, leveraging its base architecture for broad applicability.

Model Overview

alwaysgood/qwen3-st1 is a 4-billion-parameter language model derived from unsloth/Qwen3-4B-Base. It was fine-tuned with the Transformer Reinforcement Learning (TRL) library, specifically using Supervised Fine-Tuning (SFT).

Key Characteristics

  • Base Model: Fine-tuned from unsloth/Qwen3-4B-Base.
  • Parameter Count: 4 billion parameters.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • Training Method: Trained with Supervised Fine-Tuning (SFT); a sketch of this recipe follows the list.
  • Frameworks: Developed with TRL (version 0.24.0), Transformers (version 5.5.4), PyTorch (version 2.9.0+cu128), Datasets (version 4.3.0), and Tokenizers (version 0.22.2).
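
For context on the training method, here is a minimal sketch of an SFT run with TRL's SFTTrainer, starting from the base model named on this card. The dataset and all hyperparameters are placeholder assumptions for illustration; the card does not disclose the actual training data or configuration.

```python
# Minimal SFT sketch with TRL's SFTTrainer. The dataset and hyperparameters
# below are illustrative assumptions, not the card's actual training setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset from the TRL documentation examples.
dataset = load_dataset("trl-lib/Capybara", split="train")

config = SFTConfig(
    output_dir="qwen3-st1-sft",
    max_length=32768,                # matches the 32768-token context window above
    per_device_train_batch_size=1,   # assumed; small to fit long sequences
    num_train_epochs=1,              # assumed
)

trainer = SFTTrainer(
    model="unsloth/Qwen3-4B-Base",   # base model named on this card
    args=config,
    train_dataset=dataset,
)
trainer.train()
```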

Potential Use Cases

This model is suited to text generation tasks where a 4-billion-parameter model with a large context window is beneficial. Its SFT training suggests it can follow instructions and generate coherent text from provided prompts. Developers can integrate it through the Hugging Face transformers pipeline for text completion or conversational applications, as in the sketch below.
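
The card points to the transformers pipeline as the integration path; the sketch below follows it. The prompt and generation parameters (max_new_tokens, temperature) are illustrative assumptions, not values from the card.

```python
# Minimal sketch: loading alwaysgood/qwen3-st1 via the transformers pipeline.
# Generation settings are illustrative defaults, not from the model card.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="alwaysgood/qwen3-st1",
    torch_dtype="bfloat16",  # matches the BF16 precision listed above
    device_map="auto",       # place weights on a GPU when one is available
)

prompt = "Summarize the key trade-offs of small instruction-tuned models:"
outputs = generator(prompt, max_new_tokens=200, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```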