alwaysgood/qwen3-st2

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Apr 22, 2026 · Architecture: Transformer

The alwaysgood/qwen3-st2 model is a 4-billion-parameter, instruction-tuned causal language model fine-tuned from alwaysgood/qwen3-st1. Developed by alwaysgood, it is built on the Qwen3 architecture and has a context length of 32,768 tokens. It was trained with Supervised Fine-Tuning (SFT) using the TRL framework, making it suitable for general text generation from user prompts.

Model Overview

alwaysgood/qwen3-st2 is a 4-billion-parameter language model, a fine-tuned iteration of the alwaysgood/qwen3-st1 base model. It is built on the Qwen3 architecture and supports a context length of 32,768 tokens, enabling it to process and generate long sequences of text.
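A minimal usage sketch with the Hugging Face transformers library is shown below. It assumes the model is hosted on the Hugging Face Hub under the alwaysgood/qwen3-st2 id and ships a standard Qwen3-style chat template; the prompt and generation settings are illustrative.

```python
# Minimal generation sketch; assumes the model is on the Hugging Face Hub
# and uses a standard Qwen3-style chat template (not confirmed by this card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alwaysgood/qwen3-st2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # matches the BF16 precision listed above
    device_map="auto",
)

# Format the prompt with the model's chat template before generating.
messages = [{"role": "user", "content": "Explain supervised fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```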

Training Details

This model was developed by alwaysgood and underwent Supervised Fine-Tuning (SFT) using the TRL (Transformer Reinforcement Learning) library. The training used TRL 0.24.0, Transformers 5.5.4, PyTorch 2.9.0+cu128, Datasets 4.3.0, and Tokenizers 0.22.2. The training run can be visualized via Weights & Biases.
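For orientation, a comparable SFT run with TRL's SFTTrainer might look like the sketch below. The dataset, output directory, and hyperparameters are placeholders rather than the values used for this model; only the base model id and the 32K sequence length come from this card.

```python
# Hedged SFT sketch with TRL; the dataset and hyperparameters are
# illustrative placeholders, not the actual training configuration.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder instruction-tuning dataset; the card does not name the real one.
dataset = load_dataset("trl-lib/Capybara", split="train")

config = SFTConfig(
    output_dir="qwen3-st2-sft",  # placeholder output directory
    max_length=32768,            # matches the model's 32K context window
    report_to="wandb",           # the run was tracked with Weights & Biases
)

trainer = SFTTrainer(
    model="alwaysgood/qwen3-st1",  # the stated base model
    args=config,
    train_dataset=dataset,
)
trainer.train()
```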

Key Capabilities

  • Instruction Following: Designed to generate text based on user-provided instructions or prompts.
  • Text Generation: Capable of producing coherent and contextually relevant text for various applications.
  • Extended Context: Benefits from a 32K-token context window, allowing for longer documents and more detailed interactions; see the token-budget sketch after this list.
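Since the window is finite, it is worth checking a long prompt's token count before generation. This is an illustrative sketch only; the input file long_report.txt and the prompt wording are hypothetical.

```python
# Illustrative pre-flight check that a long prompt fits the 32,768-token window.
from transformers import AutoTokenizer

MAX_CONTEXT = 32768  # the model's context length
RESERVED = 1024      # leave headroom for generated tokens

tokenizer = AutoTokenizer.from_pretrained("alwaysgood/qwen3-st2")

with open("long_report.txt") as f:  # hypothetical input document
    document = f.read()

prompt = f"Summarize the following report:\n\n{document}"
n_tokens = len(tokenizer(prompt)["input_ids"])

if n_tokens > MAX_CONTEXT - RESERVED:
    print(f"Prompt is {n_tokens} tokens; truncate it to fit the context window.")
else:
    print(f"Prompt fits: {n_tokens}/{MAX_CONTEXT} tokens.")
```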

Good For

  • General-purpose text generation tasks.
  • Applications requiring responses to specific user queries or instructions.
  • Scenarios where a fine-tuned Qwen3-based model with a large context window is beneficial.