laion/GLM-4.6-stackexchange-overflow-sandboxes-32eps-65k-reasoning_num-train-epochs_6.0_Qwen3-32B

Text generation | Concurrency cost: 2 | Model size: 32B | Quant: FP8 | Context length: 32k | Published: Jan 9, 2026 | Architecture: Transformer

The laion/GLM-4.6-stackexchange-overflow-sandboxes-32eps-65k-reasoning_num-train-epochs_6.0_Qwen3-32B model is a 32-billion-parameter language model based on the Qwen3 architecture, trained with a context length of 32768 tokens. It was fine-tuned for 6 epochs using a cosine learning rate schedule and the AdamW optimizer. The model targets general language understanding and generation tasks; its specific training data and primary differentiators are not documented in the model card.


Model Overview

This model, laion/GLM-4.6-stackexchange-overflow-sandboxes-32eps-65k-reasoning_num-train-epochs_6.0_Qwen3-32B, is a 32-billion-parameter language model built on the Qwen3 architecture. It was trained with a context length of 32768 tokens, allowing it to handle long textual inputs.
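
For orientation, here is a minimal loading and generation sketch. It assumes the checkpoint is published as a standard transformers causal-LM repository under the id above and that your transformers/accelerate installation handles the stored weight format; adjust dtype and device placement for your hardware.

```python
# Minimal sketch, assuming a standard transformers causal-LM checkpoint.
# Multi-GPU sharding (device_map="auto") requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/GLM-4.6-stackexchange-overflow-sandboxes-32eps-65k-reasoning_num-train-epochs_6.0_Qwen3-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard the 32B weights across available GPUs
)

prompt = "Explain the difference between a process and a thread."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```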

Training Details

The model was trained for 6.0 epochs on a distributed setup of 16 GPUs. Key hyperparameters include a learning rate of 4e-05 and a total training batch size of 64 (a per-device train_batch_size of 1 with gradient_accumulation_steps of 4 across the 16 GPUs), using the AdamW_TORCH_FUSED optimizer. A cosine learning rate scheduler with a warmup ratio of 0.1 governed the learning rate over the course of training.
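
As a point of reference, the listed hyperparameters map onto a transformers TrainingArguments configuration roughly as follows. This is a sketch reconstructed from the values above, not the actual training script; output_dir and any unlisted settings (precision, logging, data handling) are placeholders.

```python
# Sketch of a TrainingArguments config mirroring the hyperparameters above.
# output_dir is a hypothetical placeholder; settings not documented in the
# model card (precision, logging, data pipeline) are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-32b-finetune",    # hypothetical path
    num_train_epochs=6.0,
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,      # 16 GPUs x 1 x 4 = effective batch of 64
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
)
```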

Key Characteristics

  • Architecture: Qwen3-32B
  • Parameter Count: 32 billion
  • Context Length: 32768 tokens
  • Training Epochs: 6.0
  • Optimizer: AdamW_TORCH_FUSED

Good for

  • General language understanding tasks requiring a large context window (see the context-length sketch after this list).
  • Applications benefiting from a 32B parameter model for text generation and analysis.
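
Since the main practical constraint is the 32768-token window, a simple pre-check before generation can avoid silent truncation. This sketch reuses the tokenizer and model from the loading example above; the input file name is hypothetical.

```python
# Sketch: verify a long input fits the 32768-token context before generating.
# Assumes `tokenizer` and `model` are loaded as in the earlier snippet;
# "report.txt" is a hypothetical input file.
long_document = open("report.txt", encoding="utf-8").read()
token_count = len(tokenizer(long_document)["input_ids"])
max_context = 32768
max_new = 512  # leave headroom in the window for generated tokens

if token_count + max_new > max_context:
    print(f"Input is {token_count} tokens; chunk or truncate to fit {max_context}.")
else:
    inputs = tokenizer(long_document, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```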

Further details regarding the specific training dataset, intended uses, and performance benchmarks are not provided in the current model card.