The laion/GLM-4.6-stackexchange-overflow-sandboxes-32eps-65k-reasoning_num-train-epochs_4.0_Qwen3-32B model is a 32-billion-parameter language model based on the Qwen3 architecture, fine-tuned for 4.0 epochs with a 32768-token context length. Specific details regarding its primary differentiators, training dataset, and intended use cases are not provided in the available documentation.
Model Overview
This model, laion/GLM-4.6-stackexchange-overflow-sandboxes-32eps-65k-reasoning_num-train-epochs_4.0_Qwen3-32B, is a 32-billion-parameter language model built upon the Qwen3 architecture. It was fine-tuned over 4.0 epochs with a context length of 32768 tokens.
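The snippet below is a minimal loading sketch using the Transformers library. The model ID comes from this card; the device placement, dtype, prompt, and generation settings are illustrative assumptions, not documented defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID as listed on this card.
model_id = "laion/GLM-4.6-stackexchange-overflow-sandboxes-32eps-65k-reasoning_num-train-epochs_4.0_Qwen3-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # let the checkpoint decide the precision
    device_map="auto",   # shard across available GPUs; a 32B model needs substantial memory
)

# Illustrative prompt; the chat template ships with the tokenizer.
messages = [{"role": "user", "content": "Explain what a race condition is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```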
Training Details
The training process used the following hyperparameters; a sketch mapping them onto a TrainingArguments configuration follows the list.
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation Steps: 4, for a total effective batch size of 64 (with a per-device train batch size of 1, this is consistent with training across 16 devices)
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler: Cosine type with a warmup ratio of 0.1
- Epochs: 4.0
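As a rough sketch, the hyperparameters above correspond to the following Hugging Face TrainingArguments configuration. The output directory is a hypothetical placeholder, and anything not listed above (seed, precision flags, logging, dataset wiring) is left at defaults as an assumption.

```python
from transformers import TrainingArguments

# Reconstruction of the reported configuration; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="qwen3-32b-finetune",   # placeholder path, not from this card
    learning_rate=4e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,     # effective batch size 64 across devices
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4.0,
)
```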
The model was developed using Transformers 4.57.3, PyTorch 2.9.0+cu128, Datasets 4.4.1, and Tokenizers 0.22.1.
Capabilities and Use Cases
Beyond the architecture and training parameters described above, the available documentation does not detail specific capabilities, primary differentiators, or intended use cases. Users should conduct their own evaluation to determine the model's suitability for particular applications.