laion/sft_GLM-4-7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k_Qwen3-32B

Text generation · Model size: 32B · Quantization: FP8 · Context length: 32k · Published: Feb 26, 2026 · License: other · Architecture: Transformer

This model is a 32-billion-parameter supervised fine-tune (SFT) of Qwen3-32B, developed by LAION. It was trained on the GLM-4.7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k dataset, indicating a focus on a specific task domain. With a 32,768-token context length, it is suited to applications that require extensive contextual understanding and generation grounded in its specialized training data.


Model Overview

This model, sft_GLM-4-7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k_Qwen3-32B, is a fine-tuned variant of Qwen/Qwen3-32B. It was adapted via supervised fine-tuning on a dataset derived from GLM-4.7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k.

Key Training Details

The model underwent training with the following hyperparameters:

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation: 2 steps; with a per-device train batch size of 1, the reported effective batch size of 32 implies distributed training across 16 devices (1 × 2 × 16 = 32)
  • Optimizer: ADAMW_TORCH_FUSED
  • LR Scheduler: Cosine with 0.1 warmup ratio
  • Epochs: 7.0
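
For concreteness, the reported hyperparameters map onto the standard Hugging Face TrainingArguments API roughly as follows. This is a minimal sketch, not the authors' actual training script: the output path is a placeholder, and the bf16 flag is an assumption (mixed precision is typical for 32B-scale SFT but is not stated in the card).

```python
from transformers import TrainingArguments

# Sketch of the reported configuration; not the authors' actual script.
training_args = TrainingArguments(
    output_dir="./sft-output",          # placeholder path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,      # 1 * 2 * 16 devices = 32 effective batch size
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
    bf16=True,                          # assumption: mixed precision for a 32B model
)
```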

Technical Stack

The training leveraged:

  • Transformers: 4.57.6
  • PyTorch: 2.9.0+cu128
  • Datasets: 4.4.1
  • Tokenizers: 0.22.2
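
To reproduce results against the same stack, one can check that the installed package versions match those reported above; a minimal sketch using the standard library:

```python
import importlib.metadata as md

# Reported training environment; flag any installed version that differs.
expected = {
    "transformers": "4.57.6",
    "torch": "2.9.0+cu128",
    "datasets": "4.4.1",
    "tokenizers": "0.22.2",
}
for pkg, version in expected.items():
    installed = md.version(pkg)
    status = "OK" if installed == version else f"MISMATCH (found {installed})"
    print(f"{pkg}=={version}: {status}")
```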

Intended Use

While specific use cases are not detailed in the provided information, the components of the dataset name (swesmith, sandboxes, with_tests, oracle_verified) suggest applications in software-engineering tasks, likely involving code reasoning validated against tests, as in the usage sketch below.
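
As a Qwen3-32B fine-tune, the model should load with the standard transformers text-generation workflow. The repository id below comes from the model name above; the dtype, device-map, and generation settings are illustrative assumptions, not documented recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/sft_GLM-4-7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k_Qwen3-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: let transformers pick the checkpoint dtype
    device_map="auto",    # assumption: shard across available GPUs for a 32B model
)

# Chat-style prompting via the bundled chat template; the prompt is illustrative.
messages = [{"role": "user", "content": "Write a unit test for a function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)  # illustrative generation length
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

At 32B parameters, inference generally requires multiple GPUs or aggressive quantization; the FP8 quantization noted in the header metadata is one such deployment option.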