Model Overview

This model, g1_min_episodes_e1_gpt_long_sampled_swesmith_psu_thinking_tacc-Qwen3-32B, is a fine-tune of the Qwen3-32B base model. It has 32 billion parameters and a context length of 32768 tokens, which makes it suitable for long inputs.

Key Capabilities

Specialized Fine-tuning: The model was fine-tuned on a single dataset: /scratch/08134/negin/hub/datasets--DCAgent--g1_min_episodes_e1_gpt_long_sampled_swesmith_psu_d1_original_40k_glm47_traces/snapshots/857b3ce8060050ded9af40dc129460f566d0c635_thinking_preprocessed. Its behavior is therefore oriented toward the tasks and data characteristics of that training corpus.
Base Model Strength: Inherits the foundational capabilities of Qwen3-32B, which include strong language understanding and generation.

Training Details

Fine-tuning ran for 7 epochs at a learning rate of 4e-05, distributed across 32 GPUs. The optimizer was ADAMW_TORCH_FUSED (its beta and epsilon values are not reproduced in this summary), paired with a cosine learning rate scheduler and a warmup ratio of 0.1.
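To make the scheduler settings concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the learning rate (4e-05) and warmup ratio (0.1) stated above. The function name lr_at_step and the total step count are hypothetical; the real number of optimizer steps depends on dataset size and batch size, which are not given here, and the actual training code may differ in details such as how warmup steps are rounded.

```python
import math

PEAK_LR = 4e-5       # learning rate stated in the training details
WARMUP_RATIO = 0.1   # warmup ratio stated in the training details


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (illustrative helper, not the author's code)."""
    # Assumption: warmup length is the warmup ratio times the total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak learning rate down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, chosen only for illustration
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at_step(s, total))
```

With these settings the rate climbs linearly for the first 10% of steps, peaks at 4e-05, and then follows a half cosine down to zero at the final step.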

Good For

Research and Development: Suited to researchers and developers who want to study or build on the data distribution of the fine-tuning dataset.
Domain-Specific Applications: Potentially useful for applications that require understanding or generating text aligned with the 'thinking' processes or traces present in its training data.
