DCAgent/g1_timeout_e1_gpt_long_sampled_swesmith_psu_thinking_tacc-Qwen3-32B
DCAgent/g1_timeout_e1_gpt_long_sampled_swesmith_psu_thinking_tacc-Qwen3-32B is a 32 billion parameter language model fine-tuned from Qwen/Qwen3-32B. This model has been specifically adapted using the /scratch/08134/negin/hub/datasets--DCAgent--g1_timeout_e1_gpt_long_sampled_swesmith_psu_d1_original_40k_glm47_traces_thinking_preprocessed dataset. It is intended for applications requiring specialized performance based on its unique training data, leveraging a 32768 token context length.
Loading preview...
Model Overview
This model, g1_timeout_e1_gpt_long_sampled_swesmith_psu_thinking_tacc-Qwen3-32B, is a fine-tuned variant of the Qwen3-32B base model developed by Qwen. It has been specialized through training on a unique dataset: /scratch/08134/negin/hub/datasets--DCAgent--g1_timeout_e1_gpt_long_sampled_swesmith_psu_d1_original_40k_glm47_traces_thinking_preprocessed.
Key Training Details
The fine-tuning process involved specific hyperparameters:
- Learning Rate: 4e-05
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: 1 (train), 8 (eval) with a total effective batch size of 32 across 32 devices
- Epochs: 7.0
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
This specialized training indicates an optimization for tasks related to the specific characteristics of its training data, making it suitable for use cases aligned with the dataset's domain. The model leverages a 32 billion parameter architecture and supports a context length of 32768 tokens.