laion/Qwen3-8B_exp_tas_temp_0.25_traces_save-strategy_steps
This model is an 8-billion-parameter fine-tune of the Qwen3-8B architecture, originally developed by Qwen. It was fine-tuned on the DCAgent/exp_tas_temp_0.25_traces dataset using a cosine learning-rate scheduler with a warmup ratio of 0.005 over 8 epochs on a distributed multi-GPU setup; this specialized fine-tuning is its primary point of differentiation from the base model.
Overview
This model, laion/Qwen3-8B_exp_tas_temp_0.25_traces_save-strategy_steps, is an 8-billion-parameter language model based on the Qwen3-8B architecture and fine-tuned on the DCAgent/exp_tas_temp_0.25_traces dataset.
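The checkpoint should be loadable with the standard Transformers auto classes. The snippet below is a minimal sketch, not an official usage example from the authors; `torch_dtype="auto"` and `device_map="auto"` are assumptions about a sensible default setup.

```python
# Minimal loading sketch (assumed, not provided by the model authors).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Qwen3-8B_exp_tas_temp_0.25_traces_save-strategy_steps"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # requires `accelerate`; shards across available GPUs
)
```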
Training Details
The fine-tuning process used the following key hyperparameters; a sketch of the equivalent configuration follows the list:
- Learning Rate: 0.0001
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.87, 0.99) and epsilon=1e-08
- LR Scheduler: Cosine type with a warmup ratio of 0.005
- Epochs: 8.0
- Batch Size: a total training batch size of 32, achieved with a distributed multi-GPU setup (32 devices × a per-device batch size of 1)
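For reference, these hyperparameters map onto Transformers `TrainingArguments` roughly as follows. This is a hedged reconstruction, not the authors' training script; `output_dir` and any value not listed above are placeholders.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the reported configuration.
args = TrainingArguments(
    output_dir="qwen3-8b-exp-tas-traces",  # hypothetical path
    learning_rate=1e-4,
    optim="adamw_torch_fused",
    adam_beta1=0.87,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.005,
    num_train_epochs=8.0,
    per_device_train_batch_size=1,  # 32 devices × 1 → total batch size 32
    save_strategy="steps",          # implied by the "save-strategy_steps" suffix
)
```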
Framework Versions
The model was trained using:
- Transformers 4.55.0
- PyTorch 2.7.1+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1
Intended Use
Specific intended uses and limitations have not been documented. Its fine-tuning on the DCAgent/exp_tas_temp_0.25_traces dataset suggests it is best suited to tasks matching the domain and format of that dataset.
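If the model retains Qwen3's chat template, inference would look roughly like the sketch below. The prompt, the generation settings, and the assumption that the tokenizer ships a chat template are all hypothetical.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Qwen3-8B_exp_tas_temp_0.25_traces_save-strategy_steps"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative prompt only; the dataset's actual task format is not documented here.
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```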