laion/Qwen3-8B_exp_tas_temp_0.25_traces_save-strategy_steps

TEXT GENERATION · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Jan 9, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

This model is an 8-billion-parameter fine-tune of the Qwen3-8B architecture developed by Qwen, trained on the DCAgent/exp_tas_temp_0.25_traces dataset. Training used a cosine learning rate scheduler with a warmup ratio of 0.005 over 8 epochs on a distributed multi-GPU setup. Its primary differentiation from the base model is this specialized fine-tuning.

Overview

This model, laion/Qwen3-8B_exp_tas_temp_0.25_traces_save-strategy_steps, is an 8-billion-parameter language model based on the Qwen3-8B architecture, fine-tuned on the DCAgent/exp_tas_temp_0.25_traces dataset.
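
A minimal loading sketch, assuming the checkpoint follows the standard Hugging Face transformers layout (the model id is taken from this card; everything else is a generic pattern rather than a published recipe):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Qwen3-8B_exp_tas_temp_0.25_traces_save-strategy_steps"

# Load tokenizer and weights; device_map="auto" spreads the 8B
# parameters across available GPUs (requires the accelerate package).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
    device_map="auto",
)
```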

Training Details

The fine-tuning process involved several key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 0.0001
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.87, 0.99) and epsilon=1e-08
  • LR Scheduler: Cosine type with a warmup ratio of 0.005
  • Epochs: 8.0
  • Batch Size: an effective training batch size of 32, achieved with a distributed multi-GPU setup (32 devices × per-device batch size 1)
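
How these values map onto transformers' TrainingArguments, as a hedged reconstruction from the numbers above; the authors' actual training script is not published, and output_dir is a placeholder:

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters, not the original script.
training_args = TrainingArguments(
    output_dir="qwen3-8b-exp-tas",   # placeholder path
    learning_rate=1e-4,
    optim="adamw_torch_fused",
    adam_beta1=0.87,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.005,
    num_train_epochs=8,
    per_device_train_batch_size=1,   # 32 devices -> total batch size 32
    save_strategy="steps",           # per the model name suffix
)
```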

Framework Versions

The model was trained using:

  • Transformers 4.55.0
  • PyTorch 2.7.1+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Intended Use

Specific intended uses and limitations have not been documented. The fine-tuning on the DCAgent/exp_tas_temp_0.25_traces dataset suggests applications in the domain that dataset covers.
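
For inference, a sketch using the standard chat-template flow shared by Qwen3 checkpoints (the prompt is illustrative and the generation settings are generic defaults, not values recommended for this model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/Qwen3-8B_exp_tas_temp_0.25_traces_save-strategy_steps"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Format a single-turn prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize this model card in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate a reply and strip the prompt tokens before decoding.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```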