Model Overview

This model, exp-syh-r2egym-swesmith-mixed_glm_4_7_traces_jupiter_cleaned, is an 8-billion-parameter language model derived from Qwen/Qwen3-8B. It was fine-tuned on the dataset snapshot stored at /data/cat/ws/befe330h-befe330h-otagent/huggingface/hub/datasets--DCAgent--exp-syh-r2egym-swesmith-mixed_glm_4.7_traces_jupiter_cleaned/snapshots/6bda9bf636a815d9ffd0a001e1a602b93c883472_thinking_preprocessed.

Training Details

Fine-tuning used the following hyperparameters:

Learning Rate: 4e-05
Batch Size: 1 per device (train), 8 (eval)
Gradient Accumulation Steps: 2, giving an effective training batch size of 16 across 8 devices (1 x 2 x 8).
Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08.
LR Scheduler: Cosine type with a warmup ratio of 0.1.
Epochs: 7.0
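
The batch-size arithmetic and the shape of the learning-rate schedule listed above can be sketched in a few lines of Python. This is a minimal illustration, not the training code: the function names are hypothetical, the total step count is a made-up placeholder (the card does not report it), and the warmup rounding and cosine formula follow the common linear-warmup, half-cosine-decay pattern, which is assumed here rather than confirmed by the source.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Samples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = 4e-5,
                          warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then half-cosine decay towards zero.

    Assumption: warmup length is ceil(total_steps * warmup_ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Values from the card: 1 per device, 2 accumulation steps, 8 devices.
    print("effective batch size:", effective_batch_size(1, 2, 8))

    total_steps = 1000  # hypothetical; the real step count is not reported
    for step in (0, 50, 100, 550, 1000):
        print(step, cosine_lr_with_warmup(step, total_steps))
```

With a warmup ratio of 0.1, the first tenth of training ramps the learning rate up linearly to 4e-05, and the remaining steps decay it along a cosine curve.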

Training used Transformers 4.57.6, PyTorch 2.9.0+cu128, Datasets 4.4.1, and Tokenizers 0.22.2.
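
To check whether a local environment matches these framework versions, something like the following sketch may help. It relies only on the standard-library importlib.metadata module; the helper names and the report format are hypothetical, and an exact string match is assumed to be the desired comparison (a looser compatibility check may be more appropriate in practice).

```python
from importlib import metadata

# Versions listed in this card, keyed by pip distribution name.
EXPECTED = {
    "transformers": "4.57.6",
    "torch": "2.9.0+cu128",
    "datasets": "4.4.1",
    "tokenizers": "0.22.2",
}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Map each package to (expected, installed, exact_match)."""
    report = {}
    for name, want in expected.items():
        have = lookup(name)
        report[name] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {want}, installed {have} ({status})")
```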

Intended Use Cases

Because it was fine-tuned on a single dataset, the model is most likely to perform well on tasks that resemble that data in domain and style. Users should evaluate it on their own applications before relying on it.
