laion/r2egym-unified-1000__Qwen3-8B
laion/r2egym-unified-1000__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained specifically on the r2egym-unified-1000 dataset, so it is expected to perform best on tasks represented in that dataset. With a context length of 32768 tokens, it is suited to applications that process long input sequences.
Model Overview
This model, laion/r2egym-unified-1000__Qwen3-8B, is an 8-billion-parameter language model built on the Qwen/Qwen3-8B architecture. It was fine-tuned on the thinking-preprocessed variant of the laion/r2egym-unified-1000 dataset (snapshot 5f3bc7d941f44406d18e2d31cdb42df47890e5f5).
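The snippet below is a minimal usage sketch, assuming the standard Transformers causal-LM loading path; the prompt and generation settings are illustrative placeholders, not values recommended by the model authors.

```python
# Minimal usage sketch: standard Transformers loading path.
# Generation settings are illustrative, not author-recommended values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/r2egym-unified-1000__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place the 8B weights on available devices
)

messages = [{"role": "user", "content": "Summarize what a unit test should cover."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```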
Training Details
The fine-tuning process involved specific hyperparameters aimed at optimizing performance:
- Learning Rate: 4e-05
- Batch Sizes: a `train_batch_size` of 1 and `eval_batch_size` of 8, with a `total_train_batch_size` of 96 and `total_eval_batch_size` of 256, achieved via `gradient_accumulation_steps` of 3 across 32 GPUs.
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08.
- Scheduler: a cosine learning rate scheduler with a warmup ratio of 0.1, employed over 7 epochs.
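For reference, the hyperparameters above can be approximated with a `transformers.TrainingArguments` configuration as sketched below. This is a hypothetical reconstruction; the actual training script and any additional settings are not published with this card.

```python
# Hypothetical reconstruction of the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="r2egym-unified-1000__Qwen3-8B",
    learning_rate=4e-05,
    per_device_train_batch_size=1,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=3,   # 1 x 3 x 32 GPUs = total_train_batch_size of 96
    num_train_epochs=7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
    bf16=True,                       # assumption; precision is not stated in the card
)
```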
Framework Versions
The model was trained using:
- Transformers 4.57.6
- Pytorch 2.9.1+cu130
- Datasets 4.7.0
- Tokenizers 0.22.2
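A quick way to confirm that a local environment matches these versions is a small version-print snippet like the one below (a convenience check, not a hard requirement stated by the authors).

```python
# Print installed versions to compare against the ones listed above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)  # listed: 4.57.6
print("PyTorch:", torch.__version__)              # listed: 2.9.1+cu130
print("Datasets:", datasets.__version__)          # listed: 4.7.0
print("Tokenizers:", tokenizers.__version__)      # listed: 0.22.2
```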
Further information regarding the model's specific description, intended uses, limitations, and detailed training/evaluation data is currently pending.