DCAgent/a1-r2egym
DCAgent/a1-r2egym is a fine-tuned version of the Qwen3-8B causal language model. It was trained on the r2egym_sandboxes_10k_glm_4.7_traces_jupiter dataset. The name suggests about 10k agent traces generated by GLM-4.7 in R2E-Gym sandbox environments, which points to a specialization in agent-based, multi-step tasks in interactive environments. Training used 16 GPUs and a cosine learning-rate schedule over 7 epochs.
Model Overview
DCAgent/a1-r2egym is a specialized language model derived from Qwen3-8B. It was fine-tuned on a local copy of the training data stored at /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--r2egym_sandboxes_10k_glm_4.7_traces_jupiter/snapshots/bf10c6912b106ea55b7b06e79c99fc4d038a8437_thinking_preprocessed. The path appears to be a Hugging Face hub cache snapshot of the DCAgent/r2egym_sandboxes_10k_glm_4.7_traces_jupiter dataset, with a _thinking_preprocessed suffix that apparently marks an extra preprocessing step. The dataset name points to agent traces collected in sandbox environments.
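The dataset path appears to follow the Hugging Face hub cache layout (<type>s--<org>--<name>/snapshots/<revision>), plus a nonstandard _thinking_preprocessed suffix on the snapshot folder. The following is a minimal sketch, assuming that layout, that recovers the repository id and revision from such a path. The function name parse_hub_cache_path is hypothetical and not part of any library. Treating everything after the first underscore as a local tag rather than part of the commit hash is also an assumption.

```python
from pathlib import PurePosixPath


def parse_hub_cache_path(path: str) -> dict:
    """Recover repo type, repo id and revision from a hub-cache snapshot path.

    Assumes the layout <...>/<type>s--<org>--<name>/snapshots/<revision>[_<suffix>].
    """
    parts = PurePosixPath(path).parts
    idx = parts.index("snapshots")  # raises ValueError if the path has no snapshots dir
    repo_folder, snapshot = parts[idx - 1], parts[idx + 1]
    # The cache encodes "org/name" as "org--name", prefixed by the plural repo type.
    repo_type, org, name = repo_folder.split("--", 2)
    # ASSUMPTION: anything after the first "_" is a local tag, not part of the commit hash.
    revision, _, suffix = snapshot.partition("_")
    return {
        "repo_type": repo_type.removesuffix("s"),  # "datasets" -> "dataset"
        "repo_id": f"{org}/{name}",
        "revision": revision,
        "suffix": suffix,
    }


path = ("/e/scratch/jureap59/raoof1/sft_data/hf_hub/"
        "datasets--DCAgent--r2egym_sandboxes_10k_glm_4.7_traces_jupiter/"
        "snapshots/bf10c6912b106ea55b7b06e79c99fc4d038a8437_thinking_preprocessed")
print(parse_hub_cache_path(path))
```

Applied to the training-data path, this yields the repository id DCAgent/r2egym_sandboxes_10k_glm_4.7_traces_jupiter, the 40-character revision bf10c6912b106ea55b7b06e79c99fc4d038a8437, and the suffix thinking_preprocessed.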
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 4e-05
- Batch Size: 1 per device (train), 8 per device (eval)
- Distributed Training: Multi-GPU setup with 16 devices, giving a total effective batch size of 16 (1 × 16) for training and 128 (8 × 16) for evaluation, which implies no gradient accumulation.
- Optimizer: ADAMW_TORCH_FUSED (PyTorch's fused AdamW implementation) with betas=(0.9, 0.98) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Epochs: 7.0
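The batch-size arithmetic and the warmup-plus-cosine schedule in the list above can be sketched in plain Python. This is a minimal illustration, not the training code used for this model. It assumes the schedule has the same shape as Hugging Face's cosine schedule with warmup: a linear ramp over ceil(0.1 × total steps), then a half-cosine decay to zero. It also assumes no gradient accumulation and a hypothetical dataset size of 10,000 examples inferred from the "10k" in the dataset name. The helper names effective_batch_size and cosine_lr_with_warmup are hypothetical.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Samples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = 4e-5, warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay back to 0."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: 10,000 training examples (from "10k" in the dataset name) and
# no gradient accumulation; neither figure is confirmed by the model card.
NUM_EXAMPLES = 10_000
EPOCHS = 7

train_bs = effective_batch_size(per_device=1, num_devices=16)  # 16
eval_bs = effective_batch_size(per_device=8, num_devices=16)   # 128
steps_per_epoch = math.ceil(NUM_EXAMPLES / train_bs)            # 625
total_steps = steps_per_epoch * EPOCHS                          # 4375
warmup_steps = math.ceil(0.1 * total_steps)                     # 438

# Print the learning rate at a few points along the schedule.
for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>5}: lr = {cosine_lr_with_warmup(step, total_steps):.2e}")
```

Under these assumptions, one epoch is 625 optimizer steps and the full 7-epoch run is 4,375 steps, of which the first 438 are warmup. The learning rate peaks at 4e-05 at the end of warmup and decays to zero at the final step. The real step counts depend on the actual number of training examples, so these figures are illustrative only.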
Taken together, the training data and hyperparameters indicate that the base Qwen3-8B model was adapted for interactive, multi-step decision-making. The most likely target is the simulated or sandboxed environments that the dataset name suggests.