DCAgent/a1-r2egym

  • Task: Text generation
  • Model size: 8B
  • Quantization: FP8
  • Context length: 32k
  • Published: Mar 25, 2026
  • License: other
  • Architecture: Transformer
  • Concurrency cost: 1

DCAgent/a1-r2egym is a fine-tuned version of the Qwen3-8B causal language model, trained on the r2egym_sandboxes_10k_glm_4.7_traces_jupiter dataset. The dataset name points to agent traces recorded in R2E-Gym sandbox environments, so the model is most likely specialized for multi-step, agent-based tasks. Training used 16 GPUs and a cosine learning-rate schedule over 7 epochs.


Model Overview

DCAgent/a1-r2egym is derived from Qwen3-8B. It was fine-tuned on a preprocessed snapshot of the DCAgent/r2egym_sandboxes_10k_glm_4.7_traces_jupiter dataset, recorded at /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--r2egym_sandboxes_10k_glm_4.7_traces_jupiter/snapshots/bf10c6912b106ea55b7b06e79c99fc4d038a8437_thinking_preprocessed. The path sits under an sft_data directory, which suggests supervised fine-tuning on these agent traces rather than reinforcement learning.
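The recorded path appears to follow the huggingface_hub cache layout, in which a dataset repo `org/name` is stored under `datasets--org--name/snapshots/<commit-hash>/`. The helper `parse_hub_cache_path` below is hypothetical (not part of huggingface_hub or any other library) and is a minimal sketch assuming that layout. The `_thinking_preprocessed` text after the 40-character commit hash is not part of the standard layout and presumably marks a locally preprocessed copy.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class HubSnapshot(NamedTuple):
    repo_type: str  # "model", "dataset", or "space"
    repo_id: str    # "org/name"
    revision: str   # 40-character commit hash of the snapshot
    suffix: str     # any non-standard text appended after the hash


def parse_hub_cache_path(path: str) -> HubSnapshot:
    """Hypothetical helper: decode .../<type>s--<org>--<name>/snapshots/<hash>[suffix]."""
    parts = PurePosixPath(path).parts
    i = parts.index("snapshots")  # raises ValueError if this is not a snapshot path
    repo_dir, snapshot = parts[i - 1], parts[i + 1]
    # "datasets--DCAgent--name" -> ("datasets", "DCAgent--name")
    type_plural, encoded_id = repo_dir.split("--", 1)
    repo_id = encoded_id.replace("--", "/")
    # Commit hashes are 40 hex characters; anything after them is a local suffix.
    revision, suffix = snapshot[:40], snapshot[40:]
    return HubSnapshot(type_plural.removesuffix("s"), repo_id, revision, suffix)


DATA_PATH = (
    "/e/scratch/jureap59/raoof1/sft_data/hf_hub/"
    "datasets--DCAgent--r2egym_sandboxes_10k_glm_4.7_traces_jupiter/snapshots/"
    "bf10c6912b106ea55b7b06e79c99fc4d038a8437_thinking_preprocessed"
)
snap = parse_hub_cache_path(DATA_PATH)
```

Applied to the recorded path, the sketch yields repo ID `DCAgent/r2egym_sandboxes_10k_glm_4.7_traces_jupiter` at revision `bf10c6912b106ea55b7b06e79c99fc4d038a8437`.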

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 4e-05
  • Per-device batch size: 1 (train), 8 (eval)
  • Distributed training: 16 GPUs, giving an effective (global) batch size of 16 × 1 = 16 for training and 16 × 8 = 128 for evaluation
  • Optimizer: ADAMW_TORCH_FUSED (PyTorch's fused AdamW) with betas=(0.9, 0.98) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: 7.0
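The batch-size arithmetic and learning-rate schedule from the list above can be sketched in plain Python. This is a minimal illustration, not the actual training code. It assumes no gradient accumulation (the card does not list it) and follows the linear-warmup-then-cosine-decay shape used by Hugging Face's `get_cosine_schedule_with_warmup`, with warmup steps taken as ceil(0.1 × total steps). The dataset size is not stated, so the 1,000 examples below are purely hypothetical, as are the helper names.

```python
import math

# Hyperparameters reported in the model card.
NUM_DEVICES = 16
PER_DEVICE_TRAIN_BATCH = 1
PER_DEVICE_EVAL_BATCH = 8
PEAK_LR = 4e-5
WARMUP_RATIO = 0.1
EPOCHS = 7

# Assumption: no gradient accumulation (not listed in the card).
GRAD_ACCUM_STEPS = 1


def effective_batch(per_device: int, devices: int, grad_accum: int = 1) -> int:
    """Global batch size = per-device batch x number of devices x accumulation steps."""
    return per_device * devices * grad_accum


def total_steps(num_examples: int, global_batch: int, epochs: int) -> int:
    """Optimizer steps over the whole run (a final partial batch still counts as a step)."""
    return math.ceil(num_examples / global_batch) * epochs


def cosine_lr(step: int, total: int, peak: float = PEAK_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to `peak`, then half-cycle cosine decay to 0."""
    warmup = math.ceil(total * warmup_ratio)
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


train_batch = effective_batch(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS)  # 16
eval_batch = effective_batch(PER_DEVICE_EVAL_BATCH, NUM_DEVICES)  # 128
# Hypothetical dataset of 1,000 examples: ceil(1000 / 16) = 63 steps/epoch, 63 * 7 = 441.
steps = total_steps(1_000, train_batch, EPOCHS)
```

With these assumptions the learning rate ramps from 0 to 4e-05 over the first 45 of 441 steps, then decays along a half cosine to 0 at the last step.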

Taken together, the training data and hyperparameters indicate that the base Qwen3-8B model was adapted for interactive, multi-step agent tasks, most likely in the sandboxed environments that the dataset name refers to.