Overview
DCAgent/a1-nemotron_rust is a specialized language model derived from the Qwen3-8B architecture, developed by DCAgent. It has undergone fine-tuning on a unique dataset, specifically /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--exp_rpt_nemotron-rust_10k_glm_4.7_traces_jupiter/snapshots/3132525161b49015e6ef5c5bf75c3a14ca21c34b_thinking_preprocessed.
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Distributed Training: Multi-GPU setup with 16 devices, resulting in a total train batch size of 16 and eval batch size of 128.
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Epochs: 7.0
Intended Uses & Limitations
Specific intended uses and limitations are not detailed in the provided information, suggesting that further exploration or documentation is needed to fully understand its optimal applications and potential constraints. Developers should consult additional resources for comprehensive guidance on its deployment and capabilities.