Model Overview
DCAgent/a1-curriculum_easy is an 8-billion-parameter language model fine-tuned from the base model Qwen/Qwen3-8B. It was adapted using a dataset derived from /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--exp_rpt_curriculum-easy_10k_glm_4.7_traces_jupiter. The model retains a context length of 32768 tokens, allowing it to process long inputs.
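A minimal usage sketch with the standard transformers API is shown below. It assumes the model is available under the repo id above (locally or on the Hugging Face Hub) and that a suitable GPU is present; the prompt is purely illustrative.

```python
# Sketch: load the fine-tuned model and run a short generation.
# Assumes `transformers` and `torch` are installed and the repo id resolves.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/a1-curriculum_easy"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # place layers on available devices
)

prompt = "Explain gradient descent in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Within the 32768-token context window, longer documents can be passed the same way; inputs beyond that length must be truncated or chunked.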
Training Details
The fine-tuning process involved several key hyperparameters:
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
- LR Scheduler: Cosine type with a warmup ratio of 0.1
- Epochs: 7.0
- Devices: Trained on 16 GPUs; with per-device batch sizes of 1 (train) and 8 (eval), this yields a total train batch size of 16 and a total eval batch size of 128.
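The hyperparameters above can be expressed as a Hugging Face TrainingArguments configuration. This is a sketch, not the exact training script: the output directory and precision setting are assumptions, and only the values listed in this card are mirrored.

```python
# Sketch of the fine-tuning configuration described above.
# `output_dir` is a placeholder; `bf16` is an assumption (precision not stated).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="a1-curriculum_easy-sft",  # placeholder
    learning_rate=4e-5,
    per_device_train_batch_size=1,        # x 16 devices -> total 16
    per_device_eval_batch_size=8,         # x 16 devices -> total 128
    num_train_epochs=7.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    bf16=True,                            # assumption
)
```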
Intended Use
Specific intended uses and limitations have not yet been documented. The "curriculum-easy" name of the fine-tuning dataset suggests the model was trained on the easier tier of a curriculum-ordered dataset, so it may be best suited to tasks resembling that training distribution. Developers should evaluate the model on their own workloads before deployment and treat any capability beyond the fine-tuning domain as unverified.