DCAgent/a1-curriculum_easy

Text Generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Mar 25, 2026 · License: other · Architecture: Transformer

DCAgent/a1-curriculum_easy is an 8-billion-parameter language model, fine-tuned from Qwen/Qwen3-8B for curriculum-learning tasks. It supports a 32768-token context length and was fine-tuned on a specialized "curriculum-easy" trace dataset to improve performance on its target learning trajectories and data structures.


Model Overview

DCAgent/a1-curriculum_easy is an 8-billion-parameter language model, fine-tuned from the base Qwen/Qwen3-8B architecture. It was adapted using a dataset derived from /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--exp_rpt_curriculum-easy_10k_glm_4.7_traces_jupiter. The model retains a substantial context length of 32768 tokens, allowing it to process extensive inputs.
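Assuming the model is published on the Hugging Face Hub under this repo id (the card does not state where it is hosted), it could be loaded with the standard transformers API. A minimal sketch:

```python
MODEL_ID = "DCAgent/a1-curriculum_easy"  # repo id from this card
MAX_CONTEXT = 32768                      # context length stated on the card


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model; a sketch assuming the repo exists on the Hub."""
    # Deferred import so the sketch documents usage without requiring
    # transformers to be installed just to read the constants above.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",  # let the checkpoint decide; the card lists FP8 quantization
        device_map="auto",
    )
    return tokenizer, model
```

Prompts exceeding `MAX_CONTEXT` tokens would need truncation or chunking before generation.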

Training Details

The fine-tuning process involved several key hyperparameters:

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
  • LR Scheduler: Cosine type with a warmup ratio of 0.1
  • Epochs: 7.0
  • Devices: Trained across 16 GPUs, giving a total train batch size of 16 (1 × 16) and a total eval batch size of 128 (8 × 16).
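The learning-rate trajectory implied by these hyperparameters can be sketched as a linear warmup over the first 10% of steps followed by cosine decay to zero. This is a minimal re-implementation of the common warmup-plus-cosine shape, not the exact trainer code:

```python
import math

PEAK_LR = 4e-05       # learning rate from the card
WARMUP_RATIO = 0.1    # warmup ratio from the card


def lr_at(step: int, total_steps: int,
          peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step under linear warmup + cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For a 1000-step run, the rate peaks at step 100 (end of warmup) and decays to zero by the final step.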

Intended Use

Specific intended uses and limitations are not yet documented. However, fine-tuning on a "curriculum-easy" dataset suggests the model is optimized for structured learning tasks, progressive data processing, or scenarios where it must follow a defined curriculum or sequence of operations. Developers should weigh this specialized training when targeting applications that require nuanced handling of sequential or curriculum-based data.