DCAgent/c1_gpt53_codex_fixed

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 10, 2026 · License: other · Architecture: Transformer

DCAgent/c1_gpt53_codex_fixed is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. It is adapted for tasks covered by the dataset at /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--c1_gpt53_codex_fixed/snapshots/38a6f93a475416e79a04e373ed2b1ff2d1d7c45a_thinking_preprocessed, which suggests a specialization in the areas represented in that training data. With a context length of 32768 tokens, it can process extensive inputs relevant to its fine-tuning domain.

Overview

DCAgent/c1_gpt53_codex_fixed is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was adapted through supervised fine-tuning (SFT) on a custom dataset located at /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--c1_gpt53_codex_fixed/snapshots/38a6f93a475416e79a04e373ed2b1ff2d1d7c45a_thinking_preprocessed.
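
Since the model inherits the standard Hugging Face causal-LM interface from its Qwen3-8B base, it can presumably be loaded as sketched below. The repo id comes from this card; the dtype and device settings are illustrative assumptions, not values documented here.

```python
# Minimal loading sketch, assuming the checkpoint follows the standard
# Hugging Face causal-LM layout inherited from Qwen/Qwen3-8B.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/c1_gpt53_codex_fixed"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # assumption; the hosted endpoint advertises FP8
    device_map="auto",
)

# The card states a 32768-token context window.
print(model.config.max_position_embeddings)
```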

Training Details

The model was trained for 7 epochs with a learning rate of 4e-05, using a cosine learning-rate scheduler with a warmup ratio of 0.1. Training ran on 16 devices with a total batch size of 16, using the ADAMW_TORCH_FUSED optimizer. The development environment included Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, and Tokenizers 0.22.2.
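
For orientation, the reported hyperparameters map onto a transformers.TrainingArguments configuration roughly as sketched below. This is a reconstruction from the values above, not the authors' training script; the per-device batch size of 1 is inferred (16 devices × 1 = total batch size 16), and the precision flag is an assumption.

```python
# Hypothetical reconstruction of the reported SFT hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="c1_gpt53_codex_fixed-sft",  # placeholder path
    num_train_epochs=7,
    learning_rate=4e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=1,  # inferred: 16 devices x 1 = 16 total
    optim="adamw_torch_fused",
    bf16=True,                      # assumption; precision is not stated on the card
)
```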

Key Characteristics

  • Base Model: Qwen/Qwen3-8B
  • Parameter Count: 8 billion
  • Context Length: 32768 tokens
  • Fine-tuning: Specialized SFT on a custom dataset, indicating potential domain-specific capabilities.

Intended Use

Specific intended uses and limitations are not documented on this card; the fine-tuning on a particular dataset suggests utility in applications aligned with the nature of that data. Developers should consider this specialized training when choosing the model for tasks where the custom dataset's characteristics are beneficial.
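
As a rough starting point for such applications, the model can presumably be driven through the chat template inherited from its Qwen3 base. The enable_thinking flag below is a Qwen3-family chat-template option; applying it here is an assumption, motivated only by the "thinking_preprocessed" suffix in the dataset name.

```python
# Chat-style usage sketch; assumes the tokenizer ships Qwen3's chat
# template, whose apply_chat_template accepts enable_thinking.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/c1_gpt53_codex_fixed"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Outline a plan to refactor a large Python module."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # assumption: Qwen3-style reasoning toggle
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```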