DCAgent/g1_timeout_e1_gpt_long_tacc

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 15, 2026 · License: other · Architecture: Transformer

DCAgent/g1_timeout_e1_gpt_long_tacc is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B. It was trained on the DCAgent/g1_timeout_e1_gpt_long_d1_original_40k_glm47_traces dataset, suggesting specialization in agentic or long-context tasks. With a context length of 32768 tokens, the model is likely optimized for processing and generating extended sequences of text relevant to its training data.


Model Overview

This model, DCAgent/g1_timeout_e1_gpt_long_tacc, is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. Its training dataset, DCAgent/g1_timeout_e1_gpt_long_d1_original_40k_glm47_traces, indicates a likely specialization in agentic behavior, long-context understanding, or trace analysis.
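
As a Qwen3-8B fine-tune, the model should load through the standard Hugging Face transformers API. The snippet below is a minimal sketch, not an official usage example; it assumes the checkpoint inherits Qwen3's chat template, and the prompt is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/g1_timeout_e1_gpt_long_tacc"

# Load tokenizer and model; device_map="auto" places weights on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # pick up the checkpoint's native dtype
    device_map="auto",
)

# Illustrative prompt; the chat template is assumed to follow Qwen3.
messages = [{"role": "user", "content": "Summarize the following trace: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```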

Key Training Details

The fine-tuning process used a learning rate of 4e-05 over 7 epochs. Training ran on 16 distributed devices with a total batch size of 16, implying a per-device batch size of 1. The optimizer was ADAMW_TORCH_FUSED with the standard beta values (0.9, 0.999) and an epsilon of 1e-08, paired with a cosine learning rate scheduler and a 0.1 warmup ratio.
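
These hyperparameters map directly onto transformers' TrainingArguments. The configuration below is a reconstruction from the reported values, not the actual training script; the output path and the per-device/total batch-size split (1 sample × 16 devices = 16) are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="g1_timeout_e1_gpt_long_tacc",  # hypothetical output path
    learning_rate=4e-5,
    num_train_epochs=7,
    per_device_train_batch_size=1,  # assumed: 16 devices -> total batch size 16
    optim="adamw_torch_fused",      # ADAMW_TORCH_FUSED
    adam_beta1=0.9,                 # standard AdamW betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```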

Potential Use Cases

Given its fine-tuning dataset, this model is likely best suited for applications requiring:

  • Processing and generating long sequences of text (see the context-length check sketched after this list).
  • Tasks related to agentic systems or trace analysis.
  • Scenarios where understanding extended context is crucial.
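
Because the context window is 32768 tokens, it is worth verifying that a long input actually fits before generation. A minimal sketch, assuming the tokenizer loads as shown earlier; the generation budget is an arbitrary headroom choice, not a documented limit:

```python
from transformers import AutoTokenizer

MAX_CONTEXT = 32768        # reported context length
GENERATION_BUDGET = 1024   # assumed headroom reserved for the model's output

tokenizer = AutoTokenizer.from_pretrained("DCAgent/g1_timeout_e1_gpt_long_tacc")

def fits_in_context(document: str) -> bool:
    """Return True if the tokenized document leaves room for generation."""
    n_tokens = len(tokenizer(document).input_ids)
    return n_tokens + GENERATION_BUDGET <= MAX_CONTEXT
```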