DCAgent/e1_askllm_d1_original_glm47

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 14, 2026 · License: other · Architecture: Transformer

DCAgent/e1_askllm_d1_original_glm47 is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. It supports a 32,768 token context length and was trained on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--e1_askllm_d1_original_glm47_traces dataset (a local cache path whose name corresponds to the DCAgent/e1_askllm_d1_original_glm47_traces dataset). The fine-tuning adapts it to tasks in that dataset's domain, making it best suited to applications that draw on that specialized knowledge.


Overview

DCAgent/e1_askllm_d1_original_glm47 is an 8 billion parameter language model fine-tuned from the Qwen/Qwen3-8B base model. Its 32,768 token context length allows it to process long inputs and maintain coherence across extended interactions. The model was trained specifically on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--e1_askllm_d1_original_glm47_traces dataset.
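
The card does not include a usage snippet; the following is a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub under the repo id above and that the tokenizer ships with the Qwen3 chat template inherited from the base model.

```python
# Minimal sketch: load the checkpoint with Hugging Face transformers and
# generate a completion. The repo id below is taken from this card; the
# chat template is assumed to come from the Qwen3-8B base tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/e1_askllm_d1_original_glm47"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the key ideas of supervised fine-tuning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```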

Key Capabilities

  • Extended Context Handling: Leverages a 32,768 token context window for complex, multi-turn conversations and detailed analysis of long documents (a simple token-budget check is sketched after this list).
  • Specialized Fine-tuning: Fine-tuned on a specific trace dataset, which should improve performance on tasks aligned with that data's domain.
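
As a small illustration of the first point, the sketch below checks a prompt against the 32,768-token budget before generation. The repo id is taken from this card; the document text is a placeholder.

```python
# Sketch: verify that a long prompt fits within the advertised 32,768-token
# context window (leaving room for generated tokens is up to the caller).
from transformers import AutoTokenizer

MAX_CONTEXT = 32_768  # context length stated on this card
tokenizer = AutoTokenizer.from_pretrained("DCAgent/e1_askllm_d1_original_glm47")

long_document = "..."  # placeholder for the text you want analyzed
n_tokens = len(tokenizer.encode(long_document))

if n_tokens > MAX_CONTEXT:
    print(f"Prompt is {n_tokens} tokens; truncate or chunk it to fit {MAX_CONTEXT}.")
else:
    print(f"Prompt fits: {n_tokens} / {MAX_CONTEXT} tokens.")
```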

Good For

  • Applications requiring deep understanding and generation within the domain of the fine-tuning dataset.
  • Tasks that benefit from processing long documents or maintaining context over extended interactions.

Training Details

The model was trained with a learning rate of 4e-05, using the adamw_torch_fused optimizer and a cosine learning rate scheduler with a 0.1 warmup ratio, for 7 epochs. Training ran on a multi-GPU setup with 16 devices for a total train batch size of 16, implying one sample per device with no gradient accumulation.
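
As a hedged illustration, the reported hyperparameters map onto a Hugging Face TrainingArguments configuration roughly as follows. The per-device batch size is inferred from the stated totals; output_dir and the bf16 flag are assumptions not stated on the card.

```python
# Sketch of a TrainingArguments configuration matching the hyperparameters
# reported above; not the authors' actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./e1_askllm_d1_original_glm47",  # placeholder path
    learning_rate=4e-5,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7,
    per_device_train_batch_size=1,  # 16 devices x 1 = total train batch size of 16
    bf16=True,  # assumption; precision is not stated on the card
)
```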