DCAgent/a1-stack_jest

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Mar 23, 2026 · License: other · Architecture: Transformer

DCAgent/a1-stack_jest is an 8-billion-parameter causal language model fine-tuned from Qwen/Qwen3-8B. It was trained on the exp_rpt_stack-jest-large_10k_glm_4.7_traces_jupiter dataset, which points to an optimization for report generation or trace analysis within a specific domain. Its 32,768-token context length makes it suitable for the long inputs typical of that training data.


Overview

DCAgent/a1-stack_jest is an 8-billion-parameter language model fine-tuned from the Qwen3-8B base architecture. It underwent specialized training on the exp_rpt_stack-jest-large_10k_glm_4.7_traces_jupiter dataset, suggesting development for report processing, trace analysis, or similar domain-specific applications. Fine-tuning used a learning rate of 4e-05 with a cosine schedule and a 0.1 warmup ratio, and ran for 7 epochs across 16 GPUs.
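
For reference, here is a minimal loading sketch using the Hugging Face transformers library, assuming the checkpoint is published on the Hub under the ID shown on this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub ID taken from the card title; adjust if the weights
# are hosted elsewhere.
model_id = "DCAgent/a1-stack_jest"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype recorded in the checkpoint config
    device_map="auto",    # requires `accelerate`; shards across available GPUs
)
```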

Key Training Details

  • Base Model: Qwen/Qwen3-8B
  • Fine-tuning Dataset: exp_rpt_stack-jest-large_10k_glm_4.7_traces_jupiter
  • Learning Rate: 4e-05
  • LR Schedule: cosine with 0.1 warmup ratio
  • Optimizer: AdamW_Torch_Fused with betas=(0.9, 0.98) and epsilon=1e-08
  • Epochs: 7
  • Distributed Training: Multi-GPU (16 devices)
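
These hyperparameters map directly onto a transformers TrainingArguments configuration. The sketch below is a hedged reconstruction, not the published training script; the output path and launcher command are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="a1-stack_jest",   # placeholder output path
    learning_rate=4e-05,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-08,
)
# The 16-GPU setup would come from the launcher rather than these
# arguments, e.g. `torchrun --nproc_per_node=16 train.py ...`.
```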

Intended Use Cases

Given its specialized training data, DCAgent/a1-stack_jest is likely optimized for:

  • Generating or analyzing reports based on structured traces.
  • Processing and understanding domain-specific logs or diagnostic outputs.
  • Tasks requiring contextual understanding from large trace datasets.
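
As a usage illustration, the sketch below feeds a trace excerpt to the model through the text-generation pipeline. The chat-style prompt is an assumption; the card does not document the expected prompt format.

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="DCAgent/a1-stack_jest",  # hypothetical Hub ID from the card title
    device_map="auto",
)

trace_text = "..."  # placeholder: a trace or diagnostic log excerpt

messages = [
    {"role": "user",
     "content": f"Summarize the failures reported in this trace:\n{trace_text}"},
]
result = pipe(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])  # assistant reply
```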