DCAgent/g1_gptlong_top8_8b

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quantization: FP8 | Context Length: 32K | Published: Apr 23, 2026 | License: other | Architecture: Transformer

DCAgent/g1_gptlong_top8_8b is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B for long-context understanding and generation. It is designed for applications that require robust performance over extended conversational or textual inputs.


Model Overview

DCAgent/g1_gptlong_top8_8b is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was fine-tuned on the DCAgent/g1_min_episodes_e1_gpt_long_top8_glm47_traces dataset (referenced in the training configuration by its local cache path, /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_gpt_long_top8_glm47_traces), which suggests optimization for extended context lengths and complex multi-turn interactions.
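
A minimal inference sketch with the transformers library is shown below. The repo id is taken from this card; the chat-template call assumes the model inherits its tokenizer configuration from the Qwen3-8B base, which is an assumption rather than a confirmed detail.

```python
# Hedged usage sketch: assumes the model loads via standard transformers
# APIs and inherits a chat template from its Qwen3-8B base.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/g1_gptlong_top8_8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype the checkpoint ships with
    device_map="auto",
)

messages = [{"role": "user",
             "content": "Give a one-paragraph overview of attention in transformers."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```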

Key Training Details

  • Base Model: Qwen/Qwen3-8B
  • Learning Rate: 4e-05
  • Optimizer: AdamW Torch Fused with betas=(0.9, 0.98) and epsilon=1e-08
  • Epochs: 7.0
  • Batch Size: 96 total (per-device train_batch_size of 1 × gradient_accumulation_steps of 2 × 48 devices)
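
For reference, the hyperparameters above map onto a transformers TrainingArguments configuration roughly as follows. This is a reconstruction from the listed values, not the authors' published training script; the output_dir and the surrounding trainer and dataset wiring are placeholders.

```python
# Hypothetical reconstruction of the listed hyperparameters; the actual
# training script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="g1_gptlong_top8_8b",   # placeholder
    learning_rate=4e-5,
    optim="adamw_torch_fused",         # AdamW Torch Fused
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    num_train_epochs=7.0,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    # Effective global batch size: 1 (per device) × 2 (accumulation) × 48 devices = 96
)
```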

Intended Use Cases

While the README does not detail specific intended uses, fine-tuning on a "gpt_long_top8" dataset suggests suitability for applications that benefit from processing and generating content within the model's 32K context window (a long-context usage sketch follows this list). These could include:

  • Long-form content generation: Summarizing or creating extensive documents.
  • Complex dialogue systems: Maintaining coherence and context over many turns.
  • Code analysis or generation: Handling larger codebases or detailed specifications.
  • Advanced reasoning tasks: Tracking intricate relationships across a broad span of text.
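
As a concrete illustration of staying within the 32K window, the sketch below truncates a long document by token count before requesting a summary. The budget constants are illustrative assumptions, not guidance from the model authors; only the 32K figure comes from the card metadata, and the loading code mirrors the earlier snippet.

```python
# Hedged long-context sketch: budget numbers are assumptions; only the
# 32K window comes from the card metadata.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DCAgent/g1_gptlong_top8_8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

MAX_CTX = 32_768       # advertised context length
GEN_BUDGET = 1_024     # tokens reserved for the generated summary
OVERHEAD = 64          # rough allowance for chat-template and prompt tokens

def summarize(document: str) -> str:
    # Trim the document so prompt + generation fit inside the window.
    doc_ids = tokenizer(document, add_special_tokens=False)["input_ids"]
    keep = MAX_CTX - GEN_BUDGET - OVERHEAD
    trimmed = tokenizer.decode(doc_ids[:keep])

    messages = [{"role": "user",
                 "content": f"Summarize the following document:\n\n{trimmed}"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=GEN_BUDGET)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```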