Model Overview
This model, glm46-neulab-synatra-32ep-131k, is an 8-billion-parameter language model based on the Qwen/Qwen3-8B architecture. It was fine-tuned on the penfever/glm46-neulab-synatra-32ep-131k dataset and therefore specializes in the data distribution and tasks represented in that dataset. The model supports a context length of 32768 tokens, allowing it to process long inputs and generate coherent, extended outputs.
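A minimal loading and inference sketch with Hugging Face transformers is shown below. The repository id is an assumption inferred from the dataset name; substitute the actual model repository if it differs.

```python
# Minimal inference sketch using Hugging Face transformers.
# The repo id below is an assumption inferred from the dataset name;
# replace it with the actual model repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "penfever/glm46-neulab-synatra-32ep-131k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires the `accelerate` package
)

messages = [{"role": "user", "content": "Summarize the key ideas of attention in transformers."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```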
Training Details
The fine-tuning process used a learning rate of 4e-05 and an effective batch size of 16 (train_batch_size of 1 × gradient_accumulation_steps of 2 × 8 GPUs = 16), and was run for 7 epochs. The optimizer was ADAMW_TORCH_FUSED with standard beta values and an epsilon of 1e-08, paired with a cosine learning rate scheduler and a 0.1 warmup ratio. Training was distributed across 8 GPUs.
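For reference, the sketch below maps these hyperparameters onto Hugging Face TrainingArguments. It is a reconstruction of the stated configuration, not the original training script; the output path and beta values are assumptions.

```python
# Sketch of the reported hyperparameters as Hugging Face TrainingArguments.
# This reconstructs the stated configuration; the actual training script may differ.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="glm46-neulab-synatra-32ep-131k",  # placeholder output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # 1 sample per GPU
    gradient_accumulation_steps=2,   # 1 x 2 x 8 GPUs = effective batch size 16
    num_train_epochs=7,
    optim="adamw_torch_fused",
    adam_beta1=0.9,                  # transformers defaults ("standard beta values")
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```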
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Parameter Count: 8 billion
- Context Window: 32768 tokens
- Fine-tuning Dataset: penfever/glm46-neulab-synatra-32ep-131k
Intended Use
Because it was fine-tuned on a single dataset, this model is best suited to applications and tasks that align with the content of the penfever/glm46-neulab-synatra-32ep-131k dataset. Developers should evaluate its performance on their specific use cases, particularly those that require processing long contexts; a starting point for such usage is sketched below.
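The sketch below checks that an input fits within the 32768-token window before generating. It assumes `model` and `tokenizer` have been loaded as in the earlier example; `generate_long` is a hypothetical helper, not part of any library API.

```python
# Long-context usage sketch; assumes `model` and `tokenizer` are already loaded
# as in the earlier example. `generate_long` is a hypothetical helper.
MAX_CONTEXT = 32768  # model's stated context window

def generate_long(document: str, max_new_tokens: int = 512) -> str:
    inputs = tokenizer(document, return_tensors="pt").to(model.device)
    n_tokens = inputs["input_ids"].shape[-1]
    # Leave room in the window for the newly generated tokens.
    if n_tokens + max_new_tokens > MAX_CONTEXT:
        raise ValueError(f"Input of {n_tokens} tokens leaves no room for generation.")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0][n_tokens:], skip_special_tokens=True)
```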