Model Overview
This model, glm46-neulab-synatra-32ep-131k, is an 8-billion-parameter language model based on the Qwen/Qwen3-8B architecture. It was fine-tuned on the penfever/glm46-neulab-synatra-32ep-131k dataset and therefore specializes in the data distribution and tasks represented in that dataset. The model supports a context length of 32768 tokens, allowing it to process long inputs and generate coherent, extended outputs.
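A minimal loading and inference sketch with Hugging Face transformers is shown below. The repository id is an assumption inferred from the dataset name; substitute the actual model repository if it differs.

```python
# Minimal inference sketch using Hugging Face transformers.
# The repo id below is an assumption inferred from the dataset name;
# replace it with the actual model repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "penfever/glm46-neulab-synatra-32ep-131k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires the `accelerate` package
)

messages = [{"role": "user", "content": "Summarize the key ideas of attention in transformers."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```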
Training Details
The fine-tuning process used a learning rate of 4e-05 and an effective batch size of 16 (train_batch_size of 1 × gradient_accumulation_steps of 2 × 8 GPUs = 16), and was run for 7 epochs. The optimizer was ADAMW_TORCH_FUSED with standard beta values and an epsilon of 1e-08, paired with a cosine learning rate scheduler and a 0.1 warmup ratio. Training was distributed across 8 GPUs.
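For reference, the sketch below maps these hyperparameters onto Hugging Face TrainingArguments. It is a reconstruction of the stated configuration, not the original training script; the output path and beta values are assumptions.

```python
# Sketch of the reported hyperparameters as Hugging Face TrainingArguments.
# This reconstructs the stated configuration; the actual training script may differ.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="glm46-neulab-synatra-32ep-131k",  # placeholder output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # 1 sample per GPU
    gradient_accumulation_steps=2,   # 1 x 2 x 8 GPUs = effective batch size 16
    num_train_epochs=7,
    optim="adamw_torch_fused",
    adam_beta1=0.9,                  # transformers defaults ("standard beta values")
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```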
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Parameter Count: 8 billion
- Context Window: 32768 tokens
- Fine-tuning Dataset: penfever/glm46-neulab-synatra-32ep-131k
Intended Use
Because it was fine-tuned on a single dataset, this model is best suited to applications and tasks that align with the content of the penfever/glm46-neulab-synatra-32ep-131k dataset. Developers should evaluate its performance on their specific use cases, particularly those that require processing long contexts; a starting point for such usage is sketched below.
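The sketch below checks that an input fits within the 32768-token window before generating. It assumes `model` and `tokenizer` have been loaded as in the earlier example; `generate_long` is a hypothetical helper, not part of any library API.

```python
# Long-context usage sketch; assumes `model` and `tokenizer` are already loaded
# as in the earlier example. `generate_long` is a hypothetical helper.
MAX_CONTEXT = 32768  # model's stated context window

def generate_long(document: str, max_new_tokens: int = 512) -> str:
    inputs = tokenizer(document, return_tensors="pt").to(model.device)
    n_tokens = inputs["input_ids"].shape[-1]
    # Leave room in the window for the newly generated tokens.
    if n_tokens + max_new_tokens > MAX_CONTEXT:
        raise ValueError(f"Input of {n_tokens} tokens leaves no room for generation.")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0][n_tokens:], skip_special_tokens=True)
```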