Model Overview
xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was trained with the Axolotl framework (version 0.16.0.dev0) using the Liger plugin, with the liger_rope, liger_rms_norm, liger_glu_activation, liger_layer_norm, and liger_fused_linear_cross_entropy features enabled.
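The settings above map onto an Axolotl YAML config roughly as follows. This is a hedged sketch, not the actual training config: only the base model, dataset, Liger flags, epochs, learning rate, and sequence length come from the card; micro_batch_size is an assumption inferred from the reported totals.

```yaml
# Sketch of the relevant Axolotl config fields (assumed, not the original file)
base_model: Qwen/Qwen3-8B

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: xiaolesu/OsmosisProofling-v3-SFT

sequence_len: 4096
num_epochs: 2
learning_rate: 1e-5
micro_batch_size: 2   # assumption: 2 per device x 7 GPUs = total batch size 14
```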
Training Details
The model was fine-tuned on the xiaolesu/OsmosisProofling-v3-SFT dataset for 2 epochs, with a learning rate of 1e-05 and a sequence length of 4096. Training used a total batch size of 14 across 7 GPUs (2 per device). Key results include a final validation loss of 0.3543, corresponding to a perplexity of 1.4252. Memory usage during evaluation peaked at 20.98 GiB (both active and allocated).
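The reported perplexity is consistent with the reported loss: for a per-token cross-entropy loss, perplexity is simply its exponential. A quick check:

```python
import math

val_loss = 0.3543          # final validation cross-entropy loss from the card
ppl = math.exp(val_loss)   # perplexity = exp(mean per-token cross-entropy)
print(round(ppl, 4))       # → 1.4252, matching the reported value
```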
Key Features
- Base Model: Qwen3-8B architecture.
- Fine-tuning Dataset: xiaolesu/OsmosisProofling-v3-SFT.
- Context Length: supports a sequence length of 4096 tokens during training.
- Performance: achieved a perplexity of 1.4252 on the validation set.
Intended Use Cases
This model was fine-tuned specifically on the xiaolesu/OsmosisProofling-v3-SFT dataset, so its primary utility lies in tasks matching that dataset's domain and content. Developers should evaluate its performance on their target applications before deployment.
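For evaluation, the checkpoint can be loaded like any Qwen3-based causal LM. The sketch below is a hypothetical usage example (requires `pip install transformers torch`): the repo id and context length come from the card, but the prompt wrapper and generation settings are assumptions, since the card does not document a chat template.

```python
# Hypothetical inference sketch; prompt format and sampling are assumptions.
MODEL_ID = "xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT"
MAX_CONTEXT = 4096  # training sequence length reported in the card

def build_prompt(question: str) -> str:
    # Minimal plain-text wrapper; the real chat template, if any, ships with
    # the tokenizer (see tokenizer.apply_chat_template).
    return f"Question: {question}\nAnswer:"

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(
        build_prompt("State the triangle inequality."),
        return_tensors="pt",
        truncation=True,          # keep inputs within the trained context
        max_length=MAX_CONTEXT,
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```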