xiaolesu/OsmosisProofling-v3-SFT
xiaolesu/OsmosisProofling-v3-SFT is an 8-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen3-8B on the xiaolesu/OsmosisProofling-v3-SFT dataset. It achieves a validation loss of 0.3543 and a perplexity of 1.4252 on its fine-tuning task, and is best suited to tasks aligned with that training data, offering a specialized model on the Qwen3 foundation.
Model Overview
xiaolesu/OsmosisProofling-v3-SFT is an 8-billion-parameter instruction-tuned model built on the Qwen/Qwen3-8B architecture. It was fine-tuned on the xiaolesu/OsmosisProofling-v3-SFT dataset and is therefore specialized toward the tasks and data characteristics represented there.
Key Training Details
- Base Model: Qwen/Qwen3-8B
- Fine-tuning Dataset: xiaolesu/OsmosisProofling-v3-SFT
- Training Framework: Axolotl (version 0.16.0.dev0)
- Context Length: Configured for a sequence length of 4096 tokens during training.
- Optimization: adamw_torch_fused optimizer with a learning rate of 1e-05 and a cosine learning rate scheduler.
- Memory Footprint: Maximum active memory of 20.98 GiB during evaluation.
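Assuming the checkpoint is published on the Hugging Face Hub under the name above, it should load through the standard `transformers` API like any other Qwen3-based model. A minimal usage sketch (the prompt and generation parameters are illustrative, not from the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id assumed to match the model card title
MODEL_ID = "xiaolesu/OsmosisProofling-v3-SFT"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and run a single chat-style generation."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype="auto"
    )
    # Qwen3 checkpoints ship a chat template; apply it for instruction input
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Prove that the sum of two even integers is even."))
```

Since training used a 4096-token sequence length, keep prompt plus generated tokens within that budget for best results.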
Performance Metrics
During evaluation, the model achieved notable results:
- Validation Loss: 0.3543
- Perplexity (PPL): 1.4252
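The two numbers are consistent with each other: perplexity is the exponential of the mean cross-entropy loss, so exp(0.3543) ≈ 1.4252. A quick check:

```python
import math

val_loss = 0.3543          # reported mean cross-entropy validation loss (nats)
ppl = math.exp(val_loss)   # perplexity = e^loss
print(f"{ppl:.4f}")        # → 1.4252
```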
Intended Uses & Limitations
While specific intended uses and limitations are not detailed in the provided information, the model's fine-tuning on a particular dataset suggests its strengths lie in tasks similar to those within the xiaolesu/OsmosisProofling-v3-SFT dataset. Users should evaluate its performance on their specific use cases, especially considering the specialized nature of its training.