formalmathatepfl/qwen3-8b-sft
formalmathatepfl/qwen3-8b-sft is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B, designed for general language tasks. This model has a context length of 32768 tokens and was fine-tuned on a specific dataset to achieve a validation loss of 0.0647. It is intended for applications requiring a robust, instruction-following model within its parameter class.
Loading preview...
Model Overview
formalmathatepfl/qwen3-8b-sft is an 8 billion parameter language model, fine-tuned from the base Qwen/Qwen3-8B architecture. This model was developed through a supervised fine-tuning (SFT) process, aiming to enhance its performance on general language understanding and generation tasks. It supports a substantial context length of 32768 tokens, making it suitable for processing longer inputs and generating comprehensive outputs.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B.
- Parameter Count: 8 billion parameters.
- Context Length: 32768 tokens.
- Training Objective: Supervised fine-tuning (SFT) on a specific dataset.
- Performance: Achieved a final validation loss of 0.0647 during training.
Training Details
The model was trained with a learning rate of 1e-05, a batch size of 2 per device across 7 GPUs (total batch size 14), and for 1 epoch. The training utilized the AdamW optimizer and a cosine learning rate scheduler with a 0.05 warmup ratio. Mixed-precision training (Native AMP) was employed to optimize efficiency. The training process showed a consistent reduction in validation loss, indicating effective learning from the SFT dataset.
Intended Use Cases
This model is generally suitable for a wide range of natural language processing applications where an 8B parameter model with a large context window is beneficial. Specific use cases would depend on the nature of the SFT dataset, which is not detailed in the provided information.