formalmathatepfl/qwen3-8b-sft

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 19, 2026License:otherArchitecture:Transformer Warm

formalmathatepfl/qwen3-8b-sft is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B, designed for general language tasks. This model has a context length of 32768 tokens and was fine-tuned on a specific dataset to achieve a validation loss of 0.0647. It is intended for applications requiring a robust, instruction-following model within its parameter class.

Loading preview...

Model Overview

formalmathatepfl/qwen3-8b-sft is an 8 billion parameter language model, fine-tuned from the base Qwen/Qwen3-8B architecture. This model was developed through a supervised fine-tuning (SFT) process, aiming to enhance its performance on general language understanding and generation tasks. It supports a substantial context length of 32768 tokens, making it suitable for processing longer inputs and generating comprehensive outputs.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-8B.
  • Parameter Count: 8 billion parameters.
  • Context Length: 32768 tokens.
  • Training Objective: Supervised fine-tuning (SFT) on a specific dataset.
  • Performance: Achieved a final validation loss of 0.0647 during training.

Training Details

The model was trained with a learning rate of 1e-05, a batch size of 2 per device across 7 GPUs (total batch size 14), and for 1 epoch. The training utilized the AdamW optimizer and a cosine learning rate scheduler with a 0.05 warmup ratio. Mixed-precision training (Native AMP) was employed to optimize efficiency. The training process showed a consistent reduction in validation loss, indicating effective learning from the SFT dataset.

Intended Use Cases

This model is generally suitable for a wide range of natural language processing applications where an 8B parameter model with a large context window is beneficial. Specific use cases would depend on the nature of the SFT dataset, which is not detailed in the provided information.