Name: formalmathatepfl/qwen3-8b-sft API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: formalmathatepfl

Model Overview

formalmathatepfl/qwen3-8b-sft is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B through supervised fine-tuning (SFT). The fine-tuning is aimed at general language understanding and generation tasks. The model supports a context length of 32768 tokens, which makes it suitable for long inputs and long outputs.

Key Characteristics

Base Model: Fine-tuned from Qwen/Qwen3-8B.
Parameter Count: 8 billion parameters.
Context Length: 32768 tokens.
Training Objective: Supervised fine-tuning (SFT) on a dataset that is not detailed in the available information.
Validation Loss: 0.0647 at the end of training.
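
As a rough illustration of what the 32768-token context length means in practice, the sketch below computes how many tokens remain for generation once a prompt has been supplied. It assumes that prompt and output tokens share a single window, which is typical for decoder-only models but is not stated in this listing. The helper name `remaining_generation_budget` is hypothetical.

```python
CONTEXT_LENGTH = 32768  # context length stated in this listing


def remaining_generation_budget(prompt_tokens: int,
                                context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after the prompt.

    Assumption: prompt and generated tokens share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt that already fills (or exceeds) the window leaves no room.
    return max(context_length - prompt_tokens, 0)


if __name__ == "__main__":
    print(remaining_generation_budget(30000))  # 2768
```

Token counts depend on the tokenizer, so a real budget check would count tokens with the model's own tokenizer rather than estimating from characters or words.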

Training Details

The model was trained for 1 epoch with a learning rate of 1e-05 and a per-device batch size of 2 across 7 GPUs, giving a total batch size of 14. Training used the AdamW optimizer and a cosine learning rate scheduler with a warmup ratio of 0.05. Mixed-precision training (Native AMP) was used for efficiency. Validation loss decreased consistently during training, which indicates that the model was learning from the SFT dataset.
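
The hyperparameters above can be sketched in a few lines of plain Python. This is a minimal illustration of the common "linear warmup, then cosine decay" shape, not the author's training code: the total step count (`TOTAL_STEPS = 1000`) is a hypothetical value because the dataset size is not given, decay to zero is assumed, and the total batch size assumes no gradient accumulation. The function names are illustrative.

```python
import math

BASE_LR = 1e-5         # learning rate from the training details
PER_DEVICE_BATCH = 2   # batch size per device
NUM_GPUS = 7           # number of GPUs
WARMUP_RATIO = 0.05    # fraction of steps spent warming up
TOTAL_STEPS = 1000     # hypothetical: the real step count is not stated


def total_batch_size(per_device: int = PER_DEVICE_BATCH,
                     num_gpus: int = NUM_GPUS) -> int:
    """Total batch size, assuming no gradient accumulation."""
    return per_device * num_gpus


def lr_at_step(step: int,
               total_steps: int = TOTAL_STEPS,
               base_lr: float = BASE_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(total_batch_size())  # 14
    for step in (0, 25, 50, 525, 1000):
        print(step, lr_at_step(step))
```

With these assumed values the rate climbs linearly for the first 50 steps, peaks at 1e-05, and then follows a half cosine down to zero at the final step.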

Intended Use Cases

This model suits natural language processing applications that benefit from an 8B-parameter model with a 32768-token context window. More specific use cases depend on the SFT dataset, which is not detailed in the available information.
