Overview
This model, formalmathatepfl/Qwen3-8B-finetuned, is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. The fine-tuning targets domains that demand precise reasoning and mathematical understanding. The model supports a context length of 32,768 tokens, so it can process and generate long, complex sequences of text.
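As a quick start, the checkpoint can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch: the repository id comes from this card, while the dtype, device placement, prompt, and generation settings are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "formalmathatepfl/Qwen3-8B-finetuned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; fall back to float16 if unsupported
    device_map="auto",           # assumption: let accelerate place the 8B model
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```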
Key Capabilities
- Enhanced Mathematical Reasoning: Fine-tuned to improve accuracy and understanding in mathematical and formal logic tasks.
- Large Context Window: Offers a 32,768-token context length, useful for extensive problem descriptions or detailed logical arguments (see the quick config check after this list).
- Qwen3-8B Foundation: Builds upon the robust capabilities of the Qwen3-8B model, inheriting its general language understanding and generation strengths.
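The advertised context window can be verified from the model configuration. This is a hedged sketch that assumes the limit is exposed via the standard max_position_embeddings field:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("formalmathatepfl/Qwen3-8B-finetuned")
print(config.max_position_embeddings)  # expected: 32768, per this card
```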
Training Details
The model was fine-tuned with the following hyperparameters (an equivalent TrainingArguments sketch follows the list):
- Learning Rate: 0.0001
- Batch Size: Effective training batch size of 16 (2 per device × 4 GPUs × 2 gradient accumulation steps).
- Optimizer: AdamW (adamw_torch) with default betas and epsilon.
- Scheduler: Cosine learning-rate schedule with a warmup ratio of 0.03, trained for 1 epoch.
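For reference, these settings map to roughly the following transformers.TrainingArguments. This is a reconstruction from the list above, not the actual training script; the output path and the bf16 flag are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-finetuned",  # hypothetical output path
    learning_rate=1e-4,               # 0.0001
    per_device_train_batch_size=2,    # 2 per device
    gradient_accumulation_steps=2,    # 2 accumulation steps
    # Across 4 GPUs: 2 * 4 * 2 = 16 effective batch size
    optim="adamw_torch",              # AdamW with default betas/epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,
    bf16=True,                        # assumption: mixed-precision training
)
```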
Good for
- Applications requiring strong mathematical problem-solving.
- Tasks involving formal logic and precise reasoning.
- Scenarios where a large context window is crucial for understanding complex inputs.
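As a usage illustration for the math-oriented scenarios above, here is a hedged sketch using the transformers text-generation pipeline with a chat-style prompt; the prompt and generation length are arbitrary examples.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="formalmathatepfl/Qwen3-8B-finetuned",
    torch_dtype="auto",  # assumption: use the checkpoint's native dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "Show that the square root of 2 is irrational."}]
result = generator(messages, max_new_tokens=1024)

# The pipeline returns the full conversation; print the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```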