Overview
This model, formalmathatepfl/Qwen3-8B-finetuned, is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. The fine-tuning targets domains that demand precise reasoning and mathematical understanding. The model supports a context length of 32,768 tokens, so it can process and generate long, complex sequences of text.
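As a quick start, the checkpoint can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch: the repository id comes from this card, while the dtype, device placement, prompt, and generation settings are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "formalmathatepfl/Qwen3-8B-finetuned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; fall back to float16 if unsupported
    device_map="auto",           # assumption: let accelerate place the 8B model
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```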
Key Capabilities
- Enhanced Mathematical Reasoning: Fine-tuned to improve accuracy and understanding in mathematical and formal logic tasks.
- Large Context Window: Offers a 32,768-token context length, useful for extensive problem descriptions or detailed logical arguments (see the quick config check after this list).
- Qwen3-8B Foundation: Builds upon the robust capabilities of the Qwen3-8B model, inheriting its general language understanding and generation strengths.
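The advertised context window can be verified from the model configuration. This is a hedged sketch that assumes the limit is exposed via the standard max_position_embeddings field:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("formalmathatepfl/Qwen3-8B-finetuned")
print(config.max_position_embeddings)  # expected: 32768, per this card
```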
Training Details
The model was fine-tuned with the following hyperparameters (an equivalent TrainingArguments sketch follows the list):
- Learning Rate: 0.0001
- Batch Size: Effective training batch size of 16 (2 per device × 4 GPUs × 2 gradient accumulation steps).
- Optimizer: AdamW (adamw_torch) with default betas and epsilon.
- Scheduler: Cosine learning-rate schedule with a warmup ratio of 0.03, trained for 1 epoch.
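For reference, these settings map to roughly the following transformers.TrainingArguments. This is a reconstruction from the list above, not the actual training script; the output path and the bf16 flag are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-finetuned",  # hypothetical output path
    learning_rate=1e-4,               # 0.0001
    per_device_train_batch_size=2,    # 2 per device
    gradient_accumulation_steps=2,    # 2 accumulation steps
    # Across 4 GPUs: 2 * 4 * 2 = 16 effective batch size
    optim="adamw_torch",              # AdamW with default betas/epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,
    bf16=True,                        # assumption: mixed-precision training
)
```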
Good for
- Applications requiring strong mathematical problem-solving.
- Tasks involving formal logic and precise reasoning.
- Scenarios where a large context window is crucial for understanding complex inputs.
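As a usage illustration for the math-oriented scenarios above, here is a hedged sketch using the transformers text-generation pipeline with a chat-style prompt; the prompt and generation length are arbitrary examples.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="formalmathatepfl/Qwen3-8B-finetuned",
    torch_dtype="auto",  # assumption: use the checkpoint's native dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "Show that the square root of 2 is irrational."}]
result = generator(messages, max_new_tokens=1024)

# The pipeline returns the full conversation; print the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```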