formalmathatepfl/Qwen3-8B-finetuned
formalmathatepfl/Qwen3-8B-finetuned is an 8 billion parameter causal language model, fine-tuned from the Qwen/Qwen3-8B architecture. This model is specifically optimized for tasks requiring mathematical reasoning and formal logic, building upon its base model's capabilities. It is designed for applications that benefit from enhanced precision in mathematical contexts, leveraging its 32768 token context length for complex problem-solving.
Loading preview...
Overview
This model, formalmathatepfl/Qwen3-8B-finetuned, is an 8 billion parameter language model derived from the Qwen/Qwen3-8B base architecture. It has been fine-tuned to enhance its performance in specific domains, particularly those requiring precise reasoning and mathematical understanding. The model supports a substantial context length of 32768 tokens, allowing it to process and generate longer, more complex sequences of text.
Key Capabilities
- Enhanced Mathematical Reasoning: Fine-tuned to improve accuracy and understanding in mathematical and formal logic tasks.
- Large Context Window: Utilizes a 32768 token context length, beneficial for handling extensive problem descriptions or detailed logical arguments.
- Qwen3-8B Foundation: Builds upon the robust capabilities of the Qwen3-8B model, inheriting its general language understanding and generation strengths.
Training Details
The model underwent a fine-tuning process with specific hyperparameters:
- Learning Rate: 0.0001
- Batch Size: A total training batch size of 16 (2 per device across 4 GPUs with 2 gradient accumulation steps).
- Optimizer: ADAMW_TORCH with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with 0.03 warmup steps over 1 epoch.
Good for
- Applications requiring strong mathematical problem-solving.
- Tasks involving formal logic and precise reasoning.
- Scenarios where a large context window is crucial for understanding complex inputs.