laion/nemotron-terminal-adapters_math__Qwen3-8B
nemotron-terminal-adapters_math__Qwen3-8B is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B. This model is specifically adapted for mathematical tasks, leveraging a specialized dataset for enhanced performance in numerical reasoning and problem-solving. It is designed for applications requiring precise mathematical computation and logical deduction.
Model Overview
nemotron-terminal-adapters_math__Qwen3-8B is an 8-billion-parameter language model fine-tuned from the base Qwen/Qwen3-8B architecture, with the adaptation focused on improving mathematical reasoning and problem-solving.
Key Training Details
The model was fine-tuned on the laion/nemotron-terminal-adapters_math dataset (thinking-preprocessed snapshot, local path: /e/data1/datasets/playground/ot-baf/hf_hub/datasets--laion--nemotron-terminal-adapters_math/snapshots/fe7e3230a8a159c8b8293a3bf3df37fa3e26e5c1_thinking_preprocessed). Training used the following hyperparameters:
- Learning Rate: 4e-05
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08
- Scheduler: Cosine learning rate schedule with a 0.1 warmup ratio
- Epochs: 5.0
- Batch Size: A total training batch size of 96 (gradient accumulation steps of 3 across 32 devices, i.e. a per-device batch size of 1)
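The learning-rate schedule above can be sketched as follows. This is an illustrative reimplementation, not the training code; `total_steps` is a hypothetical value chosen for the example, while the base learning rate (4e-05) and warmup ratio (0.1) come from the hyperparameters listed.

```python
import math

def lr_at(step, total_steps=1000, base_lr=4e-05, warmup_ratio=0.1):
    """Cosine schedule with linear warmup, matching the reported settings."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr over the first 10% of steps.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(lr_at(50))    # mid-warmup: half the base learning rate
print(lr_at(100))   # end of warmup: peak learning rate (4e-05)
print(lr_at(1000))  # end of training: decayed to 0
```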
Intended Use Cases
Given its fine-tuning on a mathematical dataset, this model is primarily intended for applications that require:
- Solving mathematical problems.
- Numerical reasoning tasks.
- Generating explanations for mathematical concepts.
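For these use cases, the model can presumably be loaded with the standard Hugging Face `transformers` causal-LM API, as for its Qwen3-8B base. A minimal sketch (the chat template, generation settings, and prompt are illustrative assumptions, not values confirmed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemotron-terminal-adapters_math__Qwen3-8B"

# Assumes the tokenizer and chat template are inherited from Qwen/Qwen3-8B.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that an 8B-parameter model in bf16 needs roughly 16 GB of accelerator memory; quantized loading may be required on smaller GPUs.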
This model card does not currently report evaluation metrics or known limitations.