laion/nemotron-terminal-adapters_math__Qwen3-8B

  • Task: Text generation
  • Model size: 8B parameters
  • Quantization: FP8
  • Context length: 32k
  • Published: Apr 18, 2026
  • License: other
  • Architecture: Transformer

nemotron-terminal-adapters_math__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It is adapted for mathematical tasks, trained on a math-focused dataset to strengthen numerical reasoning and problem-solving, and is intended for applications that require precise mathematical computation and logical deduction.


Model Overview

The model is a fine-tune of the base Qwen/Qwen3-8B architecture; the adaptation targets mathematical reasoning and problem-solving.

Key Training Details

The model was fine-tuned on a "thinking"-preprocessed snapshot of the laion/nemotron-terminal-adapters_math dataset (local snapshot path: /e/data1/datasets/playground/ot-baf/hf_hub/datasets--laion--nemotron-terminal-adapters_math/snapshots/fe7e3230a8a159c8b8293a3bf3df37fa3e26e5c1_thinking_preprocessed). Training used the following hyperparameters; a configuration sketch reproducing them appears after the list:

  • Learning Rate: 4e-05
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9,0.98) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
  • Epochs: 5.0
  • Batch Size: A total training batch size of 96 (a per-device batch size of 1 with 3 gradient accumulation steps across 32 devices; 1 × 3 × 32 = 96).
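
For reference, these settings map onto a Hugging Face `TrainingArguments` configuration roughly as sketched below. This is a reconstruction, not the authors' training script: the per-device batch size of 1 is inferred from the reported totals, and `output_dir` is a placeholder.

```python
# Sketch of a TrainingArguments configuration matching the reported
# hyperparameters. Not the authors' script: per_device_train_batch_size=1
# is inferred from 96 total = 32 devices x 3 accumulation steps x 1.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nemotron-terminal-adapters_math__Qwen3-8B",  # placeholder
    learning_rate=4e-5,
    optim="adamw_torch_fused",       # ADAMW_TORCH_FUSED
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
    per_device_train_batch_size=1,   # inferred from the reported totals
    gradient_accumulation_steps=3,   # 32 devices x 3 steps x 1 = 96 total
)
```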

Intended Use Cases

Given its fine-tuning on a mathematical dataset, this model is primarily intended for applications that require the following (a minimal inference sketch appears after the list):

  • Solving mathematical problems.
  • Numerical reasoning tasks.
  • Generating explanations for mathematical concepts.
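
The sketch below shows one way to query the model for a math problem. It assumes the model is published on the Hugging Face Hub under the ID above and that it retains the standard Qwen3 chat template; `enable_thinking` is a Qwen3 template option, relevant here because the training snapshot is "thinking"-preprocessed.

```python
# Minimal inference sketch. Assumes the model is available on the Hugging Face
# Hub under this ID and keeps the standard Qwen3 chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemotron-terminal-adapters_math__Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "What is the sum of the first 50 positive integers?"}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3 template flag; emits a reasoning trace
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

For this prompt, the expected final answer is 50 × 51 / 2 = 1275.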

Further details on specific performance metrics and limitations are not provided in the current model card.