laion/nemotron-terminal-model_training__Qwen3-8B
laion/nemotron-terminal-model_training__Qwen3-8B is an 8 billion parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was trained on the /e/data1/datasets/playground/ot/hf_hub/datasets--laion--nemotron-terminal-model_training/snapshots/d6c5e5b4f60135a4401d5beba29dfdfb944fc366_thinking_preprocessed dataset. With a 32768-token context length, it targets applications related to its fine-tuning data, though its intended uses are not documented in detail.
Model Overview
This model, nemotron-terminal-model_training__Qwen3-8B, is an 8 billion parameter language model. It is a fine-tuned variant of the Qwen/Qwen3-8B base model and therefore inherits the Qwen3 architecture. Fine-tuning used the /e/data1/datasets/playground/ot/hf_hub/datasets--laion--nemotron-terminal-model_training/snapshots/d6c5e5b4f60135a4401d5beba29dfdfb944fc366_thinking_preprocessed dataset.
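A minimal loading sketch is shown below. It assumes the fine-tuned weights are published on the Hugging Face Hub under the repo id in the title; substitute a local checkpoint path if you are loading from disk instead.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, taken from the model card title; replace with a local path if needed.
model_id = "laion/nemotron-terminal-model_training__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires accelerate; spreads layers across available devices
)
```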
Training Details
The model was trained using the following key hyperparameters; a configuration sketch follows the list.
- Learning Rate: 4e-05
- Batch Size: Effective training batch size of 96 (1 per device × 32 GPUs × 3 gradient accumulation steps).
- Optimizer: adamw_torch_fused (fused AdamW); the exact beta and epsilon values are not stated.
- LR Scheduler: Cosine type with a 0.1 warmup ratio.
- Epochs: Trained for 7.0 epochs.
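The listed values map onto a standard Hugging Face TrainingArguments configuration as sketched below. This is an illustration only: the actual training script, precision settings, and AdamW beta/epsilon values are not provided, so anything beyond the listed hyperparameters is an assumption.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nemotron-terminal-model_training__Qwen3-8B",  # assumed output directory
    learning_rate=4e-05,
    per_device_train_batch_size=1,   # 1 sequence per device on 32 GPUs
    gradient_accumulation_steps=3,   # 1 * 32 * 3 = effective batch size of 96
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7.0,
)
```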
Key Characteristics
- Base Model: Qwen3-8B
- Parameter Count: 8 billion
- Context Length: 32768 tokens (see the inference sketch below)
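The following inference sketch assumes the fine-tune retains Qwen3's chat template and that the checkpoint is available under the repo id used above; the example prompt is purely illustrative, since the intended applications are not documented.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemotron-terminal-model_training__Qwen3-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative prompt only; the model's actual target domain is not specified.
messages = [{"role": "user", "content": "Explain what the `grep -r` command does."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```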
Intended Use Cases
Specific intended uses and limitations are not documented. Examining the nature of the fine-tuning dataset would be necessary to determine suitable applications.