laion/nemotron-terminal-data_science__Qwen3-8B
The laion/nemotron-terminal-data_science__Qwen3-8B model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on the laion/nemotron-terminal-data_science dataset, adapting it for data science tasks and contexts.
Overview
This model, laion/nemotron-terminal-data_science__Qwen3-8B, is an 8-billion-parameter language model derived from the Qwen/Qwen3-8B architecture. It was fine-tuned on the thinking-preprocessed variant of the laion/nemotron-terminal-data_science dataset.
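Since the model follows the standard Qwen3 causal-LM layout, it should load through the usual Transformers auto classes. The snippet below is a minimal sketch, assuming a GPU with bfloat16 support; only the repository id comes from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemotron-terminal-data_science__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable hardware
    device_map="auto",           # requires the accelerate package
)
```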
Training Details
The model was trained with a learning rate of 4e-05 for 7 epochs, using a cosine learning-rate scheduler with a warmup ratio of 0.1. Training used a total batch size of 96 spread across 32 devices, with the adamw_torch_fused optimizer. The software stack comprised Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, and Tokenizers 0.22.2.
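For readers who want to reproduce a similar run, the reported hyperparameters map onto Hugging Face TrainingArguments roughly as follows. This is a sketch, not the actual training script: the per-device batch size of 3 is inferred from the stated total of 96 across 32 devices, and output_dir and bf16 are illustrative assumptions not taken from this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-data-science",  # hypothetical path
    learning_rate=4e-05,
    num_train_epochs=7.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=3,  # 3 x 32 devices = 96 total
    optim="adamw_torch_fused",
    bf16=True,  # assumption; precision is not stated on the card
)
```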
Intended Use
Specific intended uses and limitations are not documented, but the fine-tuning on a data-science-oriented dataset points to data science workflows as the primary application: tasks that require understanding or generating text about data science concepts and operations.
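Building on the loading snippet above, a data-science-flavoured prompt might look like the following; the question text is purely illustrative.

```python
messages = [
    {"role": "user",
     "content": "How do I handle missing values in a pandas DataFrame?"},
]

# apply_chat_template builds the Qwen3 chat prompt and returns input ids
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```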