laion/nemotron-terminal-data_querying__Qwen3-8B
laion/nemotron-terminal-data_querying__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B and optimized for data-querying tasks within a terminal environment. It supports a 32768-token context length, making it suitable for extensive data inputs and complex query structures. The fine-tuning process focused on enhancing its ability to understand and generate responses for data retrieval and manipulation.
Overview
This model, nemotron-terminal-data_querying__Qwen3-8B, is an 8-billion-parameter language model derived from the Qwen/Qwen3-8B architecture. It has been fine-tuned on the laion/nemotron-terminal-data_querying dataset to excel in data-querying scenarios, particularly within a terminal context. Its 32768-token context window allows it to handle complex and lengthy data-related prompts.
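Because the model inherits the Qwen3-8B architecture, it should load through the standard Hugging Face transformers API. The following is a minimal loading sketch, assuming the checkpoint is published under the repository id laion/nemotron-terminal-data_querying__Qwen3-8B; the dtype and device settings are illustrative defaults, not values from this card.

```python
# Minimal loading sketch; repo id, dtype, and device placement are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemotron-terminal-data_querying__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the checkpoint's native precision
    device_map="auto",   # place the 8B weights across available devices
)
```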
Key Capabilities
- Specialized Data Querying: Fine-tuned for understanding and responding to data-related queries (see the inference sketch after this list).
- Large Context Window: Supports a 32768 token context length, beneficial for detailed data analysis and complex instructions.
- Qwen3-8B Foundation: Built upon the robust Qwen3-8B base model, inheriting its general language understanding capabilities.
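Continuing from the loading sketch above, a single data-querying prompt can be run through the model's chat template as follows. The prompt text and generation settings are illustrative assumptions, not recommendations from the card.

```python
# Illustrative inference sketch; prompt and generation settings are assumptions.
messages = [
    {"role": "user", "content": "List the 5 largest files under /var/log, sorted by size."}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```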
Training Details
The model was trained with a learning rate of 4e-05 and an effective batch size of 96 (train_batch_size 1 × gradient_accumulation_steps 3 × 32 GPUs), using the AdamW optimizer. A cosine learning rate scheduler with a 0.1 warmup ratio was employed over 7 epochs. Training used Transformers 4.57.6 and PyTorch 2.9.1+cu130.
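For reference, the stated hyperparameters map onto a transformers TrainingArguments configuration roughly as sketched below. This is a reconstruction from the numbers above, not the actual training script; the output directory and the bf16 flag are assumptions.

```python
# Hyperparameters reconstructed from the card; output_dir and bf16 are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nemotron-terminal-data_querying__Qwen3-8B",  # placeholder path
    learning_rate=4e-05,
    per_device_train_batch_size=1,   # 1 per GPU
    gradient_accumulation_steps=3,   # 1 x 3 x 32 GPUs = effective batch size 96
    num_train_epochs=7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    bf16=True,                       # assumption: typical precision for Qwen3 fine-tunes
)
```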