laion/nemotron-terminal-data_querying__Qwen3-8B

  • Task: Text generation
  • Model size: 8B
  • Quantization: FP8
  • Context length: 32k
  • Concurrency cost: 1
  • Published: Apr 13, 2026
  • License: Other
  • Architecture: Transformer

laion/nemotron-terminal-data_querying__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B and optimized for data-querying tasks in a terminal environment. Its 32768-token context window makes it suitable for processing extensive data inputs and complex query structures. The fine-tuning focused on improving the model's ability to understand and generate responses for data retrieval and manipulation.


Overview

This model, nemotron-terminal-data_querying__Qwen3-8B, is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on a specialized dataset (/e/data1/datasets/playground/ot/hf_hub/datasets--laion--nemotron-terminal-data_querying) to excel at data-querying scenarios, particularly in a terminal context. Its 32768-token context window lets it handle long, complex data-related prompts.
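When feeding long data dumps into the model, the prompt plus the requested generation budget must stay inside that window. A minimal sketch of the bookkeeping, assuming token counts are measured elsewhere (in practice with the model's tokenizer); the helper name is illustrative, not part of any API:

```python
# Context-budget check for the model's 32768-token window.
# Token counts here are illustrative; measure real prompts with the tokenizer.
MAX_CONTEXT = 32768

def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT

print(fits_in_context(30000, 2048))  # True: 32048 <= 32768
print(fits_in_context(31000, 2048))  # False: 33048 > 32768
```

If a query over a large dataset exceeds the window, the usual options are truncating the data, summarizing it first, or splitting the query across several calls.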

Key Capabilities

  • Specialized Data Querying: Fine-tuned for understanding and responding to data-related queries.
  • Large Context Window: Supports a 32768-token context length, beneficial for detailed data analysis and complex instructions.
  • Qwen3-8B Foundation: Built upon the robust Qwen3-8B base model, inheriting its general language understanding capabilities.
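A terminal data-querying request would typically be sent as a chat-formatted prompt. The sketch below renders messages in a ChatML-style layout like the one the Qwen family uses; in practice you would let `tokenizer.apply_chat_template` produce the exact string, so treat this formatter, the system prompt, and the example query as illustrative assumptions rather than the model's canonical template:

```python
# Minimal ChatML-style prompt builder (illustrative; prefer apply_chat_template).
def format_chat(messages):
    """Render a list of {role, content} dicts in a ChatML-style layout."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation continues from here
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a terminal assistant for data querying."},
    {"role": "user", "content": "Show the 10 largest files under /var/log, sorted by size."},
]
prompt = format_chat(messages)
print(prompt)
```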

Training Details

The model was trained with a learning rate of 4e-05 and a total batch size of 96 (a per-device train_batch_size of 1 with gradient_accumulation_steps of 3 across 32 GPUs), using the AdamW optimizer. A cosine learning rate scheduler with a 0.1 warmup ratio was employed over 7 epochs. Training used Transformers 4.57.6 and PyTorch 2.9.1+cu130.
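The effective batch size above follows directly from the per-device batch size, the gradient accumulation steps, and the GPU count; a quick sketch with illustrative variable names, using the numbers from the card:

```python
# Effective batch size from the training hyperparameters stated above.
train_batch_size = 1             # per-device micro-batch
gradient_accumulation_steps = 3  # gradients accumulated before each optimizer step
num_gpus = 32                    # data-parallel workers

effective_batch_size = train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 96, matching the stated total batch size
```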