Overview
laion/nemotron-100000-opt100k__Qwen3-8B is an 8-billion-parameter language model built on the Qwen3-8B architecture and fine-tuned on the laion/nemotron-terminal-corpus-unified-100000 dataset. It supports a context length of 32,768 tokens, allowing it to process and generate long sequences of text.
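A minimal inference sketch is shown below, assuming the standard transformers causal-LM interface and a Qwen3-style chat template bundled with the tokenizer; it is not taken from the model card itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemotron-100000-opt100k__Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt (assumes the tokenizer ships a chat template).
messages = [{"role": "user", "content": "Summarize what a terminal multiplexer does."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response and decode only the newly produced tokens.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```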
Key Characteristics
- Base Model: Qwen3-8B, a robust foundation for general language tasks.
- Parameter Count: 8 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: 32K tokens, enabling the model to handle extensive inputs and maintain coherence over long conversations or documents (see the sketch after this list).
- Fine-tuning Dataset: laion/nemotron-terminal-corpus-unified-100000, suggesting potential strengths related to the characteristics of this corpus.
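To stay within the 32,768-token window, long inputs can be truncated at tokenization time. The snippet below is a small illustrative sketch using the standard tokenizer truncation options; the placeholder document is hypothetical.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("laion/nemotron-100000-opt100k__Qwen3-8B")

long_document = "..."  # placeholder for a long input document
# Truncate to the model's 32,768-token context length before generation.
inputs = tokenizer(long_document, truncation=True, max_length=32768, return_tensors="pt")
print(inputs["input_ids"].shape)  # at most (1, 32768)
```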
Training Details
The model was fine-tuned with a learning rate of 4e-05 and a total batch size of 96 (32 GPUs with 3 gradient accumulation steps), using the AdamW optimizer with a cosine learning-rate scheduler over 5 epochs.
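For reference, these reported hyperparameters map onto a transformers TrainingArguments configuration roughly as sketched below. The per-device batch size of 1 is an assumption (32 GPUs × 1 per device × 3 accumulation steps = 96); the actual training setup may have differed.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nemotron-100000-opt100k__Qwen3-8B",
    learning_rate=4e-05,                 # reported learning rate
    per_device_train_batch_size=1,       # assumed split across 32 GPUs
    gradient_accumulation_steps=3,       # reported accumulation steps
    num_train_epochs=5,                  # reported number of epochs
    optim="adamw_torch",                 # AdamW optimizer
    lr_scheduler_type="cosine",          # cosine learning-rate schedule
)
```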