Overview
This model, teacher_science_qwq, is a 7.6 billion parameter language model developed by mlfoundations-dev. It is fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model, so it inherits instruction-following behavior. Training used a context length of 32768 tokens, which makes the model suitable for long inputs and extended, coherent responses.
Training Details
The model was fine-tuned for 5 epochs with a learning rate of 4e-05 and a total training batch size of 128 (a train_batch_size of 1 with gradient_accumulation_steps of 2 across 64 GPUs). The optimizer was AdamW with a cosine learning-rate schedule and a warmup ratio of 0.1.
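As a rough sanity check, the effective batch size and the length of the warmup phase follow directly from these hyperparameters. The sketch below is illustrative only: the dataset size passed to `warmup_steps` is a hypothetical value, since the card does not state how many training examples the dataset contains.

```python
# Effective batch size = per-device batch * gradient-accumulation steps * GPU count.
per_device_batch = 1
grad_accum_steps = 2
num_gpus = 64
effective_batch = per_device_batch * grad_accum_steps * num_gpus  # 1 * 2 * 64 = 128


def warmup_steps(num_examples: int, epochs: int = 5,
                 batch: int = 128, warmup_ratio: float = 0.1) -> int:
    """A warmup ratio of 0.1 means the first 10% of optimizer steps
    linearly ramp the learning rate up to 4e-05 before the cosine
    decay begins. num_examples is a hypothetical dataset size."""
    total_steps = (num_examples // batch) * epochs
    return int(total_steps * warmup_ratio)


print(effective_batch)        # 128
print(warmup_steps(128_000))  # 500 warmup steps out of 5000 total
```

This also explains the unusual per-device batch size of 1: with 64 GPUs and 2 accumulation steps, gradient accumulation recovers a large effective batch while keeping per-GPU memory use low.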
Intended Use
Specific intended uses and limitations are not documented, but fine-tuning on the mlfoundations-dev/teacher_science_qwq dataset implies a degree of specialization. The model is likely best suited to applications within the domain that dataset covers, such as educational content, scientific explanations, or question answering in those fields; developers should evaluate it on their own tasks before deployment.