mlfoundations-dev/teacher_science_qwq

Text Generation · Model Size: 7.6B · Quantization: FP8 · Context Length: 32k · Published: Apr 29, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

The mlfoundations-dev/teacher_science_qwq model is a 7.6-billion-parameter instruction-tuned language model developed by mlfoundations-dev, fine-tuned from Qwen/Qwen2.5-7B-Instruct with a 32768-token context length. It was trained on the mlfoundations-dev/teacher_science_qwq dataset, which suggests specialization in a particular domain, likely educational or scientific content.


Overview

This model, teacher_science_qwq, is a 7.6-billion-parameter language model developed by mlfoundations-dev. It is a fine-tuned iteration of the Qwen/Qwen2.5-7B-Instruct base model and therefore inherits that model's instruction-following format. Training used a 32768-token context length, suggesting the model can handle long inputs and produce coherent, long-form responses.
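Because the model derives from Qwen2.5-7B-Instruct, it should load through the standard Hugging Face transformers workflow. The sketch below is illustrative only: it assumes the checkpoint ships the usual Qwen2.5 tokenizer and chat template, and the example question is hypothetical.

```python
# Minimal inference sketch (assumes the standard Qwen2.5 chat template
# inherited from the base model).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/teacher_science_qwq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # spread layers across available devices
)

messages = [{"role": "user", "content": "Explain why the sky appears blue."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```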

Training Details

The model was fine-tuned with a learning rate of 4e-05 over 5 epochs, using a total training batch size of 128 (a per-device train_batch_size of 1 with gradient_accumulation_steps of 2 across 64 GPUs). The optimizer was AdamW with a cosine learning-rate schedule and a warmup ratio of 0.1. This configuration reflects a multi-GPU training setup tuned for its target dataset.
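For reference, the reported hyperparameters map onto a transformers TrainingArguments configuration roughly as follows. This is a reconstruction from the numbers above, not the original training script; the output directory and precision flag are assumptions.

```python
# Hyperparameter sketch only: values mirror the training details above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="teacher_science_qwq",   # hypothetical output path
    learning_rate=4e-05,
    per_device_train_batch_size=1,      # x 2 grad-accum x 64 GPUs = 128 effective
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                          # assumption; precision is not stated
)
```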

Intended Use

While specific intended uses and limitations are not documented, the fine-tuning on the mlfoundations-dev/teacher_science_qwq dataset implies domain specialization. Developers should consider this model for applications that need nuanced understanding or generation within that dataset's domain, such as educational content, scientific explanations, or question answering in those fields, as in the sketch below.
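A compact way to try such use cases is the transformers text-generation pipeline, which accepts chat messages in recent library versions. The system and user prompts here are hypothetical examples of the educational framing described above.

```python
# Quick trial via the text-generation pipeline (requires a recent
# transformers release that accepts chat-format messages).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mlfoundations-dev/teacher_science_qwq",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a patient science teacher."},  # hypothetical prompt
    {"role": "user", "content": "Walk a high-school student through how photosynthesis stores energy."},
]
result = generator(messages, max_new_tokens=512)
# The pipeline returns the conversation with the assistant turn appended.
print(result[0]["generated_text"][-1]["content"])
```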