mlfoundations-dev/teacher_code_qwq

Text generation | Model size: 7.6B | Quantization: FP8 | Context length: 32k | Published: Apr 28, 2025 | License: apache-2.0 | Architecture: Transformer | Open weights

The mlfoundations-dev/teacher_code_qwq model is a 7.6-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. Developed by mlfoundations-dev, it is specialized for code-related tasks, building on the base model's strengths in programming contexts. Fine-tuning on a code-centric dataset is intended to improve its utility for code generation, analysis, and understanding.


Overview

This model, mlfoundations-dev/teacher_code_qwq, is a 7.6-billion-parameter language model fine-tuned by mlfoundations-dev from the Qwen/Qwen2.5-7B-Instruct base, with a focus on code-related applications.

Training Details

The model was fine-tuned on the mlfoundations-dev/teacher_code_qwq dataset with the following key hyperparameters:

- Learning rate: 4e-05
- Total batch size: 128 (64 devices × 2 gradient accumulation steps, implying a per-device batch size of 1)
- Epochs: 5
- Optimizer: AdamW (the card does not restate the beta and epsilon values)
- Learning rate scheduler: cosine, with a warmup ratio of 0.1
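This recipe maps directly onto a standard Hugging Face TrainingArguments configuration. The sketch below is illustrative only: the per-device batch size of 1 is inferred from the totals above, and the output_dir and AdamW beta/epsilon values (left at library defaults) are assumptions, not values confirmed by the card.

```python
# Hedged sketch of the reported fine-tuning recipe as a Hugging Face
# TrainingArguments configuration. Values not stated in the card are
# assumptions and are marked as such below.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="teacher_code_qwq",      # assumed output path
    learning_rate=4e-05,
    per_device_train_batch_size=1,      # inferred: 1 * 64 devices * 2 accum steps = 128 total
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    optim="adamw_torch",                # AdamW; betas/epsilon left at library defaults (assumed)
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```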

Intended Use

The card does not enumerate specific intended uses and limitations, but fine-tuning on a code-centric dataset suggests its primary applications are code generation, comprehension, and related programming tasks. Developers seeking a specialized model built on the Qwen2.5-7B-Instruct architecture may find it suitable for such work; a minimal inference sketch follows.
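Assuming the model retains the standard Qwen2.5 chat interface of its base, it can be loaded with the Hugging Face transformers library as shown below. The prompt and generation settings are illustrative, not prescribed by the card.

```python
# Minimal inference sketch, assuming the model follows the standard
# Qwen2.5-7B-Instruct chat template via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/teacher_code_qwq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Example code-oriented prompt (illustrative).
messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```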