mlfoundations-dev/qwen2-5_openthoughts_2-5k_rewrite_r1_distill_llama70b_16k

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Feb 24, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

mlfoundations-dev/qwen2-5_openthoughts_2-5k_rewrite_r1_distill_llama70b_16k is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was adapted on the mlfoundations-dev/openthoughts_2-5k_rewrite_r1_distill_llama70b_16k dataset, whose name indicates roughly 2.5k OpenThoughts examples rewritten with an R1-distilled Llama 70B model at sequence lengths up to 16k tokens. The model is intended for tasks that benefit from both its Qwen2.5 base and this specific fine-tuning data.


Overview

This model, mlfoundations-dev/qwen2-5_openthoughts_2-5k_rewrite_r1_distill_llama70b_16k, is a 7.6-billion-parameter language model. It is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model, and therefore inherits the Qwen2.5 architecture and its instruction-following behavior.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B-Instruct.
  • Fine-tuning Dataset: Specifically trained on the mlfoundations-dev/openthoughts_2-5k_rewrite_r1_distill_llama70b_16k dataset.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: The model supports up to 131,072 tokens (the Qwen2.5 maximum), allowing very long inputs and outputs; note that the hosted configuration above advertises a 32k window. A minimal loading and generation sketch follows this list.
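
As a sketch of how such a checkpoint is typically used, the snippet below loads the model with Hugging Face transformers and generates a reply via the Qwen2.5 chat template. The model ID comes from this card; the dtype and generation settings are illustrative assumptions, not values published by the authors.

```python
# Minimal inference sketch (assumes the checkpoint is available on the
# Hugging Face Hub under the ID shown on this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/qwen2-5_openthoughts_2-5k_rewrite_r1_distill_llama70b_16k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; the FP8 quant above is a hosting detail
    device_map="auto",
)

# Qwen2.5-Instruct derivatives are prompted through a chat template.
messages = [{"role": "user", "content": "Explain gradient accumulation in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)  # illustrative limit
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```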

Training Details

The model was fine-tuned for 3 epochs with a learning rate of 1e-05 and a per-device batch size of 1; with 3 gradient accumulation steps across 32 devices, the effective batch size was 1 × 3 × 32 = 96. The optimizer was AdamW (adamw_torch) with a cosine learning-rate schedule and a warmup ratio of 0.1. This fine-tuning adapts the base Qwen2.5 model to the characteristics and patterns of the openthoughts_2-5k_rewrite_r1_distill_llama70b_16k dataset.
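
The hyperparameters above map directly onto Hugging Face TrainingArguments. The sketch below reproduces them for illustration only; the dataset-loading call, output path, and precision flag are assumptions about the setup, not the authors' published training script.

```python
# Hypothetical reconstruction of the fine-tuning configuration from the
# hyperparameters listed above; not the authors' actual training script.
from datasets import load_dataset
from transformers import TrainingArguments

dataset = load_dataset(
    "mlfoundations-dev/openthoughts_2-5k_rewrite_r1_distill_llama70b_16k"
)

args = TrainingArguments(
    output_dir="qwen2-5_openthoughts_ft",  # placeholder path
    learning_rate=1e-05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=3,         # x 32 devices -> effective batch size 96
    num_train_epochs=3,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                             # assumption: mixed-precision training
)
```

Under a multi-device launcher such as torchrun or accelerate, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps × world size, which recovers the 96 noted above.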