mlfoundations-dev/qwen2-5_sky_t1_2-5k_base

Public · 7.6B parameters · FP8 · 32768 context length
Released: Feb 11, 2025
License: apache-2.0
Hosted on Hugging Face
Overview

The mlfoundations-dev/qwen2-5_sky_t1_2-5k_base model is a fine-tuned variant of Qwen2.5-7B-Instruct, originally developed by Qwen. It was adapted by mlfoundations-dev through supervised fine-tuning on the mlfoundations-dev/sky_t1_2-5k_base dataset.

Key Capabilities

  • Foundation Model Adaptation: Builds on the architecture and pre-training of the Qwen2.5-7B-Instruct base model.
  • Specialized Fine-tuning: Adapted on the sky_t1_2-5k_base dataset, suggesting it is optimized for tasks in that dataset's domain.
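Since the model inherits Qwen2.5-7B-Instruct's chat interface, it can be used like any instruction-tuned causal LM. Below is a minimal, hypothetical usage sketch: the model id comes from this card, while the prompt helper assumes Qwen2.5's ChatML-style template (in practice, `tokenizer.apply_chat_template` handles this) and the loader assumes the `transformers` library is installed.

```python
MODEL_ID = "mlfoundations-dev/qwen2-5_sky_t1_2-5k_base"  # id from this card


def build_chatml_prompt(messages):
    """Format chat messages into a Qwen2.5 ChatML-style prompt string.

    Illustrative only: with `transformers` available, prefer
    tokenizer.apply_chat_template(messages, add_generation_prompt=True).
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)


def generate_reply(messages, max_new_tokens=256):
    """Hypothetical generation helper; requires `transformers` and
    enough memory for a 7.6B-parameter model, so it is not run here."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_chatml_prompt(messages), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```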

Training Details

The model was trained with the following key hyperparameters:

  • Learning Rate: 1e-05
  • Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: An effective training batch size of 96 (train_batch_size: 1 × gradient_accumulation_steps: 3 × num_devices: 32)
  • Epochs: 3.0
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
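The hyperparameters above can be sanity-checked with a short sketch, assuming the standard interpretation of these settings: the effective batch size is the product of per-device batch size, gradient accumulation steps, and device count, and the scheduler ramps linearly to the peak rate over the first 10% of steps before decaying along a cosine curve to zero.

```python
import math

PEAK_LR = 1e-5        # learning_rate from the card
WARMUP_RATIO = 0.1    # warmup_ratio from the card

# Effective batch size = per-device batch × grad accumulation × devices
EFFECTIVE_BATCH = 1 * 3 * 32


def lr_at(step, total_steps):
    """Cosine schedule with linear warmup (standard formulation; the
    exact trainer implementation may differ slightly at the boundaries)."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate reaches its 1e-05 peak at step 100 (the end of warmup) and decays to zero by the final step.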

Good for

  • Domain-Specific Applications: Potentially well-suited for tasks aligned with the sky_t1_2-5k_base dataset it was fine-tuned on.
  • Further Research and Development: Serves as a strong base for additional fine-tuning or experimentation within its specialized domain.