axel-datos/qwen2.5-0.5b-instruct_gsm8k_full-finetuningV2

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.5BQuant:BF16Ctx Length:32kPublished:Dec 15, 2024License:apache-2.0Architecture:Transformer Open Weights Warm

axel-datos/qwen2.5-0.5b-instruct_gsm8k_full-finetuningV2 is a fine-tuned version of the Qwen/qwen2.5-0.5b-instruct model. This model has been adapted using a customized dataset, suggesting a specialization beyond its base instruction-following capabilities. It is intended for tasks aligned with its specific fine-tuning, though further details on its exact capabilities and limitations are not provided.

Loading preview...

Model Overview

This model, axel-datos/qwen2.5-0.5b-instruct_gsm8k_full-finetuningV2, is a fine-tuned variant of the Qwen/qwen2.5-0.5b-instruct base model. It has undergone further training on a customized dataset, indicating an optimization for specific tasks or domains not covered by the original instruction-tuned model.

Training Details

The fine-tuning process utilized the following key hyperparameters:

  • Learning Rate: 2e-05
  • Batch Sizes: train_batch_size of 1, eval_batch_size of 8
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Linear learning rate scheduler
  • Epochs: Trained for 0.01 epochs
  • Precision: Native AMP for mixed-precision training

Framework Versions

The training environment included:

  • Transformers 4.46.3
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.20.3

Intended Use

While specific intended uses and limitations are not detailed, the model's fine-tuning on a custom dataset suggests it is tailored for particular applications. Users should evaluate its performance on their specific tasks, especially those related to the undisclosed custom dataset.