axel-datos/Llama-3.2-1B_gsm8k_full-finetuning

Hugging Face
  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 1B
  • Quant: BF16
  • Ctx Length: 32k
  • Published: Dec 13, 2024
  • License: llama3.2
  • Architecture: Transformer

axel-datos/Llama-3.2-1B_gsm8k_full-finetuning is a fully fine-tuned version of the meta-llama/Llama-3.2-1B base model, trained on a customized dataset. No performance metrics are reported. The fine-tuning suggests the model is meant for a focused application rather than general-purpose language generation.


Overview

This model is a specialized variant of Meta's Llama-3.2-1B. All of its weights were updated (full fine-tuning) on a customized dataset in order to adapt the general-purpose base model to a specific application.

Training Details

The finetuning procedure utilized the following key hyperparameters:

  • Learning Rate: 2e-05
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Sizes: train_batch_size of 1, eval_batch_size of 8
  • Epochs: 0.01 (roughly 1% of a single pass over the training data, i.e. a very short finetuning run)
  • Mixed Precision: Native AMP was employed for training efficiency.
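
To make the scale of this run concrete, the sketch below collects the listed hyperparameters in a plain dictionary and estimates how many optimizer updates 0.01 epochs corresponds to. Several things here are assumptions rather than facts from the card: the dictionary keys borrow the naming of transformers.TrainingArguments, the step formula approximates how the Hugging Face Trainer derives its step count, gradient accumulation is assumed to be 1, and the dataset size of 7,473 examples is hypothetical (it is the size of the public GSM8K training split, picked only because the model name mentions gsm8k; the card itself says only "customized dataset").

```python
import math

# Hyperparameters as listed above. Key names follow transformers.TrainingArguments
# for readability; that mapping is an assumption, not something the card states.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "num_train_epochs": 0.01,
}


def optimizer_steps(num_examples: int, batch_size: int, epochs: float,
                    grad_accum: int = 1) -> int:
    """Estimate the number of optimizer updates for a fractional-epoch run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


# Hypothetical dataset size (assumption, see the paragraph above).
HYPOTHETICAL_NUM_EXAMPLES = 7473

steps = optimizer_steps(
    HYPOTHETICAL_NUM_EXAMPLES,
    HYPERPARAMS["per_device_train_batch_size"],
    HYPERPARAMS["num_train_epochs"],
)
print(f"Estimated optimizer updates: {steps}")
```

Under these assumptions the run amounts to only a few dozen weight updates, so large behavioral differences from the base model should not be taken for granted.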

Framework Versions

The model was trained using:

  • Transformers 4.46.3
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.20.3

Intended Use

The model card does not document intended uses or limitations. Because the model was fully fine-tuned on a customized dataset, it is presumably best suited to tasks that resemble that dataset. Users should measure its performance on their own target tasks before relying on it, particularly for tasks that need the specialized knowledge or reasoning the fine-tuning data may have targeted.
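
One possible way to run such an evaluation is sketched below. It is a minimal harness, not the author's evaluation method. It assumes tasks whose answers are numbers (which the "gsm8k" in the model name hints at but the card does not confirm) and scores a prediction as correct when the last number in the generated text equals the last number in the reference. The helper names make_generate_fn, last_number and exact_match_accuracy are hypothetical; the transformers calls (AutoTokenizer, AutoModelForCausalLM, generate) are standard, and the library is imported lazily so the scoring logic can be tried without downloading the model.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

MODEL_ID = "axel-datos/Llama-3.2-1B_gsm8k_full-finetuning"


def make_generate_fn(model_id: str = MODEL_ID,
                     max_new_tokens: int = 256) -> Callable[[str], str]:
    """Build a prompt -> completion function backed by transformers."""
    # Imported here so the scoring helpers below work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    def generate(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
        # Drop the prompt tokens so only newly generated text is returned.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate


def last_number(text: str) -> Optional[float]:
    """Return the last number found in text, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return float(matches[-1].replace(",", "")) if matches else None


def exact_match_accuracy(generate: Callable[[str], str],
                         examples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (question, reference) pairs whose final numbers agree."""
    pairs = list(examples)
    if not pairs:
        return 0.0
    correct = 0
    for question, reference in pairs:
        predicted = last_number(generate(question))
        expected = last_number(reference)
        if predicted is not None and predicted == expected:
            correct += 1
    return correct / len(pairs)
```

To score the real model, pass make_generate_fn() as the generate argument together with (question, reference answer) pairs from the target task. Any other callable that maps a prompt to text can be substituted, which makes it easy to compare against the base model under the same scoring rule.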