CharlesLi/llama_2_llama_2_code_math_1_full

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Jan 19, 2025 · License: llama2 · Architecture: Transformer · Open weights

The CharlesLi/llama_2_llama_2_code_math_1_full model is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. The fine-tuning targets the task distribution of its training dataset (the repository name suggests a code and math focus), and the model reaches a loss of 0.8356 on its evaluation set. It is intended for applications that want a Llama 2-based model with this particular fine-tuning focus.


Model Overview

CharlesLi/llama_2_llama_2_code_math_1_full is a 7-billion-parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base model. It was fine-tuned on a dataset referred to as "generator" and reaches a reported loss of 0.8356 on its evaluation set.
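
Since the checkpoint is published as open weights, it should load like any other Llama 2 derivative through Hugging Face Transformers. The sketch below is illustrative rather than taken from the card: the repository id is the model's, but the dtype and device placement (which also assumes the accelerate package is installed) are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_llama_2_code_math_1_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 keeps the 7B weights around 14 GB
    device_map="auto",          # assumption: let accelerate place layers on available devices
)
```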

Training Details

The model was trained with the following key hyperparameters; a TrainingArguments sketch reproducing them follows the list:

  • Learning Rate: 2e-05
  • Batch Sizes: train_batch_size of 4, eval_batch_size of 4
  • Gradient Accumulation: 2 steps, giving a total_train_batch_size of 32 (4 per device × 4 GPUs × 2 accumulation steps)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler: cosine, with a warmup ratio of 0.1
  • Epochs: 1
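
For reference, the values above map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a hedged reconstruction: only the listed hyperparameters come from the card, while the output directory and the surrounding training script are hypothetical.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_code_math_1_full",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # 4 per device x 4 GPUs x 2 steps = 32 effective batch
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```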

Training used a multi-GPU setup with 4 devices. Framework versions: Transformers 4.44.2, PyTorch 2.4.1+cu121, Datasets 3.0.0, and Tokenizers 0.19.1.
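
When reproducing or debugging training behaviour, it can help to confirm that the local environment roughly matches the reported stack. A minimal check, assuming the four packages are installed:

```python
import datasets
import tokenizers
import torch
import transformers

# Compare against the versions reported in the model card.
print("transformers:", transformers.__version__)  # card: 4.44.2
print("pytorch:    ", torch.__version__)          # card: 2.4.1+cu121
print("datasets:   ", datasets.__version__)       # card: 3.0.0
print("tokenizers: ", tokenizers.__version__)     # card: 0.19.1
```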

Intended Use

The model card does not document specific intended uses or limitations. In general, the model is suited to tasks that align with its Llama 2 chat foundation and the focus of its fine-tuning data. Developers should weigh the base architecture and the training details above when judging whether the model fits a particular use case.
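
As a usage illustration, the snippet below continues from the loading sketch in the Model Overview. Because the base model is Llama-2-7b-chat-hf, it assumes the standard Llama 2 [INST] ... [/INST] chat format; the fine-tuning data may expect a different prompt template, so treat this as a starting point rather than the documented interface.

```python
# Continues from the tokenizer/model objects created in the loading sketch above.
prompt = "[INST] Write a Python function that returns the n-th Fibonacci number. [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```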