CharlesLi/llama_2_llama_2_code_math_5_full

Text Generation

  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Jan 19, 2025
  • License: llama2
  • Architecture: Transformer (open weights)

CharlesLi/llama_2_llama_2_code_math_5_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained on a dataset identified in the training logs only as the "generator" dataset and reached a validation loss of 0.5808 on the evaluation set. The model is intended for applications that benefit from specialized fine-tuning on top of a Llama 2 chat base.


Model Overview

This model, llama_2_llama_2_code_math_5_full, is a fine-tuned variant of Meta's Llama-2-7b-chat-hf. It has 7 billion parameters and was adapted on the "generator" dataset; no further description of that dataset is published. Fine-tuning achieved a validation loss of 0.5808 on the evaluation set.
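As a quick orientation, the sketch below shows one way to load and query the checkpoint with Hugging Face transformers. The repository id comes from this card; the dtype and device settings are assumptions chosen to fit a 7B model on a single GPU, not published defaults.

```python
# Minimal loading sketch using Hugging Face transformers.
# Repo id is from this card; dtype/device choices are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CharlesLi/llama_2_llama_2_code_math_5_full"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # assumption: half precision to fit a ~16 GB GPU
    device_map="auto",
)

prompt = "Write a Python function that returns the nth Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```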

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 2e-05
  • Batch Size: 32 (total train batch size)
  • Optimizer: Adam with default betas and epsilon
  • LR Scheduler: Cosine with 0.1 warmup ratio
  • Epochs: 1
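These settings map directly onto Hugging Face TrainingArguments. The sketch below is a hypothetical reconstruction of the reported configuration, not the author's actual training script; the per-device batch size, output directory, and logging settings are placeholders.

```python
# Hypothetical reconstruction of the reported hyperparameters with
# Hugging Face TrainingArguments; the actual training script is unpublished.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_code_math_5_full",  # placeholder path
    learning_rate=2e-5,              # reported learning rate
    per_device_train_batch_size=4,   # assumption: 4 per device x 8 GPUs = 32 total
    num_train_epochs=1,              # reported epoch count
    lr_scheduler_type="cosine",      # reported scheduler
    warmup_ratio=0.1,                # reported warmup ratio
    optim="adamw_torch",             # Adam variant with default betas/epsilon
    logging_steps=10,                # placeholder logging cadence
    save_strategy="epoch",
)
```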

Intended Use

Specific intended uses and limitations are not documented. Its chat-optimized Llama 2 base and the "code_math" naming suggest conversational use on code- and math-related tasks, but users should evaluate the model themselves before relying on it for any particular use case.
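Because the base model, Llama-2-7b-chat-hf, expects the Llama 2 chat prompt format, that format is a reasonable starting point when evaluating this fine-tune. Whether the fine-tuning preserved the format is an assumption to verify empirically; the helper below is purely illustrative.

```python
# Sketch of the Llama 2 chat prompt format used by the base model.
# Whether this fine-tune still expects it is an assumption to test.
def llama2_chat_prompt(system: str, user: str) -> str:
    # Note: most tokenizers prepend the <s> BOS token automatically,
    # in which case it should be omitted from the string below.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = llama2_chat_prompt(
    system="You are a helpful assistant for code and math questions.",
    user="What is the sum of the first 100 positive integers?",
)
```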