Model Overview
CharlesLi/llama_2_llama_2_code_math_4_full is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. Developed by CharlesLi, this iteration targets improved performance in specific domains, particularly code and mathematical reasoning. The model was trained on the generator dataset and achieves a reported loss of 0.6615 on its evaluation set.
Key Training Details
The fine-tuning process utilized several specific hyperparameters:
- Learning Rate: 2e-05
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Sizes: per-device train_batch_size of 4 and eval_batch_size of 4; with 4 devices and 2 gradient accumulation steps, this yields a total_train_batch_size of 32 and a total_eval_batch_size of 16.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Epochs: Trained for 1 epoch.
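The effective batch sizes reported above follow directly from the per-device sizes, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic, using only the figures listed in the training details:

```python
# Reported fine-tuning configuration (values taken from the list above).
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 2

# Effective train batch size: per-device size x devices x accumulation steps.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation performs no gradient accumulation, so only the device count multiplies in.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 16
```

This is why a seemingly small per-device batch of 4 still trains with an effective batch of 32: gradient accumulation trades extra forward/backward passes for memory.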
Intended Use Cases
Specific intended uses and limitations have not been documented by the developer. However, the fine-tuning on the generator dataset and the model's name suggest an orientation toward generation-heavy tasks in technical or analytical contexts, such as code generation and mathematical problem-solving. Developers should consider this model when a Llama 2 base with specialized reasoning improvements is beneficial.
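Because the base model is Llama-2-7b-chat-hf, prompts at inference time should follow the Llama-2 chat template. A minimal sketch of that formatting is below; the helper name and the commented-out pipeline usage are illustrative assumptions, not part of the model card:

```python
from typing import Optional


def format_llama2_chat_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Wrap a user message in the Llama-2 chat template:
    [INST] ... [/INST], with an optional <<SYS>> system block."""
    if system_prompt:
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"<s>[INST] {user_message} [/INST]"


prompt = format_llama2_chat_prompt("Write a Python function that reverses a string.")

# Hypothetical usage with the transformers library (requires downloading the model,
# so it is left commented out here):
# from transformers import pipeline
# generate = pipeline(
#     "text-generation",
#     model="CharlesLi/llama_2_llama_2_code_math_4_full",
# )
# print(generate(prompt)[0]["generated_text"])
```

Keeping the chat template intact generally matters for chat-tuned Llama 2 derivatives, since the fine-tuning data is formatted with these delimiters.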