Model Overview
This model, llama_2_cot_simplest_code_math_1_3_epoch_full, is a fine-tuned variant of the meta-llama/Llama-2-7b-chat-hf base model. Developed by CharlesLi, it features 7 billion parameters and was trained for 3 epochs.
Key Characteristics
- Base Model: Fine-tuned from Llama-2-7b-chat-hf, inheriting its foundational language understanding and generation capabilities.
- Training Data: Trained on a dataset identified only as "generator" in the original model card; its contents are not documented, though the model name suggests chain-of-thought data for code and math tasks.
- Performance Metric: Achieves a loss of 0.6809 on its evaluation set; no other benchmark results are reported.
Training Details
The model was trained using the following hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 4 (train), 4 (eval)
- Gradient Accumulation: 2 steps. With a per-device train batch size of 4, the reported total train batch size of 32 implies training across 4 devices (4 × 2 × 4 = 32).
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Epochs: 3 full training epochs.
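The warmup ratio and scheduler above combine in a standard way: the learning rate ramps linearly to its peak over the first 10% of steps, then follows a cosine curve down to zero. A minimal sketch of that schedule, assuming the common linear-warmup-then-cosine-decay behavior (as in transformers' `get_cosine_schedule_with_warmup`); the step counts below are illustrative, not from the model card:

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero.

    Illustrative reimplementation of a warmup + cosine schedule;
    the actual training used the Trainer's built-in scheduler.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000                     # hypothetical total optimizer steps
peak = cosine_lr(100, total)     # end of warmup: full 2e-05
final = cosine_lr(1000, total)   # last step: decayed to ~0
```

With a warmup ratio of 0.1, the peak learning rate of 2e-05 is reached exactly at step 100 of 1000 and decays smoothly afterwards.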
Intended Use Cases
The original README does not document intended uses or limitations. Given the Llama-2-7b-chat base and a model name referencing chain-of-thought, code, and math, plausible applications include conversational text generation and step-by-step reasoning over code and math problems, though this cannot be verified without details of the training dataset.
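As a usage sketch, the model should load through the standard transformers API under its Hugging Face ID (assumed here to be `CharlesLi/llama_2_cot_simplest_code_math_1_3_epoch_full`). The prompt template below is the generic Llama-2 chat format; the model card does not confirm which format the fine-tune expects:

```python
def format_llama2_chat(user_msg, system_msg="You are a helpful assistant."):
    """Build a Llama-2 chat-style prompt (generic template; the model
    card does not specify the expected prompt format)."""
    return f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"

def generate(prompt,
             model_id="CharlesLi/llama_2_cot_simplest_code_math_1_3_epoch_full"):
    """Load the fine-tuned model and generate a completion.

    Requires enough GPU/CPU memory for a 7B-parameter model, so it is
    not executed in this sketch.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)

prompt = format_llama2_chat("Solve step by step: what is 12 * 17?")
# generate(prompt)  # uncomment to run; downloads the model weights
```

The chat-style template is used because the base model is the chat variant of Llama 2; a fine-tune of the raw base model would typically take plain prompts instead.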