Model Overview
CharlesLi/llama_2_cot_simplest_code_math_4_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. The fine-tuning targets chain-of-thought reasoning on code and math tasks, and the model reaches a loss of 0.6062 on its evaluation set.
Key Characteristics
- Base Model: Fine-tuned from meta-llama/Llama-2-7b-chat-hf.
- Parameter Count: 7 billion parameters.
- Context Length: Supports a context window of 4096 tokens.
- Training Objective: Optimized for tasks requiring reasoning and mathematical problem-solving.
- Evaluation Loss: Achieved a loss of 0.6062 on the evaluation set.
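To use the checkpoint, it can be loaded with the Hugging Face `transformers` library under the repo id given above. This is a minimal sketch; it assumes `transformers` (and a PyTorch backend) is installed and that you have access to Llama-2-derived weights.

```python
def load_model(repo_id="CharlesLi/llama_2_cot_simplest_code_math_4_full"):
    """Load the fine-tuned checkpoint with Hugging Face transformers.

    The import is done lazily so this sketch can be read without the
    library installed; actually loading the 7B weights requires
    `pip install transformers torch` and sufficient memory.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model
```

Prompts up to the 4096-token context window can then be tokenized and passed to `model.generate`.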
Training Details
The model was trained using the following hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 4 per device (train), 4 per device (eval).
- Gradient Accumulation Steps: 2. The reported total train batch size of 32 implies distributed training across multiple devices, since 4 × 2 = 8 per device.
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
- LR Scheduler: Cosine with a warmup ratio of 0.1, trained for 1 epoch.
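The hyperparameters above fit together as follows: the total batch size implies the device count, and the scheduler warms the learning rate up linearly for the first 10% of steps before decaying it along a cosine curve. A sketch of that arithmetic (not the trainer's exact implementation):

```python
import math

# Hyperparameters taken from the card above.
per_device_batch = 4
grad_accum = 2
total_batch = 32
warmup_ratio = 0.1
peak_lr = 2e-05

# Total batch = per-device batch x accumulation steps x device count,
# so the stated total of 32 implies 4 devices.
num_devices = total_batch // (per_device_batch * grad_accum)

def cosine_lr_with_warmup(step, total_steps):
    """Linear warmup over the first warmup_ratio of steps, then cosine
    decay from peak_lr down to 0 (a sketch of the named schedule)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(num_devices)  # -> 4
```

At step 100 of a 1000-step run the warmup finishes and the learning rate hits its peak of 2e-05; by the final step it has decayed to 0.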
Intended Use Cases
This model is suitable for applications where robust logical reasoning and accurate mathematical computations are critical. Its fine-tuning suggests potential strengths in areas such as:
- Solving mathematical word problems.
- Executing multi-step reasoning tasks.
- Code-related logical inference.
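For the use cases above, prompts should follow the Llama-2 chat format the base model expects (`[INST] ... [/INST]`, with an optional `<<SYS>>` block). A hedged sketch; the system text and the `solve` helper are illustrative, not part of the model's training setup:

```python
def build_prompt(question,
                 system="You are a helpful assistant. Reason step by step."):
    """Format a question in the Llama-2 chat style.

    The system message here is an illustrative placeholder, not taken
    from the model's actual training data.
    """
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{question} [/INST]"

def solve(question, model, tokenizer, max_new_tokens=256):
    """Generate an answer with a transformers model/tokenizer pair
    loaded from the CharlesLi/llama_2_cot_simplest_code_math_4_full
    checkpoint."""
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and return only the generated answer.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

prompt = build_prompt(
    "A train travels 60 km in 45 minutes. "
    "What is its average speed in km/h?")
```

For multi-step reasoning tasks, leaving room in `max_new_tokens` for the chain-of-thought before the final answer tends to matter more than sampling settings.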