Overview
This model, llama_2_cot_simplest_code_math_0_full, is a fine-tuned version of the meta-llama/Llama-2-7b-chat-hf base model, developed by CharlesLi. It has 7 billion parameters and a context length of 4096 tokens. The fine-tuning data is identified only as a "generator" dataset; the model's name suggests a focus on chain-of-thought generation for code and mathematics.
Training Details
The model was trained for 1 epoch with a learning rate of 2e-05 and a per-device batch size of 4; across 4 GPUs with gradient accumulation (2 steps), this yields an effective batch size of 32. Training used the Adam optimizer and a cosine learning-rate scheduler with a warmup ratio of 0.1. The final loss on the evaluation set was 0.8119.
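The reported hyperparameters can be sketched in plain Python. This is a minimal illustration, assuming the standard linear-warmup-then-cosine-decay shape used by common trainers; the step counts below are illustrative, not taken from the original run.

```python
import math

def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int) -> int:
    """Total examples contributing to one optimizer step."""
    return per_device * num_devices * grad_accum_steps

def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup for the first warmup_ratio of training, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# A per-device batch of 4 on 4 GPUs with 2 accumulation steps matches
# the reported effective batch size of 32.
print(effective_batch_size(4, 4, 2))  # 32
```

The learning rate thus rises linearly to 2e-05 over the first 10% of steps and then decays along a half-cosine toward zero.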
Key Characteristics
- Base Model: Llama-2-7b-chat-hf
- Parameter Count: 7 billion
- Context Length: 4096 tokens
- Fine-tuning Focus: "generator" dataset, likely targeting code and mathematical tasks.
Intended Use
The provided README does not specify intended uses or limitations. Given the fine-tuning dataset and the model's name, it appears designed for generation in technical or analytical contexts: code snippets, mathematical problem-solving, or step-by-step reasoning. Developers should weigh this specialized fine-tuning when selecting it for applications where performance in these areas is critical.