CharlesLi/llama_2_cot_simplest_code_math_1_3_epoch_full

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Jan 21, 2025 · License: llama2 · Architecture: Transformer · Open Weights

CharlesLi/llama_2_cot_simplest_code_math_1_3_epoch_full is a 7-billion-parameter fine-tune of meta-llama/Llama-2-7b-chat-hf by CharlesLi. It was trained on the generator dataset and reaches a loss of 0.6809 on its evaluation set. The model targets code and mathematics tasks, building on the conversational and reasoning capabilities of its Llama 2 base.
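
To load the checkpoint, here is a minimal sketch assuming the model is published on the Hugging Face Hub under this repo id with a standard Llama 2 layout (the dtype and device-placement arguments are illustrative, not taken from the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model card; the loading path itself is a
# standard transformers sketch, not confirmed by the card.
model_id = "CharlesLi/llama_2_cot_simplest_code_math_1_3_epoch_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # shard onto available GPU(s); requires `accelerate`
)
```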


Model Overview

This model, llama_2_cot_simplest_code_math_1_3_epoch_full, is a fine-tuned variant of the meta-llama/Llama-2-7b-chat-hf base model. Developed by CharlesLi, it features 7 billion parameters and was trained for 3 epochs.

Key Characteristics

  • Base Model: Fine-tuned from Llama-2-7b-chat-hf, inheriting its foundational language understanding and generation capabilities.
  • Training Data: Trained on a specific "generator dataset," indicating a focus on generating particular types of output.
  • Performance Metric: Achieved a loss of 0.6809 on the evaluation set, i.e. a per-token perplexity of roughly 1.98 (see the conversion sketch after this list).
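
Assuming the reported eval loss is mean per-token cross-entropy in nats (the usual Hugging Face Trainer convention, though the card does not say so explicitly), it converts to perplexity by exponentiation:

```python
import math

eval_loss = 0.6809                # reported eval loss (assumed nats per token)
perplexity = math.exp(eval_loss)  # ppl = exp(cross-entropy)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 1.98
```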

Training Details

The model was trained using the following hyperparameters:

  • Learning Rate: 2e-05
  • Batch Size: 4 (train), 4 (eval)
  • Gradient Accumulation: 2 steps. With a per-device batch of 4, the reported total train batch size of 32 implies training across 4 devices (4 per device × 2 accumulation steps × 4 devices = 32).
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: 3 full training epochs.
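
These values match the fields that the Hugging Face Trainer writes into auto-generated model cards. A hedged reconstruction of the corresponding TrainingArguments follows; output_dir is illustrative, and any argument not listed above is an assumption:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_cot_simplest_code_math_1_3_epoch_full",  # assumed name
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # 4 per device * 2 steps * 4 devices = 32 total
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```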

Intended Use Cases

While the README does not spell out intended uses and limitations, the model name and its fine-tuning on the "generator dataset" point to chain-of-thought (CoT) generation for code and math problems, with the Llama 2 chat base supporting general conversational use. A generation sketch for such a prompt follows.
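
Continuing from the loading sketch above, and assuming the tokenizer ships the standard Llama 2 chat template (the prompt and decoding settings here are illustrative):

```python
# Uses `tokenizer` and `model` from the loading sketch above.
messages = [
    {"role": "user", "content": "Solve step by step: what is 17 * 23?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```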