CharlesLi/llama_2_llama_2_code_math_4_full

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Jan 19, 2025
  • License: llama2
  • Architecture: Transformer
  • Weights: Open

The CharlesLi/llama_2_llama_2_code_math_4_full model is a 7-billion-parameter variant of Llama-2-7b-chat-hf, fine-tuned by CharlesLi for tasks that require code and mathematical reasoning. Building on the Llama 2 architecture, it targets specialized domains where precise logical and computational understanding is critical.


Model Overview

CharlesLi/llama_2_llama_2_code_math_4_full is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. Developed by CharlesLi, this iteration focuses on improving performance in specific domains, particularly code and mathematical reasoning. The model was trained on a dataset identified only as "generator" and achieved a reported loss of 0.6615 on its evaluation set.
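If the checkpoint follows standard Hugging Face conventions for Llama 2 fine-tunes, it should load with the stock transformers API. A minimal sketch, assuming the repository id above is available on the Hub and that fp16 weights are acceptable (the FP8 figure above likely describes the hosted deployment rather than the checkpoint itself):

```python
# Minimal loading sketch; assumes the checkpoint is hosted on the
# Hugging Face Hub under the repository id shown on this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_llama_2_code_math_4_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 is sufficient for the 7B footprint
    device_map="auto",          # place layers across available GPUs/CPU
)
```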

Key Training Details

The fine-tuning run used the following hyperparameters, mirrored in the configuration sketch after this list:

  • Learning Rate: 2e-05
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Sizes: train_batch_size of 4, eval_batch_size of 4, leading to a total_train_batch_size of 32 and total_eval_batch_size of 16 (with 4 devices and 2 gradient accumulation steps).
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: Trained for 1 epoch.
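For readers who want to reproduce or adapt the run, the settings above map directly onto a transformers TrainingArguments object. A minimal sketch, assuming a Hugging Face Trainer-based setup launched across 4 devices; the output_dir is a hypothetical placeholder:

```python
# Sketch of a TrainingArguments configuration mirroring the reported
# hyperparameters; the 4-device launch (e.g. via torchrun or accelerate)
# is assumed, not stated on the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_code_math_4_full",  # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=4,  # x 4 devices x 2 accumulation steps = 32 total
    per_device_eval_batch_size=4,   # x 4 devices = 16 total
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```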

Intended Use Cases

The developer has not published detailed intended uses or limitations. However, the fine-tuning on a "generator" dataset and the model's name suggest an orientation toward tasks that benefit from enhanced generation capabilities in technical or analytical contexts, such as code generation and mathematical problem-solving. The model is worth considering for applications where a Llama 2 base with specialized reasoning improvements is beneficial.
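Because the base model is Llama-2-7b-chat-hf, prompts presumably follow the Llama 2 chat convention. A generation sketch, reusing the model and tokenizer from the loading example above; the [INST] wrapper and the sample math prompt are illustrative assumptions:

```python
# Generation sketch; assumes the model/tokenizer objects from the loading
# example and Llama 2's [INST] chat prompt convention.
prompt = "[INST] Solve for x: 3x + 7 = 22. Show your steps. [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```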