CharlesLi/llama_2_llama_2_code_math_3_full

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Jan 19, 2025 · License: llama2 · Architecture: Transformer · Open Weights

CharlesLi/llama_2_llama_2_code_math_3_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf on a "generator dataset," which suggests an optimization for content-generation tasks. It is aimed at applications that need a Llama-2-based model with strengthened generative capabilities.

Model Overview

CharlesLi/llama_2_llama_2_code_math_3_full builds on the meta-llama/Llama-2-7b-chat-hf base model, keeping its 7-billion-parameter architecture while fine-tuning on a "generator dataset" geared toward content creation and generation.
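
A minimal loading sketch, assuming the checkpoint is published in the standard Hugging Face Transformers format (the model card itself does not include usage code):

```python
# Minimal sketch: load the model with Hugging Face Transformers.
# Assumes the repo ships standard Transformers weights; not confirmed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_llama_2_code_math_3_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B model within ~14 GB
    device_map="auto",          # requires the `accelerate` package
)
```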

Key Training Details

  • Base Model: meta-llama/Llama-2-7b-chat-hf
  • Parameter Count: 7 billion
  • Training Objective: Fine-tuned on a "generator dataset."
  • Evaluation Loss: 0.5628 on the evaluation set.
  • Hyperparameters:
    • Learning Rate: 2e-05
    • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
    • Epochs: 1
    • Batch Size: 4 per device (train and eval); effective totals of 32 (train) and 16 (eval). A Trainer-style reconstruction follows this list.

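These values map directly onto Hugging Face `TrainingArguments`. The sketch below is a hypothetical reconstruction, not the author's published script; the output directory and the gradient-accumulation split are assumptions (the reported totals are consistent with, for example, 4 devices at batch size 4 with 2 accumulation steps).

```python
# Hypothetical reconstruction of the reported hyperparameters with the
# Hugging Face Trainer API; the actual fine-tuning script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_code_math_3_full",  # placeholder name
    learning_rate=2e-05,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # assumed: 4 devices x 4 per device x 2 steps = 32 total
    adam_beta1=0.9,                 # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```
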
Intended Use Cases

The model card does not list specific intended uses, but fine-tuning on a "generator dataset" implies suitability for tasks where the model must produce coherent, relevant text, such as creative writing, summarization, or other generative applications (the repository name also hints at code- and math-oriented generation). Its Llama-2 lineage makes it a reasonable choice for general language understanding and generation, with the fine-tune emphasizing generative capability.
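
As an illustration of generative use, the snippet below reuses the `model` and `tokenizer` objects from the loading sketch above and assumes the Llama-2 chat prompt format (`[INST] ... [/INST]`) carries over from the chat-tuned base model:

```python
# Illustrative generation call; the [INST] prompt format is assumed to carry
# over from meta-llama/Llama-2-7b-chat-hf and is not confirmed by the card.
prompt = "[INST] Write a Python function that computes the nth Fibonacci number. [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,  # stay well within the 4k context window
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```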