CharlesLi/llama_2_cot_simplest_code_math_0_full

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jan 20, 2025 · License: llama2 · Architecture: Transformer · Open weights

The CharlesLi/llama_2_cot_simplest_code_math_0_full model is a 7-billion-parameter variant of Llama-2-7b-chat-hf, fine-tuned by CharlesLi. As its name and its fine-tuning on a generator dataset suggest, it is adapted from the base Llama 2 architecture for code and mathematics tasks, offering a focused alternative to general-purpose large language models in these specialized domains.


Overview

This model, llama_2_cot_simplest_code_math_0_full, is a fine-tune of the meta-llama/Llama-2-7b-chat-hf base model, developed by CharlesLi. It has 7 billion parameters and was trained with a context length of 4096 tokens. Fine-tuning used a "generator dataset," suggesting the model is optimized for content generation, most likely in the code and mathematics domains implied by its name.

Training Details

The model was trained for 1 epoch with a learning rate of 2e-05 and a per-device batch size of 4 (total effective batch size of 32 via gradient accumulation), using the Adam optimizer and a cosine learning-rate scheduler with a warmup ratio of 0.1. Training ran on a multi-GPU setup with 4 devices. The reported loss on the evaluation set was 0.8119.
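The reported hyperparameters can be collected into a single configuration sketch. This is a reconstruction from the card, not the author's actual training script: the `gradient_accumulation_steps` value is an assumption chosen so that 4 (per device) × 4 (GPUs) × 2 (accumulation) reproduces the reported effective batch size of 32.

```python
# Hypothetical reconstruction of the reported training configuration.
# Only the values stated on the model card are grounded; the accumulation
# split is inferred, and the dict keys follow common Trainer naming.
hyperparams = {
    "learning_rate": 2e-05,
    "per_device_train_batch_size": 4,
    "num_devices": 4,
    "gradient_accumulation_steps": 2,  # assumed: card only gives the effective total of 32
    "num_train_epochs": 1,
    "optimizer": "adam",
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
}

# Effective batch size = per-device batch × devices × accumulation steps.
effective_batch = (
    hyperparams["per_device_train_batch_size"]
    * hyperparams["num_devices"]
    * hyperparams["gradient_accumulation_steps"]
)
print(effective_batch)  # 32, matching the card's reported effective batch size
```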

Key Characteristics

  • Base Model: Llama-2-7b-chat-hf
  • Parameter Count: 7 billion
  • Context Length: 4096 tokens
  • Fine-tuning Focus: Generator dataset, likely for code and mathematical tasks.

Intended Use

Specific intended uses and limitations are not detailed in the provided README. However, the generator-dataset fine-tuning and the model's name suggest it is designed for generation in technical or analytical contexts, such as code snippets, mathematical problem-solving, and logical reasoning. Developers should weigh this specialized fine-tuning when choosing it for applications where performance in these areas is critical.
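Since this is a fine-tune of Llama-2-7b-chat-hf, prompts presumably follow the base model's `[INST]`/`<<SYS>>` chat template. The sketch below builds such a prompt for a math question; it is an assumption that this fine-tune keeps the template unchanged, and the actual model call is omitted since it requires downloading the weights.

```python
# Minimal sketch of the Llama 2 chat prompt format this fine-tune likely
# inherits from its base model. build_prompt is a hypothetical helper.
def build_prompt(system: str, user: str) -> str:
    """Wrap a system message and user message in the Llama 2 chat template."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_prompt(
    "You are a helpful assistant that reasons step by step.",
    "What is 12 * 7? Show your work.",
)
print(prompt)
```

The resulting string would then be tokenized and passed to the model for generation (e.g. via `transformers`), with the answer appearing after the closing `[/INST]` tag.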