CharlesLi/llama_2_cot_simplest_code_math_4_full

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Jan 20, 2025 · License: llama2 · Architecture: Transformer · Open Weights

CharlesLi/llama_2_cot_simplest_code_math_4_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. The model is optimized for reasoning and mathematical tasks, reaching a loss of 0.6062 on its evaluation set, and is designed for applications that require robust logical inference and numerical problem-solving within a 4096-token context window.


Model Overview

CharlesLi/llama_2_cot_simplest_code_math_4_full is a 7-billion-parameter language model derived from Meta's Llama-2-7b-chat-hf. It has been fine-tuned to improve performance on reasoning and mathematical tasks, as indicated by a loss of 0.6062 on its evaluation set. A minimal loading example follows the characteristics list below.

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Llama-2-7b-chat-hf.
  • Parameter Count: 7 billion parameters.
  • Context Length: Supports a context window of 4096 tokens.
  • Training Objective: Optimized for tasks requiring reasoning and mathematical problem-solving.
  • Evaluation Loss: 0.6062 on the held-out evaluation set.
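
The model can be loaded with the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming the checkpoint is hosted on the Hub under the ID above; the half-precision dtype and device placement are illustrative choices, not part of the published card.

```python
# Minimal loading sketch (assumes transformers, torch, and accelerate are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_cot_simplest_code_math_4_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; a 7B model fits on a single ~16 GB GPU
    device_map="auto",          # requires the accelerate package
)
```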

Training Details

The model was trained with the following hyperparameters (an equivalent TrainingArguments sketch follows the list):

  • Learning Rate: 2e-05
  • Batch Size: 4 per device (train), 4 per device (eval)
  • Gradient Accumulation Steps: 2; the reported total train batch size of 32 implies distributed training across 4 devices (4 × 2 × 4 = 32).
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • LR Scheduler: Cosine schedule with a warmup ratio of 0.1; training ran for 1 epoch.
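
For reference, the hyperparameters above map onto the standard Hugging Face TrainingArguments as sketched below. This is a hypothetical reconstruction rather than the author's published training script; the argument names are the library's own, and the output directory is a placeholder.

```python
# Hypothetical reconstruction of the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_cot_out",   # placeholder path, not from the original card
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # 4 x 2 x 4 devices = total train batch of 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```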

Intended Use Cases

This model is suitable for applications where robust logical reasoning and accurate mathematical computation are critical. Its fine-tuning suggests potential strengths in areas such as the following (an illustrative prompt example appears after the list):

  • Solving mathematical word problems.
  • Executing multi-step reasoning tasks.
  • Code-related logical inference.
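
As an illustration of these use cases, the sketch below prompts the model with a simple math word problem. It assumes model and tokenizer are loaded as in the earlier example and that the tokenizer inherits the Llama-2 chat template from the base model; the prompt itself is invented for demonstration.

```python
# Illustrative chain-of-thought style prompt for a math word problem.
prompt = (
    "A train travels 60 km in 45 minutes. "
    "What is its average speed in km/h? Think step by step."
)
messages = [{"role": "user", "content": prompt}]

# Uses the chat template shipped with the Llama-2 tokenizer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```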