DeeWoo/Llama-2-7b-chat_FFT_GSM8K

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4K · Published: Dec 30, 2024 · License: other · Architecture: Transformer

DeeWoo/Llama-2-7b-chat_FFT_GSM8K is a 7-billion-parameter Llama-2-chat model fine-tuned by DeeWoo on the GSM8K dataset for mathematical reasoning and problem-solving. It supports a 4096-token context length, and its primary strength is accurately solving grade-school-level arithmetic and word problems.


Overview

DeeWoo/Llama-2-7b-chat_FFT_GSM8K is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It received specialized training on the GSM8K dataset of grade-school mathematical word problems, with the aim of strengthening its numerical reasoning and problem-solving abilities.
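
For context, GSM8K items pair a word problem with a step-by-step solution whose final line marks the answer with `####`. The snippet below shows an illustrative item in the dataset's style (the specific record is shown for illustration, not quoted from this model's training data):

```python
# Illustrative item in GSM8K style: the answer field contains worked
# reasoning and ends with "#### <final number>".
example = {
    "question": (
        "Natalia sold clips to 48 of her friends in April, and then she "
        "sold half as many clips in May. How many clips did Natalia sell "
        "altogether in April and May?"
    ),
    "answer": (
        "Natalia sold 48 / 2 = 24 clips in May.\n"
        "Natalia sold 48 + 24 = 72 clips altogether in April and May.\n"
        "#### 72"
    ),
}
```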

Key Capabilities

  • Mathematical Reasoning: Optimized for solving arithmetic and word problems, particularly those found in the GSM8K dataset (a minimal inference sketch follows this list).
  • Llama-2 Foundation: Benefits from the robust architecture and general language understanding of the base Llama-2-7b-chat model.
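
As a minimal sketch of how the model might be queried, the snippet below uses the Hugging Face transformers library. It assumes the repository follows the standard Llama-2-chat prompt wrapper (`[INST] ... [/INST]`); the exact prompt template used during fine-tuning is not documented here, so treat that as an assumption.

```python
# Minimal inference sketch. Assumes the standard Llama-2-chat prompt
# format; the fine-tuning prompt template is not documented here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DeeWoo/Llama-2-7b-chat_FFT_GSM8K"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

question = (
    "A baker makes 24 muffins and sells them in boxes of 4. "
    "How many boxes does she fill?"
)
prompt = f"[INST] {question} [/INST]"  # Llama-2-chat instruction wrapper

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```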

Training Details

The model was trained with the following key hyperparameters (a hedged configuration sketch follows the list):

  • Learning Rate: 1e-05
  • Batch Size: A total training batch size of 64 (16 per GPU across 4 devices).
  • Epochs: Trained for 3.0 epochs.
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • Mixed Precision: Utilized Native AMP for efficient training.
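
As a rough reconstruction, the reported values map onto a transformers `TrainingArguments` configuration roughly as follows. This is a sketch under the hyperparameters listed above, not the author's actual training script, and the output directory name is hypothetical.

```python
# Hedged reconstruction of the reported hyperparameters; not the
# author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-2-7b-chat_fft_gsm8k",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,  # 16 per GPU x 4 GPUs = total batch 64
    num_train_epochs=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,  # Native AMP mixed precision
)
```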

Good For

  • Applications requiring accurate solutions to mathematical problems.
  • Research into fine-tuning large language models for specific reasoning tasks.
  • Educational tools focused on math assistance.