fzzhang/Marcoroni-neural-chat-7B-v2_gsm8k_merged

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Feb 16, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

fzzhang/Marcoroni-neural-chat-7B-v2_gsm8k_merged is a 7-billion-parameter language model fine-tuned from Toten5/Marcoroni-neural-chat-7B-v2 and optimized for mathematical reasoning. It offers a 4,096-token context window and was fine-tuned on the GSM8K dataset of grade-school math word problems. Its primary differentiator is stronger arithmetic and common-sense reasoning, making it suitable for applications that require robust numerical problem-solving.


Model Overview

fzzhang/Marcoroni-neural-chat-7B-v2_gsm8k_merged builds on the Toten5/Marcoroni-neural-chat-7B-v2 base. This iteration was fine-tuned on the GSM8K dataset, reflecting a strong focus on mathematical reasoning and problem solving. Training used a learning rate of 1e-05 over 5 epochs with the Adam optimizer and a linear learning-rate scheduler.
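To try the model on a GSM8K-style word problem, a minimal inference sketch with the Transformers library might look like the following. This assumes the model ID above resolves on the Hugging Face Hub and uses a plain-text prompt; the card does not document a chat template, so the ideal prompt format may follow the base model's instead.

```python
# Minimal inference sketch; model ID and plain-text prompting are assumptions
# based on this card, not a documented usage recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fzzhang/Marcoroni-neural-chat-7B-v2_gsm8k_merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision fits a 7B model on a single 24 GB GPU
    device_map="auto",
)

# A GSM8K-style grade-school math word problem
prompt = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```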

Key Capabilities

  • Mathematical Reasoning: Specialized in arithmetic and common sense reasoning, as evidenced by its fine-tuning on the GSM8K dataset.
  • Base Model Enhancement: Extends Marcoroni-neural-chat-7B-v2 with domain-specific expertise in math word problems.

Training Details

The fine-tuning process used the following hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 1e-05
  • Batch Sizes: train_batch_size of 4, eval_batch_size of 8
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 5
  • Frameworks: PEFT 0.7.2.dev0, Transformers 4.36.2, PyTorch 2.1.2, Datasets 2.16.1, Tokenizers 0.15.1
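The card lists these hyperparameters but not the full training script. As a rough sketch, the same settings could be expressed with Hugging Face TrainingArguments plus a PEFT adapter config; the LoRA values below are illustrative assumptions, since the card only states that PEFT was used, not how it was configured.

```python
# Sketch of the listed hyperparameters as TrainingArguments; the LoraConfig
# values are illustrative assumptions, not documented on this card.
from peft import LoraConfig
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="marcoroni-gsm8k",       # hypothetical output path
    learning_rate=1e-5,                 # as listed on the card
    per_device_train_batch_size=4,      # train_batch_size
    per_device_eval_batch_size=8,       # eval_batch_size
    num_train_epochs=5,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                     # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# Illustrative LoRA adapter config; rank, alpha, and dropout are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```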

Good For

  • Applications requiring strong mathematical problem-solving.
  • Tasks involving common sense reasoning with numerical data.
  • Developers looking for a 7B model with enhanced arithmetic capabilities.