fzzhang/Marcoroni-neural-chat-7B-v2_gsm8k_merged
The fzzhang/Marcoroni-neural-chat-7B-v2_gsm8k_merged is a 7 billion parameter language model, fine-tuned from Toten5/Marcoroni-neural-chat-7B-v2, specifically optimized for mathematical reasoning tasks. This model leverages a 4096-token context window and is specialized in solving problems from the GSM8K dataset. Its primary differentiator is its enhanced capability in arithmetic and common sense reasoning, making it suitable for applications requiring robust numerical problem-solving.
Model Overview
The fzzhang/Marcoroni-neural-chat-7B-v2_gsm8k_merged is a 7 billion parameter language model building upon the Toten5/Marcoroni-neural-chat-7B-v2 base. This iteration has been fine-tuned on the GSM8K dataset, indicating a strong focus on mathematical reasoning and problem-solving. The model was trained with a learning rate of 1e-05 over 5 epochs, using the Adam optimizer and a linear learning rate scheduler.
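The card does not include loading code, but assuming the merged checkpoint follows the standard Hugging Face causal-LM layout, a minimal sketch might look like the following (the prompt template is an assumption; the one used during fine-tuning is not documented here):

```python
# Hypothetical loading sketch -- assumes a standard Hugging Face causal LM
# checkpoint; verify against the actual repository files before use.
MODEL_ID = "fzzhang/Marcoroni-neural-chat-7B-v2_gsm8k_merged"

def build_prompt(question: str) -> str:
    """Wrap a GSM8K-style word problem in a simple instruction prompt.

    The exact prompt format used during fine-tuning is not stated on the
    card, so this template is an assumption for illustration.
    """
    return f"Question: {question}\nAnswer: Let's think step by step."

def load_model():
    # Imported lazily so the sketch can be read and tested without
    # downloading the ~7B-parameter checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    return tokenizer, model
```

Generation would then follow the usual tokenize, `model.generate`, decode loop; the 4096-token context window leaves ample room for multi-step worked solutions.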
Key Capabilities
- Mathematical Reasoning: Specialized in arithmetic and common sense reasoning, as evidenced by its fine-tuning on the GSM8K dataset.
- Base Model Enhancement: Improves upon `Marcoroni-neural-chat-7B-v2` by adding domain-specific expertise.
Training Details
The fine-tuning process involved:
- Learning Rate: 1e-05
- Batch Sizes: `train_batch_size` of 4, `eval_batch_size` of 8
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Epochs: 5
- Frameworks: PEFT 0.7.2.dev0, Transformers 4.36.2, PyTorch 2.1.2, Datasets 2.16.1, Tokenizers 0.15.1
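The reported hyperparameters can be collected into a single configuration. The sketch below transcribes the values from the card into a plain dict and shows one way they could map onto `transformers.TrainingArguments` (the output directory is a hypothetical placeholder, not from the card):

```python
# Fine-tuning hyperparameters as reported on the card.
TRAINING_CONFIG = {
    "learning_rate": 1e-05,
    "train_batch_size": 4,
    "eval_batch_size": 8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}

def to_training_arguments(output_dir: str = "./gsm8k-finetune"):
    """Map the reported values onto TrainingArguments.

    output_dir is a hypothetical placeholder; import is lazy so the
    config can be inspected without a full training environment.
    """
    from transformers import TrainingArguments
    cfg = TRAINING_CONFIG
    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=cfg["learning_rate"],
        per_device_train_batch_size=cfg["train_batch_size"],
        per_device_eval_batch_size=cfg["eval_batch_size"],
        adam_beta1=cfg["adam_beta1"],
        adam_beta2=cfg["adam_beta2"],
        adam_epsilon=cfg["adam_epsilon"],
        lr_scheduler_type=cfg["lr_scheduler_type"],
        num_train_epochs=cfg["num_train_epochs"],
    )
```

Since the card lists PEFT among the frameworks, the run presumably used a parameter-efficient adapter (e.g. LoRA) that was later merged into the base weights, though the adapter configuration itself is not documented.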
Good For
- Applications requiring strong mathematical problem-solving.
- Tasks involving common sense reasoning with numerical data.
- Developers looking for a 7B model with enhanced arithmetic capabilities.
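For the problem-solving use cases above, GSM8K reference answers end with a line like `#### 18`, while model outputs vary in format. A small helper for scoring generated solutions against references might look like this (the fallback of taking the last number in free-form output is a common heuristic, not something prescribed by the card):

```python
import re

def extract_final_number(text: str):
    """Pull the final numeric answer from a worked solution.

    Prefers the GSM8K-style '#### <number>' marker; otherwise falls
    back to the last number in the text. Returns None if no number
    is present.
    """
    match = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if match is not None:
        raw = match.group(1)
    else:
        numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
        if not numbers:
            return None
        raw = numbers[-1]
    return float(raw.replace(",", ""))
```

Comparing `extract_final_number(model_output)` against the number after `####` in the reference answer gives a simple exact-match accuracy over the GSM8K test split.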