This is an 8-billion-parameter instruction-tuned language model, fine-tuned by sleeepeer from Meta Llama 3.1. It uses Group Relative Policy Optimization (GRPO), the reinforcement-learning method introduced in DeepSeekMath, to strengthen mathematical reasoning. The model is trained specifically for complex reasoning tasks, making it suitable for applications that require advanced problem-solving.
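The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions per prompt and normalize each completion's reward against the group's mean and standard deviation, using the result as the advantage instead of a learned value model. A minimal sketch of that group-relative advantage step (the function name and values are illustrative, not taken from this model's training code):

```python
# Sketch of GRPO's group-relative advantage computation.
# All names and numbers here are illustrative assumptions.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the group's mean and std.

    GRPO samples several completions per prompt, scores them with a
    reward function, and uses these normalized scores as advantages,
    removing the need for a separate value (critic) model.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rewards for 4 sampled answers to one math problem
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 2) for a in advs])  # correct answers get positive advantage
```

Completions that score above the group average receive positive advantages and are reinforced; those below average are penalized, all relative to the other samples for the same prompt.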