luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-v2_1346

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Sep 21, 2025Architecture:Transformer Warm

The luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-v2_1346 is an 8 billion parameter instruction-tuned causal language model, fine-tuned from Meta's Llama-3.1-8B-Instruct. It was trained using the GRPO method on the MATH-lighteval dataset, specializing it for mathematical reasoning tasks. With a 32K context length, this model is optimized to enhance performance in complex mathematical problem-solving.

Loading preview...

Model Overview

This model, luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-v2_1346, is an 8 billion parameter instruction-tuned language model. It is a fine-tuned version of the meta-llama/Llama-3.1-8B-Instruct base model, leveraging its robust architecture and a 32,768 token context length.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model has been specifically fine-tuned on the DigitalLearningGmbH/MATH-lighteval dataset.
  • GRPO Training Method: It utilizes the GRPO (Generalized Reinforcement Learning with Policy Optimization) training method, as introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This method is designed to improve mathematical problem-solving abilities.
  • Instruction Following: As an instruction-tuned model, it is designed to follow user prompts effectively.

Good For

  • Mathematical Problem Solving: Ideal for applications requiring strong mathematical reasoning and accurate numerical computations.
  • Research in RLHF/Fine-tuning: Provides a practical example of GRPO application for researchers exploring advanced fine-tuning techniques.
  • Educational Tools: Can be integrated into tools for learning or practicing mathematics.