luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-v2_1346
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Sep 21, 2025Architecture:Transformer Warm
The luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-v2_1346 is an 8 billion parameter instruction-tuned causal language model, fine-tuned from Meta's Llama-3.1-8B-Instruct. It was trained using the GRPO method on the MATH-lighteval dataset, specializing it for mathematical reasoning tasks. With a 32K context length, this model is optimized to enhance performance in complex mathematical problem-solving.
Loading preview...
Model Overview
This model, luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-v2_1346, is an 8 billion parameter instruction-tuned language model. It is a fine-tuned version of the meta-llama/Llama-3.1-8B-Instruct base model, leveraging its robust architecture and a 32,768 token context length.
Key Capabilities
- Enhanced Mathematical Reasoning: The model has been specifically fine-tuned on the DigitalLearningGmbH/MATH-lighteval dataset.
- GRPO Training Method: It utilizes the GRPO (Generalized Reinforcement Learning with Policy Optimization) training method, as introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This method is designed to improve mathematical problem-solving abilities.
- Instruction Following: As an instruction-tuned model, it is designed to follow user prompts effectively.
Good For
- Mathematical Problem Solving: Ideal for applications requiring strong mathematical reasoning and accurate numerical computations.
- Research in RLHF/Fine-tuning: Provides a practical example of GRPO application for researchers exploring advanced fine-tuning techniques.
- Educational Tools: Can be integrated into tools for learning or practicing mathematics.