sleeepeer/meta-llama-Llama-3.1-8B-Instruct-pisanitizer-squad_v2-sanitization-42-202601082138

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 9, 2026 · Architecture: Transformer

The sleeepeer/meta-llama-Llama-3.1-8B-Instruct-pisanitizer-squad_v2-sanitization-42-202601082138 is an 8-billion-parameter instruction-tuned causal language model fine-tuned from Meta's Llama-3.1-8B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and it supports a context length of 32768 tokens, making it suited to tasks that require robust reasoning over long inputs, particularly in quantitative domains.


Model Overview

This model, sleeepeer/meta-llama-Llama-3.1-8B-Instruct-pisanitizer-squad_v2-sanitization-42-202601082138, is an 8-billion-parameter instruction-tuned variant of meta-llama/Llama-3.1-8B-Instruct, fine-tuned with the TRL library.

Key Differentiator: GRPO Training

A significant aspect of this model's development is its training procedure, which incorporates the GRPO (Group Relative Policy Optimization) method. GRPO, introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," aims to enhance the model's capabilities in mathematical reasoning tasks, suggesting an optimization for more robust and accurate problem-solving in quantitative domains.
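TRL ships a GRPOTrainer for this method. The sketch below is a minimal, illustrative outline of a GRPO fine-tuning run in that style; the dataset, reward function, and hyperparameters are placeholders, not the recipe actually used to produce this checkpoint, which has not been published.

```python
# Minimal GRPO fine-tuning sketch using TRL's GRPOTrainer.
# Dataset, reward, and hyperparameters are illustrative placeholders only.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset (GRPOTrainer expects a "prompt" column).
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions near 50 characters. A real math-reasoning
    # run would instead score answer correctness against a reference solution.
    return [-abs(50 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="Llama-3.1-8B-Instruct-GRPO")
trainer = GRPOTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

Conceptually, GRPO samples a group of completions per prompt and normalizes each completion's reward against the group average, which removes the need for a separate value model during policy optimization.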

Capabilities & Use Cases

  • Enhanced Mathematical Reasoning: The application of the GRPO training method indicates a focus on improving the model's ability to understand and solve complex mathematical problems.
  • Instruction Following: As an instruction-tuned model, it is designed to follow user prompts and generate relevant responses effectively.
  • Long Context Understanding: With a context length of 32768 tokens, it can process and generate text based on extensive input, making it suitable for tasks requiring detailed contextual awareness (see the inference sketch after this list).
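As a hedged illustration of these capabilities, the snippet below loads the model with Hugging Face Transformers, assuming the checkpoint is downloadable under the repo id above and follows the standard Llama 3.1 chat template; apart from the model id, nothing here is specific to this fine-tune.

```python
# Inference sketch: instruction following on a simple quantitative prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sleeepeer/meta-llama-Llama-3.1-8B-Instruct-pisanitizer-squad_v2-sanitization-42-202601082138"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "A train covers 180 km in 2.5 hours. What is its average speed in km/h?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The 32768-token window means substantially longer documents can be packed into messages before generation without truncation.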

Training Frameworks

The model was trained using TRL (Transformer Reinforcement Learning) version 0.26.2, with Transformers 4.56.2, PyTorch 2.9.0, Datasets 4.4.2, and Tokenizers 0.22.1.