jordanpainter/llama_gspo_200

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Architecture: Transformer · Concurrency cost: 1 · Published: Mar 25, 2026

The jordanpainter/llama_gspo_200 is an 8 billion parameter language model fine-tuned from srirag/sft-llama-all. It was trained using the GRPO method, as introduced in the DeepSeekMath paper, which focuses on pushing the limits of mathematical reasoning. This model is optimized for enhanced reasoning capabilities, particularly in areas related to mathematical problem-solving and complex logical tasks.


Overview

The jordanpainter/llama_gspo_200 is an 8-billion-parameter language model fine-tuned from the srirag/sft-llama-all base model. Training used the TRL library and the GRPO (Group Relative Policy Optimization) method introduced in the DeepSeekMath paper.

Key Capabilities

  • Enhanced Reasoning: The model's training with GRPO, a method detailed in the DeepSeekMath paper, suggests a focus on improving complex reasoning abilities.
  • Mathematical Problem Solving: Because GRPO was introduced in a paper dedicated to mathematical reasoning, the model is likely to perform more strongly on tasks requiring logical and mathematical thought.
  • Fine-tuned Performance: As a fine-tuned variant, it aims to build upon the foundational capabilities of its base model, srirag/sft-llama-all, with specialized improvements.
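The defining idea behind GRPO, per the DeepSeekMath paper, is to drop the learned value function and instead baseline each sampled completion against its own group. A minimal sketch of that advantage computation is below; the function name, epsilon, and toy rewards are illustrative, not taken from this model's training code.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's statistics.

    GRPO samples a group of completions per prompt and uses the group
    mean as the baseline: A_i = (r_i - mean(r)) / std(r). No separate
    value network is trained.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one math prompt, scored 1.0 if correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers get positive advantages and incorrect ones negative, so the policy is pushed toward completions that beat their group's average.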

Training Details

The model was trained with the TRL framework, using TRL 0.28.0, Transformers 4.57.6, and PyTorch 2.5.1+cu121. Training runs were logged to Weights & Biases.
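For intuition about what a GRPO update optimizes, here is a hedged sketch of the PPO-style clipped surrogate term that GRPO applies per token, using the group-relative advantage as the baseline. This is an illustration of the technique, not this model's actual training loop (which used TRL); the function name and the clip value are assumptions.

```python
import math

def grpo_clipped_term(logp_new, logp_old, advantage, clip_eps=0.2):
    """One token's contribution to the negated GRPO surrogate objective.

    ratio is pi_new(token) / pi_old(token); clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] bounds how far one update can move
    the policy, as in PPO.
    """
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    # Negated so a gradient-descent optimizer maximizes the objective.
    return -min(unclipped, clipped)

# A positive-advantage token whose probability already jumped from
# 0.5 to 0.9 (ratio 1.8) gets clipped at 1.2, capping the incentive.
loss = grpo_clipped_term(math.log(0.9), math.log(0.5), advantage=1.0)
```

The clip is what keeps a single lucky group of completions from dragging the policy too far from the sampling policy in one step.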

Good For

  • Applications requiring advanced logical and mathematical reasoning.
  • Tasks where robust problem-solving capabilities are crucial.
  • Developers looking for a specialized Llama-based model with improved reasoning over general-purpose alternatives.