jordanpainter/diallm-llama-gspo-aus

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 17, 2026 · Architecture: Transformer · Cold

The jordanpainter/diallm-llama-gspo-aus is an 8 billion parameter language model, fine-tuned from jordanpainter/diallm-llama-sft-aus, with a context length of 32768 tokens. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper for enhancing mathematical reasoning. It is designed to improve on its base model in tasks that require advanced reasoning.


Model Overview

The jordanpainter/diallm-llama-gspo-aus is an 8 billion parameter language model, fine-tuned from the jordanpainter/diallm-llama-sft-aus base model. It supports a context length of 32768 tokens, allowing it to process longer inputs and generate more coherent, extended responses.

Key Training Details

This model distinguishes itself through its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a method detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This indicates a focus on enhancing the model's reasoning abilities, particularly in complex problem-solving scenarios.
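The distinctive step in GRPO is that it scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value model: rewards within a group are normalized by the group's mean and standard deviation to form advantages. A minimal sketch of that normalization step (the variable names are illustrative, not taken from this model's training code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward
    against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt, scored by a reward function:
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions rewarded above the group average receive positive advantages and are reinforced; those below average are penalized, with no critic network required.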

Frameworks Used

The training process utilized several key frameworks:

  • TRL: 0.28.0
  • Transformers: 4.57.6
  • PyTorch: 2.5.1+cu121
  • Datasets: 4.5.0
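To reproduce a matching environment, the versions above can be pinned directly; a requirements fragment assembled from the listed versions (the CUDA suffix on PyTorch assumes a cu121 wheel index is available):

```
trl==0.28.0
transformers==4.57.6
torch==2.5.1+cu121
datasets==4.5.0
```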

Potential Use Cases

Given its GRPO-based training, this model is likely well-suited for applications requiring:

  • Advanced reasoning and logical inference.
  • Tasks that benefit from its large 32k context window to process extensive contextual information.
  • Building upon the capabilities of its diallm-llama-sft-aus predecessor with enhanced reasoning.
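For experimentation, the model should be loadable through the standard Hugging Face `transformers` chat workflow. The sketch below is an assumption based on the card's repo id, not author-published usage code; prompt wording and generation settings are illustrative, and running it requires hardware able to hold the 8B weights.

```python
def build_messages(user_prompt: str) -> list[dict]:
    """Build a single-turn chat-template message list."""
    return [{"role": "user", "content": user_prompt}]

def generate(user_prompt: str, max_new_tokens: int = 512) -> str:
    """One chat-style generation pass. Imports transformers lazily so the
    helpers above stay usable without the library installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "jordanpainter/diallm-llama-gspo-aus"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The 32k context leaves ample room for long documents in the prompt while still reserving budget for `max_new_tokens` of output.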