jordanpainter/diallm-qwen-gspo-all

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 18, 2026 · Architecture: Transformer · Cold

jordanpainter/diallm-qwen-gspo-all is an 8-billion-parameter language model, fine-tuned from jordanpainter/DialLM-Qwen-sft-all using the GRPO method from DeepSeekMath. It specializes in mathematical reasoning and is designed for tasks requiring advanced logical and mathematical problem-solving, with a context length of 32768 tokens.


Model Overview

jordanpainter/diallm-qwen-gspo-all is an 8-billion-parameter language model built on the jordanpainter/DialLM-Qwen-sft-all base. It has been fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This fine-tuning aims to significantly enhance the model's capabilities in complex mathematical reasoning and problem-solving.
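For reference (this formula is from the DeepSeekMath paper, not stated on this card), the core of GRPO is a group-relative advantage: for each prompt, a group of G responses is sampled, and each response's reward is normalized against the group's statistics, removing the need for a separate learned value function:

```latex
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \ldots, r_G\})}{\operatorname{std}(\{r_1, \ldots, r_G\})}
```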

Key Capabilities

  • Enhanced Mathematical Reasoning: Leverages the GRPO training approach to improve performance on tasks requiring logical and mathematical deduction.
  • Dialogue-Oriented Base: Inherits conversational abilities from its DialLM-Qwen-sft-all foundation.
  • Large Context Window: Supports a context length of 32768 tokens, allowing for processing and generating longer, more complex inputs and outputs.
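A minimal usage sketch with the Hugging Face transformers library, assuming the model is hosted on the Hub under this id and ships a Qwen-style chat template. The helper names, system prompt, and generation settings are illustrative, not part of the card:

```python
def build_prompt(question: str) -> list[dict]:
    """Wrap a math question in a chat-message list (standard chat-template roles)."""
    return [
        {"role": "system", "content": "You are a helpful assistant. Reason step by step."},
        {"role": "user", "content": question},
    ]


def generate_answer(question: str, model_id: str = "jordanpainter/diallm-qwen-gspo-all") -> str:
    """Load the model and generate a reply.

    Imports are kept local so the sketch stays importable without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

    text = tokenizer.apply_chat_template(
        build_prompt(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The 32768-token context window leaves ample room for long multi-step solutions, so `max_new_tokens` can be raised well beyond the value shown here.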

Training Details

The model was trained with the TRL (Transformer Reinforcement Learning) library using the GRPO method. GRPO optimizes the model for a target domain, in this case mathematical reasoning, by sampling a group of responses per prompt, scoring them with a reward function, and updating the policy toward responses that score above the group average.
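The training setup described above can be sketched with TRL's `GRPOTrainer` and `GRPOConfig`, which are real TRL classes; however, the dataset choice, the reward function, and all hyperparameters below are illustrative assumptions, since the card does not disclose them:

```python
import re


def accuracy_reward(completions, answer, **kwargs):
    """Reward 1.0 when the last number in a completion matches the reference answer.

    GRPO scores each completion in a sampled group with functions like this;
    the group-normalized scores then serve as the advantage signal.
    """
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards


def train():
    """Sketch of a GRPO run; requires `pip install trl datasets` plus a GPU."""
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Illustrative dataset choice: the card does not say what data was used.
    # GRPOTrainer expects a "prompt" column; extra columns (here "answer")
    # are forwarded to the reward function as keyword arguments.
    dataset = load_dataset("openai/gsm8k", "main", split="train")
    dataset = dataset.map(
        lambda ex: {"prompt": ex["question"], "answer": ex["answer"].split("####")[-1].strip()},
        remove_columns=dataset.column_names,
    )

    trainer = GRPOTrainer(
        model="jordanpainter/DialLM-Qwen-sft-all",  # the SFT base named in the card
        reward_funcs=accuracy_reward,
        args=GRPOConfig(output_dir="diallm-qwen-grpo", num_generations=8, max_completion_length=512),
        train_dataset=dataset,
    )
    trainer.train()
```

Because the reward is computed per group of sampled completions, `num_generations` trades off reward-signal quality against per-step compute.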

Use Cases

This model is particularly well-suited for applications requiring robust mathematical problem-solving, logical inference, and detailed reasoning within a conversational or generative context. Its enhanced reasoning capabilities make it a strong candidate for tasks that benefit from a deeper understanding of numerical and logical structures.