Jeremmmyyyyy/Qwen-poetry-logprob-no-norm-v3

Hosted on Hugging Face

  • Task: Text generation
  • Model size: 2B parameters
  • Quantization: BF16
  • Context length: 32k tokens
  • Published: May 24, 2025
  • Architecture: Transformer

Jeremmmyyyyy/Qwen-poetry-logprob-no-norm-v3 is a 2 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. The model is optimized for tasks requiring advanced reasoning, particularly in mathematical contexts, and supports a context length of 32,768 tokens.


Model Overview

Jeremmmyyyyy/Qwen-poetry-logprob-no-norm-v3 is a 2 billion parameter language model, fine-tuned from the Qwen3-1.7B base model. It leverages the Transformer Reinforcement Learning (TRL) framework for its training procedure.

Key Differentiator: GRPO Training

This model's primary distinction lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This method is specifically designed to improve a model's capabilities in complex reasoning tasks, particularly within mathematical domains.
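The model card does not publish the actual reward function or dataset used, but a GRPO fine-tune with TRL (the framework listed below) typically pairs `GRPOTrainer` with a verifiable reward. The sketch below is illustrative only: `exact_answer_reward`, the `\boxed{}` convention, and the dataset choice are all assumptions, not details from this model's training run.

```python
import re

def exact_answer_reward(completions, answer, **kwargs):
    """Toy GRPO reward: 1.0 if the completion's final \\boxed{...} matches
    the reference answer, else 0.0. (Hypothetical; the card does not state
    the reward actually used for this model.)"""
    rewards = []
    for completion, gold in zip(completions, answer):
        boxed = re.findall(r"\\boxed\{([^}]*)\}", completion)
        rewards.append(1.0 if boxed and boxed[-1].strip() == str(gold) else 0.0)
    return rewards

def train():
    # Heavy imports are kept local so the reward function above stays
    # importable without trl/datasets installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Placeholder dataset: any dataset with "prompt" and "answer" columns
    # would fit this reward's signature.
    dataset = load_dataset("openai/gsm8k", "main", split="train")

    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B",          # the stated base model
        reward_funcs=exact_answer_reward,  # GRPO ranks sampled completions by reward
        args=GRPOConfig(output_dir="qwen3-grpo", num_generations=8),
        train_dataset=dataset,
    )
    trainer.train()
```

GRPO's key design choice is that it normalizes rewards within each group of sampled completions for the same prompt, so only a scalar reward function like the one above is needed rather than a learned value model.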

Technical Specifications

  • Base Model: Qwen/Qwen3-1.7B
  • Parameter Count: 2 billion
  • Context Length: 32768 tokens
  • Training Frameworks: TRL (version 0.17.0), Transformers (version 4.51.3), PyTorch (version 2.6.0), Datasets (version 3.5.0), Tokenizers (version 0.21.1)

Potential Use Cases

Given its GRPO-based training, this model is likely well-suited for applications requiring:

  • Mathematical problem-solving: Tasks involving arithmetic, algebra, calculus, or other mathematical reasoning.
  • Logical deduction: Scenarios where structured, step-by-step reasoning is crucial.
  • Scientific computing assistance: Generating or interpreting mathematical expressions and solutions.

Developers can quickly integrate this model using the Hugging Face pipeline for text generation tasks.
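A minimal sketch of that pipeline usage follows. It assumes the standard Transformers chat-message format for Qwen3 models; the sample prompt and the `solve` helper are illustrative, not part of the model card.

```python
MODEL_ID = "Jeremmmyyyyy/Qwen-poetry-logprob-no-norm-v3"

def build_messages(problem: str):
    # Qwen3 checkpoints ship a chat template, so a single user turn suffices;
    # the pipeline applies the template automatically.
    return [{"role": "user", "content": problem}]

def solve(problem: str, max_new_tokens: int = 512):
    # Imported lazily: the first call downloads ~2B parameters, and BF16
    # inference works best on a GPU or a recent CPU.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID, torch_dtype="bfloat16")
    outputs = generator(build_messages(problem), max_new_tokens=max_new_tokens)
    # The pipeline returns the full conversation; the last message is the reply.
    return outputs[0]["generated_text"][-1]["content"]

if __name__ == "__main__":
    print(solve("If 3x + 7 = 22, what is x?"))
```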