Model Overview
Jeremmmyyyyy/Qwen-poetry-logprob-no-norm-v3 is a 2-billion-parameter language model fine-tuned from the Qwen3-1.7B base model. Its training procedure is built on the Transformer Reinforcement Learning (TRL) framework.
Key Differentiator: GRPO Training
This model's primary distinction lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO dispenses with a learned value critic: for each prompt it samples a group of completions, scores them with a reward function, and computes each completion's advantage relative to the group's mean reward. This method is specifically designed to improve a model's capabilities in complex reasoning tasks, particularly within mathematical domains.
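The group-relative scoring at the heart of GRPO can be illustrated in a few lines. This is a minimal sketch of the advantage computation only (the policy-gradient update, KL penalty, and clipping used by TRL's trainer are omitted); the function name and epsilon value are illustrative, not taken from the TRL source.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize the rewards of a group of
    completions sampled for the same prompt. No value critic is needed;
    the group mean serves as the baseline. (Illustrative sketch.)"""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: three completions for one prompt, scored by a reward function.
# The best-scoring completion gets a positive advantage, the worst a
# negative one, so the policy is pushed toward the better samples.
advs = group_relative_advantages([1.0, 2.0, 3.0])
```

Because the baseline is the group's own mean, only relative quality within each sampled group matters, which is what makes a simple scalar reward (e.g. answer correctness) usable without training a separate value model.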
Technical Specifications
- Base Model: Qwen/Qwen3-1.7B
- Parameter Count: 2 billion
- Context Length: 32768 tokens
- Training Frameworks: TRL (version 0.17.0), Transformers (version 4.51.3), PyTorch (version 2.6.0), Datasets (version 3.5.0), Tokenizers (version 0.21.1).
Potential Use Cases
Given its GRPO-based training, this model is likely well-suited for applications requiring:
- Mathematical problem-solving: Tasks involving arithmetic, algebra, calculus, or other mathematical reasoning.
- Logical deduction: Scenarios where structured, step-by-step reasoning is crucial.
- Scientific computing assistance: Generating or interpreting mathematical expressions and solutions.
Developers can quickly integrate this model using the Hugging Face pipeline for text generation tasks.
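A minimal sketch of that integration, assuming a standard text-generation setup (the prompt and generation parameters here are illustrative; loading the model requires downloading its weights from the Hugging Face Hub):

```python
from transformers import pipeline

# Load the model card's checkpoint into a text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="Jeremmmyyyyy/Qwen-poetry-logprob-no-norm-v3",
)

# Illustrative prompt playing to the model's reasoning-focused training.
out = generator(
    "Solve step by step: what is 12 * 7?",
    max_new_tokens=128,
)
print(out[0]["generated_text"])
```

For more control over decoding (sampling temperature, logprobs, batching), the model can also be loaded directly via `AutoModelForCausalLM` and `AutoTokenizer` instead of the pipeline wrapper.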