LLucass/TT_L0.2_H0.2_dr_grpo

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Jun 8, 2025 · Architecture: Transformer

LLucass/TT_L0.2_H0.2_dr_grpo is a 1.5 billion parameter language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B by LLucass. It was trained with the TRL framework on the knoveleng/open-rs dataset using the GRPO method, which targets mathematical reasoning. The model is intended for tasks that require enhanced reasoning, particularly mathematical problem solving.


Overview

LLucass/TT_L0.2_H0.2_dr_grpo is a 1.5 billion parameter language model developed by LLucass. It is a fine-tuned version of the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model, trained on the knoveleng/open-rs dataset. Training used the TRL (Transformer Reinforcement Learning) framework with GRPO (Group Relative Policy Optimization), a reinforcement learning method known for improving mathematical reasoning in language models.
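GRPO's distinguishing idea is that it replaces a learned value-function baseline with group statistics: for each prompt, several completions are sampled, and each completion's advantage is its reward normalized against the mean and standard deviation of its own group. A minimal sketch of that normalization step (illustrative only, not the TRL implementation; the group size and rewards below are made up):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Compute GRPO-style advantages for one group of sampled completions.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by
    zero when all rewards in the group are identical).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for the same prompt, scored 1.0 when the
# final answer was correct and 0.0 otherwise (hypothetical rewards):
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)  # correct completions get positive advantage, wrong ones negative
```

Because the baseline comes from the group itself, no separate critic model is needed, which is part of why GRPO is attractive for training small models like this one.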

Key Capabilities

  • Enhanced Reasoning: Benefits from the GRPO training method, which is designed to improve mathematical reasoning abilities.
  • Fine-tuned Performance: Specialized training on the knoveleng/open-rs dataset for specific domain applications.
  • Efficient Architecture: Built upon a 1.5 billion parameter model, offering a balance between performance and computational efficiency.
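GRPO pipelines for math reasoning typically score sampled completions with simple rule-based rewards rather than a learned reward model. The exact reward functions used to train this model are not documented here; the following is a hypothetical sketch of the accuracy-plus-format style of reward commonly paired with reasoning datasets like open-rs:

```python
import re

def format_reward(completion: str) -> float:
    # Hypothetical rule: reward completions that wrap their reasoning
    # in <think>...</think> tags before stating a final answer.
    return 0.5 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, answer: str) -> float:
    # Hypothetical rule: extract the last \boxed{...} expression and
    # compare it to the reference answer string.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return 1.0 if matches and matches[-1].strip() == answer.strip() else 0.0

sample = "<think>2 + 2 = 4</think> The answer is \\boxed{4}."
total = format_reward(sample) + accuracy_reward(sample, "4")
print(total)  # both rules satisfied
```

Rewards like these are cheap to evaluate at scale, which matters because GRPO scores a whole group of completions per prompt at every training step.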

Good for

  • Mathematical Reasoning Tasks: Ideal for applications requiring robust mathematical problem-solving and logical deduction.
  • Research and Development: Suitable for researchers exploring the impact of GRPO and similar training methodologies on smaller language models.
  • Specialized Domain Applications: Can be adapted for tasks within domains represented by the knoveleng/open-rs dataset.