delist/miniboss

Hosted on Hugging Face · Text generation · Model size: 0.5B · Quant: BF16 · Context length: 32k · Published: Apr 22, 2025 · Architecture: Transformer

delist/miniboss is a 0.5 billion parameter instruction-tuned language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct. It was trained with the TRL framework using the GRPO method, which targets mathematical reasoning. The model is therefore best suited to tasks that benefit from improved logical and mathematical problem-solving.


Model Overview

delist/miniboss is a 0.5 billion parameter instruction-tuned language model, building upon the Gensyn/Qwen2.5-0.5B-Instruct base. It has been fine-tuned using the TRL (Transformer Reinforcement Learning) framework.

Key Training Innovation

A significant aspect of this model's development is the application of GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), replaces PPO's learned value function with a baseline computed from a group of sampled completions, and is designed to enhance a model's mathematical reasoning. This suggests that miniboss is optimized for tasks that benefit from improved logical and mathematical problem-solving.
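The core idea of GRPO can be sketched in a few lines: sample a group of completions for each prompt, score each with a reward, and compute each completion's advantage relative to the group's mean and standard deviation rather than via a learned critic. A minimal illustration in plain Python (the reward values are hypothetical, not from miniboss's training run):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against the
    mean and standard deviation of its own sampled group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for 4 completions sampled for one math prompt:
# two correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 1.0, 0.0, 0.0])
# Correct completions receive positive advantage, incorrect negative,
# and the advantages sum to zero within the group.
```

Because the baseline comes from the group itself, no separate value network is needed, which is part of why the method is attractive for small models like this one.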

Technical Specifications

  • Base Model: Gensyn/Qwen2.5-0.5B-Instruct
  • Parameters: 0.5 Billion
  • Context Length: 32,768 tokens (32k)
  • Training Framework: TRL (version 0.15.2)
  • Key Method: GRPO for mathematical reasoning enhancement
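Since the model inherits the standard Qwen2.5 chat format from its base, it should be loadable with the Hugging Face transformers library in the usual way. The sketch below is an assumption based on that base model, not a snippet from the model card; the question and generation parameters are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def solve(question: str, model_id: str = "delist/miniboss") -> str:
    """Load the model and generate an answer to a single question."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    # Qwen2.5-style chat formatting via the tokenizer's chat template.
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)

    # Strip the prompt tokens; return only the newly generated text.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(solve("If 3x + 5 = 20, what is x?"))
```

At 0.5B parameters in BF16, the model needs roughly 1 GB of memory, so it should run comfortably on CPU or a small GPU.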

Potential Use Cases

Given its fine-tuning with GRPO, delist/miniboss is likely well-suited for:

  • Mathematical problem-solving: Tasks involving arithmetic, algebra, or other quantitative reasoning.
  • Logical deduction: Scenarios requiring structured thought processes.
  • Instruction following: General tasks where precise adherence to instructions is important, leveraging its instruction-tuned base.
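In GRPO-style math fine-tuning, the rewards that drive the use cases above are typically derived by checking a completion's final answer against a reference. A hypothetical correctness reward in that spirit (not the actual reward function used to train miniboss) might look like:

```python
import re

def correctness_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last number in the completion matches the
    reference answer, else 0.0 -- a common pattern in math RL fine-tuning."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference else 0.0

# Illustrative calls:
r_correct = correctness_reward("3x = 15, so x = 5", "5")  # final answer matches
r_wrong = correctness_reward("I think x = 7", "5")        # final answer differs
```

A binary reward like this is coarse but easy to verify automatically, which is what makes group-relative methods practical for mathematical problem-solving tasks.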