harheem/qwen2.5-3b-grpo-ko-200-16bit

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 3.1B
  • Quantization: BF16
  • Context length: 32k
  • Published: Feb 19, 2025
  • License: apache-2.0
  • Architecture: Transformer
  • Open weights

The harheem/qwen2.5-3b-grpo-ko-200-16bit model is a 3.1-billion-parameter Qwen2.5-based causal language model developed by harheem. It was fine-tuned from unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit using Unsloth together with Hugging Face's TRL library, which enabled roughly 2x faster training. The model targets efficient operation and suits applications that need a compact yet capable language model.


Overview

harheem/qwen2.5-3b-grpo-ko-200-16bit is a 3.1-billion-parameter language model developed by harheem. It is based on the Qwen2.5 architecture and was fine-tuned from unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit. A key characteristic is its training methodology: fine-tuning with Unsloth and Hugging Face's TRL library, which yielded a 2x speedup during training.
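Since the weights are published in BF16, the model should load with the standard Hugging Face transformers API. Below is a minimal inference sketch, assuming this checkpoint ships the usual Qwen2.5 chat template (the prompt and generation settings are illustrative, not from the card):

```python
# Minimal inference sketch using Hugging Face transformers.
# Assumes the checkpoint provides a standard Qwen2.5 chat template;
# adjust dtype/device settings for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "harheem/qwen2.5-3b-grpo-ko-200-16bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 published weights
    device_map="auto",
)

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```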

Key Capabilities

  • Efficient Training: Utilizes Unsloth for significantly faster fine-tuning (see the sketch after this list).
  • Qwen2.5 Architecture: Benefits from the robust base architecture of Qwen2.5.
  • Compact Size: At 3.1 billion parameters, it offers a balance between performance and resource efficiency.
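To illustrate the Unsloth-plus-TRL workflow the card describes, here is a hedged sketch of how a similar fine-tune might be set up. The dataset, LoRA hyperparameters, and choice of TRL trainer are placeholder assumptions, not the author's actual recipe (the model name hints at GRPO-style training, but the card only confirms Unsloth and TRL):

```python
# Sketch of an Unsloth-accelerated fine-tune similar to the one described.
# Dataset and hyperparameters are illustrative placeholders. Note that
# some TRL versions use processing_class instead of tokenizer.
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit base weights keep memory low during training
)

# Attach LoRA adapters; Unsloth patches the model for faster training.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset: assumes a local JSONL file with a "text" column.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        dataset_text_field="text",
        per_device_train_batch_size=2,
        max_steps=200,
    ),
)
trainer.train()
```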

Good for

  • Applications requiring a smaller, performant language model.
  • Scenarios where rapid fine-tuning is a priority.
  • Developers looking for a Qwen2.5-based model with optimized training.

This model is released under the Apache-2.0 license.