AbleCredit/AbleCredit-R0-Qwen-2.5-3B-Instruct

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:Feb 16, 2025Architecture:Transformer0.0K Warm

AbleCredit-R0-Qwen-2.5-3B-Instruct is a 3.1 billion parameter instruction-tuned causal language model developed by AbleCredit (LightBees Technologies Private Limited). It is fine-tuned from Qwen 2.5 3B Instruct using Deepseek R1 style (GRPO) reinforcement learning. This model is primarily intended for research in applying small LLMs to financial domains like credit underwriting, demonstrating strong logical reasoning capabilities.

Loading preview...

Overview

AbleCredit-R0-Qwen-2.5-3B-Instruct is a 3.1 billion parameter instruction-tuned language model developed by AbleCredit (LightBees Technologies Private Limited). It is built upon the Qwen 2.5 3B Instruct base model and has been fine-tuned using Deepseek R1 style (GRPO) reinforcement learning with rule-based rewards.

Key Capabilities & Training

  • Reinforcement Learning: Utilizes GRPO (Deepseek R1 style) for fine-tuning, focusing on enhancing reasoning abilities.
  • Specialized Training Data: Trained on a combination of open-source logical reasoning datasets and a proprietary finance dataset created by AbleCredit.com.
  • Reasoning Focus: Designed with a primary intent for research in applying small LLMs to financial applications, particularly credit underwriting.

Performance

  • GSM8K Benchmark: Achieves approximately 67% score on the GSM8K mathematical reasoning benchmark in a zero-shot setting, indicating strong logical reasoning without specific examples.

Usage & Licensing

  • Hugging Face Integration: Compatible with standard Hugging Face setups for easy deployment and interaction.
  • License: Retains the original Qwen research license, which does not permit commercial use.