hbx/JustRL-DeepSeek-1.5B is a 1.5 billion parameter language model developed by hbx, fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It demonstrates competitive performance on mathematical reasoning tasks using a simplified Reinforcement Learning (RL) approach. This model achieves state-of-the-art results at its scale with single-stage training and fixed hyperparameters, requiring less compute than multi-stage methods. It is primarily designed for efficient and robust mathematical problem-solving.
No reviews yet. Be the first to review!