harsha070/expfinal-phi-mbpp-s42-lambda-0p25

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 4k · Published: May 6, 2026 · Architecture: Transformer

harsha070/expfinal-phi-mbpp-s42-lambda-0p25 is a 4 billion parameter language model fine-tuned from harsha070/sft-warmup-phi-v1 using the TRL framework. It incorporates the GRPO training method, introduced in the DeepSeekMath paper, to strengthen its mathematical reasoning, and is optimized for tasks that require robust logical and mathematical problem-solving.


Model Overview

This model, harsha070/expfinal-phi-mbpp-s42-lambda-0p25, is a 4 billion parameter language model derived from harsha070/sft-warmup-phi-v1. It was fine-tuned with the TRL (Transformer Reinforcement Learning) framework, version 1.3.0.

Key Training Details

A significant aspect of this model's development is the application of the GRPO (Group Relative Policy Optimization) training method. This technique, detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), aims to improve the model's proficiency in mathematical reasoning tasks. The training utilized Transformers 5.8.0, PyTorch 2.11.0, Datasets 4.8.5, and Tokenizers 0.22.2.
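As a brief sketch of the method: in GRPO, a group of G completions is sampled for each prompt, and each completion's advantage is computed relative to the reward statistics of its own group rather than from a learned value function. Following the DeepSeekMath paper, the group-relative advantage takes the form:

```latex
\hat{A}_{i} = \frac{r_i - \operatorname{mean}\left(\{r_1, \dots, r_G\}\right)}{\operatorname{std}\left(\{r_1, \dots, r_G\}\right)}
```

This advantage then enters a clipped policy-gradient objective with a KL penalty toward a reference policy, similar to PPO but without a separate critic model.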

Use Cases

Given its fine-tuning with the GRPO method, this model is particularly well-suited for:

  • Mathematical problem-solving: Tasks that require logical deduction and numerical computation.
  • Reasoning-intensive applications: Scenarios where robust analytical capabilities are crucial.
  • Code generation related to mathematical or logical functions: Potentially beneficial for generating code that solves specific mathematical challenges, although not explicitly stated as a primary focus.
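For intuition about how the GRPO fine-tuning behind these capabilities scores its samples, the group-relative advantage can be computed in a few lines. The helper below is a minimal illustrative sketch (not code from this model's repository), assuming one scalar reward per sampled completion:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each completion's reward against the mean and
    standard deviation of its group, as in GRPO. Completions
    that beat the group average get positive advantages."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / sigma for r in rewards]

# Example: four completions sampled for one prompt, scored 0/1
# by a correctness reward (values are illustrative).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])
```

Because every advantage is measured against the group mean, the advantages in a group always sum to (approximately) zero, so the update pushes probability mass from below-average completions toward above-average ones.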