harsha070/expfinal-phi-mbpp-s42-lambda-0p75

Text generation · Concurrency cost: 1 · Model size: 4B · Quant: BF16 · Context length: 4k · Published: May 6, 2026 · Architecture: Transformer

harsha070/expfinal-phi-mbpp-s42-lambda-0p75 is a 4-billion-parameter language model fine-tuned from harsha070/sft-warmup-phi-v1. It was trained with GRPO, the reinforcement-learning method introduced in the DeepSeekMath paper to enhance mathematical reasoning. The model targets general text generation tasks and supports a context length of 4096 tokens.


Model Overview

The harsha070/expfinal-phi-mbpp-s42-lambda-0p75 is a 4 billion parameter language model, fine-tuned by harsha070. It is based on the harsha070/sft-warmup-phi-v1 model and was trained using the TRL framework.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It uses GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This suggests an optimization for tasks requiring robust reasoning, potentially in mathematical or logical domains, distinguishing it from models trained only with standard supervised fine-tuning.
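The core idea of GRPO can be sketched in a few lines: instead of training a separate value network as a critic, the advantage of each sampled completion is computed relative to the group of completions drawn for the same prompt. The function below is an illustrative reimplementation of that normalization step, not code from this repository.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Compute GRPO-style advantages for one prompt's group of completions.

    Each completion's advantage is its reward normalized by the group's
    mean and standard deviation, so no learned critic is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled completions scored by a reward function.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the advantages are zero-centered within each group, completions are only pushed up or down relative to their peers for the same prompt, which keeps the update stable without a value model.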

Technical Details

  • Base Model: harsha070/sft-warmup-phi-v1
  • Training Framework: TRL (Transformer Reinforcement Learning)
  • Parameter Count: 4 billion
  • Context Length: 4096 tokens
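A minimal usage sketch with the Hugging Face transformers library, assuming the model is available on the Hub under this id; the prompt and generation settings are illustrative. The small helper reflects the 4096-token context limit noted above.

```python
def truncate_to_context(token_ids, max_len=4096):
    """Keep only the most recent tokens that fit the 4096-token context."""
    return token_ids[-max_len:]

def main():
    # Heavy imports kept local so the helper above stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "harsha070/expfinal-phi-mbpp-s42-lambda-0p75"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    prompt = "Write a Python function that reverses a string."
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Loading in BF16 matches the quantization listed in the model metadata.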

Potential Use Cases

Given its GRPO training, this model could be particularly well-suited for:

  • Reasoning-intensive tasks: Applications requiring logical deduction or problem-solving.
  • Mathematical text generation: Generating explanations, solutions, or proofs related to mathematical concepts.
  • General text generation: While specialized, it retains the broad text generation capabilities of its base language model.
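For reasoning-intensive applications, GRPO-style training typically scores completions with a programmatically checkable reward. The sketch below is a hypothetical example of such a reward, grading a generated Python solution against assert-style tests; the names and structure are illustrative and not taken from this repository.

```python
def execute_and_score(candidate_src, tests):
    """Run a candidate solution against simple assert-style tests.

    Returns 1.0 if all tests pass, else 0.0 -- a typical binary reward
    for code or math tasks in GRPO-style fine-tuning.
    """
    namespace = {}
    try:
        exec(candidate_src, namespace)   # define the candidate function
        for test in tests:
            exec(test, namespace)        # each test is an assert statement
    except Exception:
        return 0.0
    return 1.0

good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
```

A binary pass/fail reward like this pairs naturally with the group-relative advantages GRPO computes: within a group of sampled solutions, passing completions are reinforced relative to failing ones.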