harsha070/expfinal-phi-mbpp-s42-lambda-0p25
harsha070/expfinal-phi-mbpp-s42-lambda-0p25 is a 4-billion-parameter language model fine-tuned from harsha070/sft-warmup-phi-v1 using the TRL framework. It was trained with the GRPO method introduced in the DeepSeekMath paper to strengthen its mathematical reasoning, and is optimized for tasks that require robust logical and mathematical problem-solving.
Model Overview
This model, harsha070/expfinal-phi-mbpp-s42-lambda-0p25, is a 4 billion parameter language model derived from harsha070/sft-warmup-phi-v1. It has been fine-tuned using the TRL (Transformers Reinforcement Learning) framework, specifically version 1.3.0.
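Loading the checkpoint should follow the standard transformers pattern. The sketch below is a minimal, untested example that assumes the repository is a regular causal-LM checkpoint on the Hugging Face Hub; only the repo id comes from this card, and the actual download is left commented out because it pulls roughly 4B parameters.

```python
# Minimal loading sketch (assumes a standard transformers causal-LM
# checkpoint on the Hugging Face Hub; not the card author's own code).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "harsha070/expfinal-phi-mbpp-s42-lambda-0p25"

def load_model(model_id: str = MODEL_ID):
    """Download and return (tokenizer, model) for the checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model

# Usage (downloads the full checkpoint, so not executed here):
# tokenizer, model = load_model()
```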
Key Training Details
A significant aspect of this model's development is the application of the GRPO (Group Relative Policy Optimization) training method. This technique, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), aims to improve the model's proficiency in mathematical reasoning tasks. Training used Transformers 5.8.0, PyTorch 2.11.0, Datasets 4.8.5, and Tokenizers 0.22.2.
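A GRPO run of this kind can be sketched with TRL's GRPOTrainer. The snippet below is an illustrative reconstruction, not the author's actual training script: the reward function, dataset, and hyperparameters are all assumptions. GRPO samples a group of completions per prompt, scores each with a reward function, and optimizes using within-group relative advantages.

```python
# Illustrative GRPO fine-tuning sketch using TRL's GRPOTrainer.
# The reward function, dataset, and hyperparameters are assumptions,
# not the recipe actually used for this checkpoint.
import re

def numeric_answer_reward(completions, answer=None, **kwargs):
    """Toy reward: 1.0 if the last number in a completion matches the
    reference answer, else 0.0. GRPO compares rewards across the group
    of completions sampled for each prompt."""
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards

RUN_TRAINING = False  # enable on a machine with the model and a GPU
if RUN_TRAINING:
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    config = GRPOConfig(output_dir="grpo-out", num_generations=8)
    trainer = GRPOTrainer(
        model="harsha070/sft-warmup-phi-v1",  # base checkpoint per this card
        reward_funcs=numeric_answer_reward,
        args=config,
        train_dataset=load_dataset("gsm8k", "main", split="train"),  # assumed
    )
    trainer.train()
```

The reward function is deliberately simple; real runs typically combine format checks, answer verification, and length penalties.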
Use Cases
Given its fine-tuning with the GRPO method, this model is particularly well-suited for:
- Mathematical problem-solving: Tasks that require logical deduction and numerical computation.
- Reasoning-intensive applications: Scenarios where robust analytical capabilities are crucial.
- Code generation for mathematical or logical functions: potentially useful for producing code that solves well-specified mathematical problems, although code generation is not explicitly stated as a primary focus (the "mbpp" in the model name may hint at the MBPP code benchmark).
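The use cases above can be exercised with ordinary prompting. The snippet below is a hypothetical usage sketch: the prompts and decoding settings are illustrative choices, not recommendations from the model card, and the generation call is commented out because it requires the downloaded checkpoint.

```python
# Hypothetical prompts for the use cases above; nothing here comes
# from the model card itself.
PROMPTS = {
    "math": "Solve step by step: If 3x + 7 = 22, what is x?",
    "code": "Write a Python function that returns the n-th Fibonacci number.",
}

def build_generation_kwargs(max_new_tokens: int = 256):
    """Greedy decoding settings, a conservative (assumed) default for
    reasoning tasks where reproducible answers matter."""
    return {"max_new_tokens": max_new_tokens, "do_sample": False}

# With a loaded tokenizer/model (requires the ~4B checkpoint, not run here):
# inputs = tokenizer(PROMPTS["math"], return_tensors="pt")
# output = model.generate(**inputs, **build_generation_kwargs())
# print(tokenizer.decode(output[0], skip_special_tokens=True))
```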