harsha070/expfinal-phi-mbpp-s42-lambda-0p75
harsha070/expfinal-phi-mbpp-s42-lambda-0p75 is a 4-billion-parameter language model fine-tuned from harsha070/sft-warmup-phi-v1. It was trained with GRPO, the reinforcement-learning method introduced in the DeepSeekMath paper to strengthen mathematical reasoning. The model targets general text generation and supports a context length of 4096 tokens.
Model Overview
harsha070/expfinal-phi-mbpp-s42-lambda-0p75 is a 4-billion-parameter language model fine-tuned by harsha070. It is based on harsha070/sft-warmup-phi-v1 and was trained with the TRL framework.
Key Differentiator: GRPO Training
The defining aspect of this model is its training methodology: GRPO (Group Relative Policy Optimization), introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Rather than learning a separate value model, GRPO scores each sampled completion against the other completions drawn for the same prompt. This suggests an optimization for tasks requiring robust reasoning, particularly in mathematical or logical domains, and distinguishes the model from ones trained with standard supervised fine-tuning alone.
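The core of GRPO's group-relative scoring can be sketched in a few lines: rewards for a group of completions are normalized by the group's mean and standard deviation to produce per-sample advantages. This is a minimal illustration of that baseline-free advantage estimate, not the paper's full clipped policy-gradient objective.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of sampled completions.

    Instead of a learned value baseline, each completion's reward is
    normalized against the group mean and standard deviation. `eps`
    guards against a zero-variance group. Illustrative sketch only.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Completions rewarded above the group mean receive positive advantages (and are reinforced); those below receive negative ones, so the group itself acts as the baseline.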
Technical Details
- Base Model: harsha070/sft-warmup-phi-v1
- Training Framework: TRL (Transformer Reinforcement Learning)
- Parameter Count: 4 billion
- Context Length: 4096 tokens
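The 4096-token context length means longer inputs must be truncated or windowed before inference. A tokenizer-agnostic sketch of overlapping windowing (the `overlap` size is an arbitrary illustrative choice; real usage would operate on ids from the model's own tokenizer):

```python
def chunk_tokens(token_ids, context_length=4096, overlap=256):
    """Split a token sequence into windows that fit the model's context.

    `overlap` carries trailing tokens into the next window so no chunk
    loses its immediate left context. Purely illustrative.
    """
    if context_length <= overlap:
        raise ValueError("context_length must exceed overlap")
    step = context_length - overlap
    return [token_ids[i:i + context_length]
            for i in range(0, max(len(token_ids) - overlap, 1), step)]
```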
Potential Use Cases
Given its GRPO training, this model could be particularly well-suited for:
- Reasoning-intensive tasks: Applications requiring logical deduction or problem-solving.
- Mathematical text generation: Generating explanations, solutions, or proofs related to mathematical concepts.
- General text generation: Though specialized, it retains the broad text-generation capabilities of its base language model.