harsha070/exp2-qwen-mbpp-s123-lambda-0p25
The harsha070/exp2-qwen-mbpp-s123-lambda-0p25 model is a 3.1 billion parameter language model, fine-tuned from harsha070/sft-warmup-qwen-v2 using the TRL framework. It was trained with GRPO, a reinforcement learning method introduced in the DeepSeekMath paper to enhance mathematical reasoning. The model targets tasks requiring mathematical problem-solving and logical deduction, and supports a 32,768-token context length.
Model Overview
harsha070/exp2-qwen-mbpp-s123-lambda-0p25 is a 3.1 billion parameter language model, fine-tuned from harsha070/sft-warmup-qwen-v2. Its 32,768-token context window makes it suitable for processing longer inputs.
Key Training Details
- Fine-tuning Method: The model was trained using the TRL library.
- Optimization Technique: It incorporates GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath paper, which improves mathematical reasoning by scoring each sampled completion relative to the other completions drawn for the same prompt, removing the need for a separate value model.
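The core idea behind GRPO can be illustrated with a short sketch. This is not the model's actual training code (which uses TRL's trainer); it only shows, under the assumption of a simple scalar reward per completion, how GRPO normalizes each reward against the group of completions sampled for the same prompt:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """GRPO-style advantage estimate: normalize each completion's reward
    against the mean and std of its own sampling group, instead of
    using a learned value function as a baseline."""
    mu = mean(rewards)
    sigma = stdev(rewards)  # assumes > 1 completions with non-identical rewards
    return [(r - mu) / sigma for r in rewards]

# Example: 4 completions sampled for one prompt, each scored by a reward model
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above their group's mean receive positive advantages and are reinforced; those below the mean are penalized, so the advantages within a group always sum to zero.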
Intended Use Cases
This model is particularly well-suited for applications that demand strong mathematical reasoning and problem-solving. Its training with GRPO suggests an emphasis on tasks where logical deduction and numerical accuracy are critical.