Name: harsha070/exp2-qwen-mbpp-s42-lambda-0p25 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: harsha070

Model Overview

harsha070/exp2-qwen-mbpp-s42-lambda-0p25 is a 3.1-billion-parameter language model fine-tuned from harsha070/sft-warmup-qwen-v1. It supports a context length of 32768 tokens, which makes it suitable for long inputs and for maintaining context over extended interactions.
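As a rough illustration of what the 32768-token limit means in practice, the sketch below computes how many tokens remain for generation once a prompt has been counted. It assumes that prompt and generated tokens share the same context window, as is usual for decoder-only models. The helper name `max_new_tokens` and the `reserve` parameter are hypothetical and not part of any library.

```python
CONTEXT_LENGTH = 32768  # context length stated in the overview


def max_new_tokens(prompt_tokens, reserve=0, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated for a given prompt size.

    prompt_tokens: number of tokens already used by the prompt.
    reserve: optional safety margin (hypothetical parameter), e.g. for
             special tokens added by a chat template.
    """
    if prompt_tokens < 0 or reserve < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens - reserve
    # A prompt that already fills the window leaves no room for output.
    return max(remaining, 0)


if __name__ == "__main__":
    print(max_new_tokens(30000))
    print(max_new_tokens(30000, reserve=256))
```

The prompt's token count would normally come from the model's own tokenizer, since counts differ between tokenizers.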

Key Capabilities

Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. The approach is intended to improve performance on mathematical reasoning tasks.
Fine-tuned with TRL: Fine-tuning used the TRL (Transformer Reinforcement Learning) library, which provides reinforcement learning trainers for language models.

Training Details

The training procedure used GRPO, a method described in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), which suggests a focus on mathematical problem solving. Framework versions: TRL 1.3.0, Transformers 5.7.0, PyTorch 2.11.0, Datasets 4.8.5, and Tokenizers 0.22.2.
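In the DeepSeekMath paper, GRPO scores each sampled completion relative to the other completions sampled for the same prompt instead of using a learned value function. The following is a minimal sketch of that normalization step only, not the training code used for this model. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; real implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps (an assumed constant) avoids division by zero when all rewards match.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four completions of the same prompt,
    # e.g. 1.0 if a completion passed a check and 0.0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that score above the group mean receive positive advantages and those below receive negative ones. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.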

Good For

Applications requiring strong mathematical reasoning.
Tasks benefiting from a model fine-tuned with advanced reinforcement learning techniques like GRPO.
Scenarios where a 3.1-billion-parameter model with a large context window (32768 tokens) offers a useful balance between performance and computational cost.
