harsha070/expfinal-qwen-mbpp-s42-lambda-0p25

Text generation · 3.1B parameters · BF16 · 32K context length · Transformer architecture · Published: May 5, 2026

harsha070/expfinal-qwen-mbpp-s42-lambda-0p25 is a 3.1 billion parameter language model fine-tuned from harsha070/sft-warmup-qwen-v1. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath paper to enhance mathematical reasoning. The model is primarily optimized for tasks requiring strong mathematical reasoning and problem solving, and offers a 32,768-token context length.


Overview

This model, harsha070/expfinal-qwen-mbpp-s42-lambda-0p25, is a 3.1 billion parameter language model fine-tuned from harsha070/sft-warmup-qwen-v1. It leverages GRPO (Group Relative Policy Optimization), a training method developed specifically to improve mathematical reasoning in large language models, as detailed in the DeepSeekMath paper.
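The defining idea of GRPO is that each sampled completion's reward is normalized against the other completions drawn for the same prompt, rather than against a learned value baseline. A minimal sketch of that group-relative advantage computation (illustrative only; the actual training loop also involves the clipped policy-gradient objective and a KL penalty described in the DeepSeekMath paper):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each reward against its sampling group, GRPO-style.

    For a group of G completions sampled for one prompt, with scalar
    rewards r_1..r_G, the advantage of completion i is
    (r_i - mean(rewards)) / std(rewards).
    """
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to one prompt, scored by a reward function
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Advantages within a group sum to zero, so above-average completions are reinforced and below-average ones are penalized, with no separate critic model needed.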

Key Capabilities

  • Enhanced Mathematical Reasoning: Benefits from the GRPO training method, making it suitable for tasks that require robust mathematical problem-solving.
  • Instruction-tuned: Built upon an instruction-tuned base model, suggesting good performance on general instruction-following tasks.
  • 32K Context Window: Supports a substantial context length of 32,768 tokens, allowing for processing longer inputs and more complex problem descriptions.
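When working against the 32,768-token window, the prompt and the generated output share the same budget, so longer inputs leave less room for the answer. A minimal sketch of that arithmetic (real token counts must come from the model's tokenizer; the function name here is illustrative):

```python
CONTEXT_LENGTH = 32_768  # model's maximum context, from the card above

def max_prompt_tokens(max_new_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Tokens left for the prompt after reserving room for generation."""
    if max_new_tokens >= context_length:
        raise ValueError("generation budget exceeds the context window")
    return context_length - max_new_tokens

# Reserving 1,024 tokens for the answer leaves 31,744 for the prompt
budget = max_prompt_tokens(1024)
```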

Good for

  • Mathematical Problem Solving: Ideal for applications involving arithmetic, algebra, calculus, and other mathematical reasoning challenges.
  • Code Generation (with mathematical context): Potentially useful for generating code snippets that involve mathematical logic or algorithms.
  • Research in LLM Training Methods: Provides an example of a model trained with GRPO, useful for researchers exploring advanced fine-tuning techniques.