Name: harsha070/expfinal-qwen-mbpp-s42-lambda-0p20 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: harsha070

Model Overview

The harsha070/expfinal-qwen-mbpp-s42-lambda-0p20 is a 3.1 billion parameter language model, building upon the harsha070/sft-warmup-qwen-v1 base model. It has been fine-tuned using the TRL (Transformers Reinforcement Learning) framework, a library for training transformer models with reinforcement learning.

Key Training Details

A significant aspect of this model's development is its training with GRPO (Generalized Reinforcement Learning with Policy Optimization). This method, introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), suggests a focus on enhancing the model's ability to handle complex reasoning tasks, particularly in mathematical domains. The training process utilized specific versions of key frameworks:

TRL: 1.3.0
Transformers: 5.7.0
Pytorch: 2.11.0

Potential Use Cases

Given its specialized training with GRPO, this model is likely well-suited for applications requiring:

Mathematical problem-solving
Complex logical reasoning
Tasks benefiting from reinforcement learning fine-tuning

Overview

Model Overview

Key Training Details

Potential Use Cases

Full Model Card (README)