harsha070/expfinal-qwen-island-s42-lambda-0p0
harsha070/expfinal-qwen-island-s42-lambda-0p0 is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. The model was trained with GRPO, a reinforcement-learning method designed to enhance mathematical reasoning. It is optimized for tasks requiring advanced logical and mathematical problem-solving, building on the Qwen2.5 architecture with a 32K context length.
Model Overview
This model, harsha070/expfinal-qwen-island-s42-lambda-0p0, is a fine-tuned variant of the Qwen/Qwen2.5-3B-Instruct base model, with 3.1 billion parameters and a 32K token context length. It was trained using GRPO (Group Relative Policy Optimization), the method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach aims to significantly improve the model's proficiency in mathematical reasoning tasks.
Key Capabilities
- Enhanced Mathematical Reasoning: Leverages the GRPO training method to improve performance on complex mathematical problems.
- Instruction Following: Builds upon the instruction-tuned capabilities of the Qwen2.5-3B-Instruct base model.
- Efficient Inference: With 3.1 billion parameters, it offers a balance between performance and computational efficiency.
Training Details
The model was trained using the TRL (Transformer Reinforcement Learning) library, version 1.3.0, with Transformers 5.7.0 and PyTorch 2.11.0. The GRPO method central to its training is detailed in the DeepSeekMath paper.
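GRPO's core idea is to replace a learned value baseline with a group-relative one: for each prompt, several completions are sampled, and each completion's reward is normalized against the mean and standard deviation of its group. The sketch below illustrates only that normalization step (it is not the actual training code for this model, and the example rewards are made up):

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of sampled completions.

    GRPO computes A_i = (r_i - mean(r)) / (std(r) + eps), so completions
    better than the group average get positive advantage and worse ones
    get negative advantage, without a separate value model.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions sampled for one math prompt: two scored correct (1.0),
# two scored incorrect (0.0) by a rule-based reward.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

These per-completion advantages then weight the policy-gradient update in place of a critic's value estimate, which is what makes the method practical for reward signals like answer correctness.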
Use Cases
This model is particularly well-suited for applications requiring strong mathematical problem-solving and logical reasoning, making it a valuable tool for educational platforms, scientific research, or any domain where precise numerical and logical understanding is critical.
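For such applications, the model can be loaded with the standard Transformers chat workflow. A minimal sketch follows; the system prompt is illustrative (the card does not specify one), and `main()` is left uncalled here because it downloads the full model weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def build_messages(question: str) -> list[dict]:
    """Wrap a math question in the chat-format message list the tokenizer expects."""
    return [
        {"role": "system", "content": "You are a helpful assistant. Reason step by step."},
        {"role": "user", "content": question},
    ]

def main() -> None:
    model_id = "harsha070/expfinal-qwen-island-s42-lambda-0p0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages("What is 17 * 24?"),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

# Call main() to run generation (downloads ~3B parameters of weights).
```

Adjust `torch_dtype` and `device_map` for your hardware; `device_map="auto"` requires the `accelerate` package.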