rohitcraftsyt/soc-grpo-tier1
The rohitcraftsyt/soc-grpo-tier1 model is a 1.5 billion parameter language model fine-tuned from unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring improved logical and mathematical problem-solving, building upon its Qwen2.5 base.
Overview
rohitcraftsyt/soc-grpo-tier1 is a 1.5 billion parameter language model derived from the unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit base. Its key differentiator is its training methodology: it was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" that aims to significantly improve a model's ability to handle complex mathematical and logical reasoning tasks.
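The core idea of GRPO can be sketched in a few lines: instead of training a separate value model as a baseline (as PPO does), GRPO samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. A minimal illustration of that advantage computation:

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Compute GRPO-style advantages for one group of sampled completions.

    Each reward is normalized against the mean and (population) standard
    deviation of its own group, replacing a learned value baseline. The small
    epsilon guards against division by zero when all rewards are equal.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]


# Two correct and two incorrect completions in a group of four:
# correct answers get positive advantage, incorrect ones negative.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions scoring above their group's average are reinforced and those below are penalized, which is what pushes the policy toward more reliable reasoning traces.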
Key Capabilities
- Enhanced Mathematical Reasoning: The primary focus of this model's training was to boost its performance on mathematical problem-solving, making it suitable for applications where numerical and logical accuracy are critical.
- Instruction Following: As it is fine-tuned from an instruction-tuned base model, it retains strong instruction-following capabilities.
- Efficient Deployment: Building on a 1.5B parameter model, it offers a balance between performance and computational efficiency, making it viable for resource-constrained environments.
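The capabilities above can be exercised with the standard transformers API. This is a hedged sketch, not an official usage snippet from the model authors: the system prompt and generation settings are illustrative assumptions, and the heavy imports are deferred so the prompt helper works without the model downloaded.

```python
MODEL_ID = "rohitcraftsyt/soc-grpo-tier1"


def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem in the chat format the Qwen2.5 base expects.
    The system prompt is an illustrative choice, not a published one."""
    return [
        {"role": "system", "content": "You are a careful math assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]


def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so build_messages stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)


# solve("What is 17 * 24?")  # downloads the ~1.5B checkpoint on first call
```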
Training Details
The model was trained with the TRL (Transformer Reinforcement Learning) framework, version 0.24.0, alongside Transformers 4.57.6 and PyTorch 2.10.0+cu128. GRPO estimates advantages from group-relative rewards rather than from a separate learned value model, which makes it a comparatively memory-efficient way to optimize for reasoning quality.
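A training run of this shape can be reproduced with TRL's GRPOTrainer. The dataset choice and reward rule below are placeholders, since the actual training data and reward for this checkpoint were not published; the sketch assumes TRL's standard (non-conversational) prompt format, where completions arrive as plain strings.

```python
def correctness_reward(completions, answer, **kwargs):
    """Reward 1.0 when the reference answer string appears in a completion.

    TRL forwards extra dataset columns (here `answer`) to reward functions as
    keyword arguments, aligned element-wise with the sampled completions.
    """
    return [1.0 if ref in text else 0.0 for text, ref in zip(completions, answer)]


def main():
    # Heavy dependencies imported here so the reward function above stays
    # importable without trl/datasets installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Example math dataset; the checkpoint's actual training data is unknown.
    dataset = load_dataset("openai/gsm8k", "main", split="train")
    dataset = dataset.map(lambda row: {"prompt": row["question"]})

    trainer = GRPOTrainer(
        model="unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit",
        reward_funcs=correctness_reward,
        args=GRPOConfig(output_dir="soc-grpo-tier1", num_generations=8),
        train_dataset=dataset,
    )
    trainer.train()


# main()  # launches training; requires a CUDA GPU and the datasets above
```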
Good For
- Applications requiring robust mathematical problem-solving.
- Tasks that benefit from improved logical deduction and reasoning.
- Scenarios where a smaller, efficient model with specialized reasoning capabilities is preferred over larger, general-purpose models.