Overview
ryzax/1.5B-v18 is a roughly 1.7 billion parameter language model developed by ryzax. It is a fine-tuned iteration of the ryzax/qwen3_1.7B_sft_correct_v3_1e-5_4 base model, trained specifically on the agentica-org/DeepScaleR-Preview-Dataset.
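A minimal inference sketch using the Hugging Face transformers library is shown below. This assumes the model ships standard transformers-compatible weights and tokenizer files; the prompt and generation settings are illustrative only.

```python
# Hedged usage sketch: assumes standard transformers-compatible checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ryzax/1.5B-v18"

def solve(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a completion for a math prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the generated answer is returned.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(solve("What is 17 * 23? Show your reasoning."))
```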
Key Capabilities
- Enhanced Mathematical Reasoning: This model leverages the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, to significantly improve its mathematical reasoning abilities.
- Fine-tuned Performance: Built upon a strong base model, it has undergone further supervised fine-tuning to refine its responses and capabilities.
- TRL Framework: Training was conducted with the TRL (Transformer Reinforcement Learning) library, which supplies the reinforcement learning tooling used for GRPO.
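GRPO's central idea is to replace a learned value baseline with a group-relative one: several completions are sampled per prompt, and each completion's reward is normalized against the statistics of its own group. A minimal illustration of that advantage computation (not the model's actual training code):

```python
# Minimal sketch of GRPO's group-relative advantage, for illustration only.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampled group's mean and std.

    GRPO uses this in place of a learned value-function baseline: completions
    that beat their group's average get positive advantage, the rest negative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 sampled completions for one prompt, 3 judged correct (reward 1.0).
advs = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
```

Because the advantages are centered within each group, they sum to approximately zero, so the policy is pushed toward above-average completions and away from below-average ones.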
Training Details
The model's training procedure specifically incorporated GRPO, a technique designed to push the limits of mathematical reasoning in open language models. This method is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
Good For
- Applications requiring strong mathematical problem-solving.
- Tasks benefiting from models trained with advanced reinforcement learning techniques like GRPO.
- Developers looking for a compact model with specialized reasoning capabilities.