LLucass/TT_L0.2_H0.2_grpo
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jun 8, 2025 · Architecture: Transformer

LLucass/TT_L0.2_H0.2_grpo is a 1.5-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained with the GRPO method on the knoveleng/open-rs dataset and specializes in mathematical reasoning. The model supports a 32,768-token context length, making it suitable for applications that require robust mathematical problem-solving.
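A minimal usage sketch with the Hugging Face `transformers` library is shown below. The chat-style prompt format is an assumption based on the DeepSeek-R1-Distill-Qwen-1.5B base model (which emits its chain of thought before the final answer); consult the base model's card for the exact template. The `solve` helper and its parameters are illustrative, not part of this model's published API.

```python
# Hedged sketch: querying LLucass/TT_L0.2_H0.2_grpo for math reasoning.
# Assumes the R1-distill chat template; verify against the base model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LLucass/TT_L0.2_H0.2_grpo"


def build_messages(problem: str) -> list[dict]:
    # R1-distilled models typically take the raw problem as a single
    # user turn and reason step by step before answering.
    return [{"role": "user", "content": problem}]


def solve(problem: str, max_new_tokens: int = 1024) -> str:
    # Hypothetical helper: load in BF16 (the published quantization)
    # and generate a completion for one math problem.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    input_ids = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For example, `solve("What is the sum of the first 10 positive integers?")` would return the model's reasoning followed by its answer; inputs plus generated tokens must stay within the 32,768-token context window.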
