Overview
LLucass/TT_L0.2_H0.2_dr_grpo is a 1.5 billion parameter language model developed by LLucass. It is a fine-tuned version of the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model, trained on the knoveleng/open-rs dataset. Training used the TRL (Transformer Reinforcement Learning) framework with GRPO (Group Relative Policy Optimization), a reinforcement learning method known for enhancing mathematical reasoning in language models.
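The core idea behind GRPO can be illustrated with a short sketch. This is a simplified illustration, not the exact TRL implementation: for each prompt, several completions are sampled, and each completion's advantage is its reward normalized by the group's mean and standard deviation, so no separate value model is needed.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Completions scoring above the group mean get positive advantages
    (their tokens are reinforced); below-mean completions get negative ones.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers, two judged correct (reward 1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In the example group, the two correct completions receive positive advantages and the two incorrect ones negative advantages, and the advantages sum to zero within the group.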
Key Capabilities
- Enhanced Reasoning: Benefits from the GRPO training method, which is designed to improve mathematical reasoning abilities.
- Fine-tuned Performance: Specialized training on the knoveleng/open-rs dataset for specific domain applications.
- Efficient Architecture: Built on a 1.5 billion parameter model, balancing performance and computational efficiency.
Good for
- Mathematical Reasoning Tasks: Ideal for applications requiring robust mathematical problem-solving and logical deduction.
- Research and Development: Suitable for researchers exploring the impact of GRPO and similar training methodologies on smaller language models.
- Specialized Domain Applications: Can be adapted for tasks within domains represented by the knoveleng/open-rs dataset.
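A minimal inference sketch with Hugging Face Transformers is shown below. The model ID comes from this card; the prompt template and generation settings are illustrative assumptions (DeepSeek-R1 distills are typically prompted to reason step by step), not documented behavior of this checkpoint.

```python
MODEL_ID = "LLucass/TT_L0.2_H0.2_dr_grpo"

def build_prompt(question: str) -> str:
    # Assumed prompt style for R1-distilled models; adjust for your use case.
    return (
        f"{question}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )

def main() -> None:
    # Heavy imports deferred so the helpers above stay dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt("What is 12 * 17?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Running `main()` downloads the ~1.5B parameter weights on first use, so a GPU is helpful but not required at this scale.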