Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jan 13, 2026 · Architecture: Transformer

Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule is a 1.5-billion-parameter instruction-tuned language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. With a context length of 131,072 tokens, it targets general text generation, and its GRPO training may particularly benefit tasks involving mathematical or structured reasoning.


Overview

This model, Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of the Qwen2.5-1.5B-Instruct base model. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. GRPO samples a group of completions per prompt, scores them with a reward function, and normalizes each completion's reward against the group's statistics rather than a learned value model; this training approach aims to improve performance on tasks that benefit from advanced reasoning.
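
The group-relative idea at the heart of GRPO can be sketched in a few lines. This is a simplified illustration of the advantage computation, not this model's actual training code:

```python
# Simplified sketch of GRPO's group-relative advantage computation:
# each sampled completion's reward is normalized against the mean and
# standard deviation of its group, with no learned value model.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its group's statistics."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # All completions scored the same: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# For a group of 4 completions scored by a binary rule-based reward:
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Correct completions receive positive advantages, incorrect ones negative.
```

Completions that beat their group's average are reinforced and the rest are penalized, which is why rule-based rewards (as the `_rule` suffix of this model's name suggests) pair naturally with GRPO.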

Key Characteristics

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct.
  • Parameter Count: 1.5 billion parameters.
  • Context Length: 131,072-token context window.
  • Training Method: Fine-tuned with GRPO, as detailed in the DeepSeekMath paper.
  • Frameworks: Developed using TRL (Transformer Reinforcement Learning) version 0.23.0, along with Transformers 4.57.1 and PyTorch 2.7.1+cu128.
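
The card does not ship a usage snippet, so the following is a minimal inference sketch using the standard Hugging Face Transformers chat-template API for Qwen2.5-style instruct models; the system prompt and generation settings are illustrative assumptions, not values from this card:

```python
# Hypothetical inference sketch for this model via Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule"

def build_messages(user_prompt: str) -> list[dict]:
    """Build a chat-format message list as expected by Qwen2.5 chat templates."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the messages through the model's chat template.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, dropping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate("Summarize this paragraph in one sentence: ...")` would return the model's completion as a string.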

Potential Use Cases

Given its instruction-tuned nature and GRPO-enhanced training, this model is suitable for:

  • General text generation and conversational AI.
  • Tasks requiring improved reasoning capabilities, potentially in mathematical or logical domains, due to its GRPO training.
  • Applications benefiting from a large context window for processing extensive inputs or generating detailed outputs.