Model Overview
Thrillcrazyer/Qwen-7B_NOTAC_PPO is a 7.6 billion parameter language model derived from Qwen/Qwen2.5-7B-Instruct. It is distinguished by specialized fine-tuning for mathematical reasoning on the DeepMath-103k dataset.
Key Capabilities & Training
- Mathematical Reasoning: The model has undergone fine-tuning specifically to enhance its capabilities in solving complex mathematical problems and performing advanced reasoning.
- GRPO Training Method: It was trained with GRPO (Group Relative Policy Optimization), introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO estimates advantages from groups of sampled completions rather than a learned value function, and was designed to improve performance on mathematical tasks.
- High Context Length: The model supports a context length of 131,072 (128K) tokens, allowing it to process extensive inputs and maintain coherence over long interactions.
- Frameworks: Training was conducted using TRL (Transformer Reinforcement Learning) version 0.26.2, alongside Transformers 4.57.3 and PyTorch 2.8.0.
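The core idea of GRPO can be sketched in a few lines: for each prompt, several completions are sampled and scored, and each completion's advantage is its reward normalized against the group's mean and standard deviation, removing the need for a learned value-function baseline. The reward values below are purely illustrative, not taken from this model's training.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Group-relative advantage as used in GRPO: normalize each
    sampled completion's reward by the group's mean and standard
    deviation (eps guards against a zero-variance group)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one math prompt, scored by a
# hypothetical correctness checker with partial credit.
rewards = [1.0, 0.0, 1.0, 0.5]
advantages = group_relative_advantages(rewards)
# Correct completions get positive advantages, incorrect ones negative.
```

In the full algorithm these advantages weight a clipped policy-gradient objective (as in PPO), but the group-relative baseline is the distinguishing step.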
Use Cases
This model is particularly well-suited for applications requiring robust mathematical problem-solving, logical deduction, and complex numerical or scientific queries. Its specialized training makes it a strong candidate for tasks where precise mathematical reasoning is critical.
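For such use cases, prompts should follow the chat format of the base model. Qwen2.5-Instruct models use the ChatML format; the sketch below builds such a prompt by hand, assuming this fine-tune keeps the base model's template unchanged. In practice, prefer `tokenizer.apply_chat_template` from Transformers, which reads the template shipped with the model.

```python
def build_chatml_prompt(question: str,
                        system: str = "Please reason step by step.") -> str:
    """Build a ChatML-style prompt (Qwen2.5-Instruct convention).
    Assumes the fine-tuned model inherits the base chat template."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("What is the derivative of x^3 + 2x?")
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to generate its answer.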