Overview
jordanpainter/llama_gspo_200 is an 8-billion-parameter language model fine-tuned from the srirag/sft-llama-all base model. It was trained with the TRL library using the GRPO (Group Relative Policy Optimization) method.
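The checkpoint should load through the standard transformers APIs like any other Llama-based causal LM. A minimal inference sketch, assuming the checkpoint ships a tokenizer with a chat template (the prompt and generation settings are illustrative, not taken from the model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jordanpainter/llama_gspo_200"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit your hardware
    device_map="auto",
)

# A reasoning-style prompt, matching the model's intended use.
messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

An 8B model in bf16 needs roughly 16 GB of accelerator memory; quantized loading is an option on smaller devices.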
Key Capabilities
- Enhanced Reasoning: GRPO, the method detailed in the DeepSeekMath paper, was designed to improve complex, multi-step reasoning, so training with it suggests a focus on strengthening these abilities.
- Mathematical Problem Solving: Because GRPO originated in a paper dedicated to mathematical reasoning, this model is likely to perform more strongly on tasks requiring logical and mathematical thought.
- Fine-tuned Performance: As a fine-tuned variant, it aims to build on the foundational capabilities of its base model, srirag/sft-llama-all, with specialized improvements.
Training Details
The model was trained with the TRL framework, using TRL 0.28.0, Transformers 4.57.6, and PyTorch 2.5.1+cu121. The training process can be visualized via Weights & Biases.
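In TRL's GRPO training, each prompt is answered with a group of sampled completions, each completion is scored by a reward function, and rewards are normalized within the group to produce advantages, with no learned value model. A minimal sketch of that group-relative computation; the exact-match reward and all names below are illustrative assumptions, not details from this model's actual training:

```python
import statistics

def exact_match_reward(completion: str, answer: str) -> float:
    # Hypothetical reward: 1.0 if the completion ends with the target answer.
    return 1.0 if completion.strip().endswith(answer) else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    # GRPO's core idea: normalize each reward against its own sampling group,
    # A_i = (r_i - mean(r)) / std(r).
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All completions scored the same: no learning signal for this group.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Example: four sampled completions for one math prompt whose answer is "4".
completions = ["The answer is 4", "It is 5", "= 4", "maybe 3"]
rewards = [exact_match_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)      # [1.0, 0.0, 1.0, 0.0]
print(advantages)   # [1.0, -1.0, 1.0, -1.0]
```

Completions above the group mean get positive advantages and are reinforced; those below are penalized. In actual TRL usage this logic lives inside the trainer, with the user supplying only the reward function.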
Good For
- Applications requiring advanced logical and mathematical reasoning.
- Tasks where robust problem-solving capabilities are crucial.
- Developers looking for a specialized Llama-based model with improved reasoning over general-purpose alternatives.