jordanpainter/diallm-qwen-gspo-aus
The jordanpainter/diallm-qwen-gspo-aus model is an 8-billion-parameter language model fine-tuned from jordanpainter/diallm-qwen-sft-aus. It was trained with the GRPO method introduced in DeepSeekMath, which specializes it for stronger reasoning. With a 32,768-token context length, the model targets complex conversational AI and advanced problem-solving tasks, with training aimed at improving logical coherence and response quality in generative applications.
Model Overview
The jordanpainter/diallm-qwen-gspo-aus is an 8-billion-parameter language model, fine-tuned from the jordanpainter/diallm-qwen-sft-aus base model. It leverages the GRPO (Group Relative Policy Optimization) training method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", to strengthen its reasoning and generative capabilities. The model was trained with the TRL framework and supports a context length of 32,768 tokens.
Key Capabilities
- Enhanced Reasoning: Benefits from the GRPO training method, which is designed to improve logical and mathematical reasoning in language models.
- Generative AI: Suitable for generating coherent and contextually relevant text, building upon its fine-tuned base.
- Long Context Understanding: Supports a 32K token context window, allowing for processing and generating longer, more complex interactions.
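The capabilities above can be exercised with a standard Transformers chat-generation loop. The sketch below is illustrative and assumes the model repository is published on the Hugging Face Hub under this card's id and that its tokenizer ships a chat template; the `generate` helper and its parameters are this sketch's own, not part of the model release.

```python
# Hypothetical inference sketch for jordanpainter/diallm-qwen-gspo-aus.
# Assumes `transformers` (and a backend such as PyTorch) is installed and
# that the repo id below resolves on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "jordanpainter/diallm-qwen-gspo-aus"  # repo id from this card


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and return one chat-formatted completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Format the user turn with the tokenizer's chat template, then append
    # the assistant generation prompt.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

A call such as `generate("Summarize the argument in two sentences.")` would return the model's reply; the 32K context window means long documents can be placed in the user turn without truncation in most cases.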
Training Details
The model was trained with the TRL framework (version 0.28.0), using Transformers 4.57.6, PyTorch 2.5.1+cu121, Datasets 4.5.0, and Tokenizers 0.22.2. The GRPO method, the key differentiator from the SFT base model, optimizes the policy against group-normalized rewards to improve performance on intricate problem-solving and conversational nuance.
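The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions per prompt and normalize each completion's reward against its own group's mean and standard deviation, removing the need for a separate value model. The sketch below illustrates that advantage computation only; it is a simplified stand-in, not TRL's actual implementation, and the function name is this sketch's own.

```python
# Minimal sketch of GRPO's group-relative advantage (DeepSeekMath):
# for each prompt, G completions are sampled and each reward is
# normalized within its group: A_i = (r_i - mean(r)) / std(r).
# Illustrative only; TRL's GRPO trainer handles this internally.
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize one group's rewards into advantages."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:  # all completions scored equally -> no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Example: four sampled completions for one prompt, scored by a reward model.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above their group's mean receive positive advantages and are reinforced; below-mean completions are penalized, which is what steers the policy toward better reasoning traces during training.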