Name: mjf-su/FaithfulnessGuidanceReward API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: mjf-su

Model Overview

The mjf-su/FaithfulnessGuidanceReward is a 4 billion parameter language model, fine-tuned from the mjf-su/PhysicalAI-reason-VLA-MetaAction-1e base model. It leverages the TRL (Transformer Reinforcement Learning) framework for its training process.

Key Capabilities & Training

This model's core differentiator lies in its training methodology: it was developed using GRPO (Guidance-based Reinforcement Learning for Policy Optimization). This technique, detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), aims to significantly improve the model's mathematical reasoning and faithfulness in its responses. The training process can be visualized via Weights & Biases, indicating a focus on robust and reliable output generation.

Technical Specifications

Parameters: 4 Billion
Context Length: 32768 tokens
Frameworks: TRL (0.26.1), Transformers (4.57.6), Pytorch (2.10.0), Datasets (4.4.1), Tokenizers (0.22.1)

When to Use This Model

This model is particularly well-suited for applications where:

Faithful and accurate reasoning is paramount.
Tasks involve mathematical problem-solving or require logical consistency.
You need a model that has been specifically optimized for guidance-based reinforcement learning to produce more reliable outputs.

Overview

Model Overview

Key Capabilities & Training

Technical Specifications

When to Use This Model

Full Model Card (README)