Name: mjf-su/ADEnReward-ReasoningConfidenceReward API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: mjf-su

ADEnReward-ReasoningConfidenceReward Overview

This model, developed by mjf-su, is a 4-billion-parameter language model fine-tuned from the mjf-su/PhysicalAI-reason-VLA-MetaAction-1e base model. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), to improve its reasoning and confidence assessment abilities.
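
As a rough illustration of the core idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for one prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group. The reward values, the function name, and the epsilon term are assumptions for illustration only; the model card does not describe the reward functions actually used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the group sampled for the same prompt.

    eps is a small stabilizer (an assumption here) that avoids division by
    zero when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of a single prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

# Completions scored above the group mean get a positive advantage,
# those below get a negative one, and those at the mean get zero.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Because the baseline comes from the group itself, this formulation does not need a separately trained value model.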

Key Capabilities

Enhanced Reasoning: Trained with GRPO, a reinforcement learning method originally introduced to improve mathematical reasoning in language models.
Fine-tuned Performance: Builds upon the capabilities of its base model, mjf-su/PhysicalAI-reason-VLA-MetaAction-1e, with further optimization for specific reasoning tasks.
Context Length: Supports a context length of 32,768 tokens, which allows longer inputs and helps maintain coherence over extended interactions.
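
To make the context limit concrete, here is a minimal sketch of a token budget check. It assumes, as is typical for decoder-only language models, that the prompt and the generated tokens share the same 32,768-token window. The function name is hypothetical, and the prompt token count would come from the model's tokenizer, which is not shown here.

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def remaining_generation_budget(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt.

    Assumes the prompt and the completion share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


# A hypothetical 30,000-token prompt leaves 2,768 tokens for the answer.
print(remaining_generation_budget(30000))
```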

Training Details

The model was trained with the TRL (Transformer Reinforcement Learning) framework, using TRL 0.26.1, Transformers 4.57.6, and PyTorch 2.10.0. The training process can be visualized via Weights & Biases.
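
For anyone trying to match the training environment, the following sketch compares locally installed package versions against the versions listed above. The package names on PyPI (trl, transformers, torch) and the helper name check_versions are assumptions for illustration; the check uses only the Python standard library and reports a missing package rather than failing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the training details; PyPI package names are assumed.
PINNED = {"trl": "0.26.1", "transformers": "4.57.6", "torch": "2.10.0"}


def check_versions(pinned=PINNED):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for package, wanted in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = (None, False)
            continue
        # Ignore local build suffixes such as "+cu128" when comparing.
        report[package] = (installed, installed.split("+")[0] == wanted)
    return report


for package, (installed, ok) in check_versions().items():
    status = "ok" if ok else "mismatch or missing"
    print(f"{package}: installed={installed} wanted={PINNED[package]} ({status})")
```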

Good For

Applications requiring improved reasoning capabilities, particularly in areas where mathematical or logical inference is crucial.
Scenarios where confidence assessment in model outputs is beneficial.
Tasks that can leverage a large context window for complex problem-solving or detailed analysis.
