mjf-su/ADEnReward

VISION | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Apr 22, 2026 | Architecture: Transformer | Cold

ADEnReward is a 4-billion-parameter language model developed by mjf-su, fine-tuned from mjf-su/PhysicalAI-reason-VLA-MetaAction-1e. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning. With a context length of 32,768 tokens, the model targets tasks that require multi-step reasoning and problem-solving.


ADEnReward Model Summary

ADEnReward is a 4-billion-parameter language model developed by mjf-su, built on the base model mjf-su/PhysicalAI-reason-VLA-MetaAction-1e. What sets it apart is its training method: GRPO (Group Relative Policy Optimization). GRPO samples a group of completions for each prompt, scores them with a reward, and uses each completion's reward relative to the rest of its group as the learning signal, which removes the need for a separate value (critic) model. The method was introduced in the DeepSeekMath paper, whose subject is mathematical reasoning in open language models.
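The group-relative idea behind GRPO can be illustrated with a small sketch. This is not the training code used for ADEnReward; it only shows how rewards for a group of completions sampled from the same prompt are turned into advantages by normalizing within the group. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed small constant that avoids division by zero
    when every completion in the group receives the same reward.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Completions that beat the group mean get a positive advantage,
# the others a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

When all completions in a group receive the same reward, every advantage is zero, so that group contributes no learning signal.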

Key Capabilities

  • Enhanced Reasoning: Trained with GRPO, which suggests a focus on complex reasoning and problem-solving, particularly in areas such as mathematics.
  • Fine-tuned Performance: Developed using the TRL (Transformer Reinforcement Learning) framework, so the model has gone through a reinforcement learning stage on top of its base checkpoint.
  • Extended Context: Supports a context length of 32,768 tokens, so long inputs and long generated responses can fit in a single window.
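As a rough illustration of working within the 32,768-token window, the sketch below computes how many tokens remain for generation once a prompt has been tokenized. It assumes the prompt and the generated tokens share the same window, which is typical for decoder-only models but is not stated on this card. The function name and the `reserve` parameter are hypothetical.

```python
MAX_CONTEXT_TOKENS = 32768  # context length stated on the model card


def max_new_tokens_for(prompt_tokens, reserve=0, context_limit=MAX_CONTEXT_TOKENS):
    """Return how many tokens can still be generated for a given prompt length.

    `reserve` is a hypothetical safety margin (for example, for special
    tokens added by a chat template). The result is never negative.
    """
    remaining = context_limit - prompt_tokens - reserve
    return max(remaining, 0)


# A 30,000-token prompt leaves 2,768 tokens for the response.
print(max_new_tokens_for(30000))
```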

Good For

  • Mathematical Reasoning Tasks: Because it was trained with GRPO, it is likely well suited to logical deduction, numerical problem-solving, and mathematical understanding.
  • Complex Problem Solving: The 32,768-token context window combined with reasoning-focused training makes it potentially effective for intricate, multi-step problems.
  • Research and Development: Provides a foundation for further experimentation with GRPO-based fine-tuning and exploring its applications in various reasoning domains.
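For readers who want to experiment with GRPO-based fine-tuning, the sketch below shows one possible starting point using TRL's `GRPOTrainer`. It is not the recipe used to train ADEnReward, which this card does not describe. The reward function, the `answer` dataset column, and the output directory are hypothetical, and whether this checkpoint loads through `GRPOTrainer` by model ID is not confirmed by the card. The reward function assumes plain-text completions; conversational datasets pass completions as message lists and would need different handling.

```python
def contains_answer_reward(completions, answer, **kwargs):
    """Return 1.0 where a completion contains its reference answer, else 0.0.

    Follows TRL's custom reward function convention: completions arrive as a
    list, extra dataset columns (here a hypothetical `answer` column) arrive
    as keyword arguments, and one float is returned per completion.
    """
    return [
        1.0 if ref.strip() in text else 0.0
        for text, ref in zip(completions, answer)
    ]


def build_trainer(train_dataset, output_dir="grpo-output"):
    """Wire the reward function into a GRPOTrainer (requires `trl`).

    `train_dataset` is assumed to have a `prompt` column and an `answer`
    column. Call `.train()` on the returned trainer to start training.
    """
    from trl import GRPOConfig, GRPOTrainer  # imported lazily: heavy dependency

    return GRPOTrainer(
        model="mjf-su/ADEnReward",
        reward_funcs=contains_answer_reward,
        args=GRPOConfig(output_dir=output_dir),
        train_dataset=train_dataset,
    )


# The reward function can be checked on its own, without TRL installed.
print(contains_answer_reward(["The result is 42.", "I think 7"], answer=["42", "8"]))
```

A binary containment check is only a placeholder; a real experiment would use a reward that matches the target task.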