ryokamoi/Llama-3.1-8B-FoVer-PRM-old
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: May 21, 2025 · License: llama3.1 · Architecture: Transformer

The ryokamoi/Llama-3.1-8B-FoVer-PRM-old is an 8 billion parameter Llama 3.1-based Process Reward Model (PRM) developed by Ryo Kamoi and the PSU NLP Group, with a 32768 token context length. It is trained on step-level error labels produced automatically with formal verification tools (Z3, Isabelle), enabling it to give step-level feedback on reasoning tasks. The model is strongest at verifying formal logic and proof steps, and this verification ability transfers across tasks, improving verification on mathematics, academic problems, and abstract reasoning.


ryokamoi/Llama-3.1-8B-FoVer-PRM-old: Formal Verification for LLM Reasoning

This model is an 8 billion parameter Process Reward Model (PRM) based on Llama 3.1, developed by Ryo Kamoi and the PSU NLP Group. It is designed to provide step-level feedback on the reasoning generated by large language models (LLMs), enhancing their capabilities through reinforcement learning and inference-time refinement. The model leverages a novel approach called FoVer, which synthesizes PRM training data using formal verification tools like Z3 and Isabelle to automatically annotate step-level errors.
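The data-synthesis idea behind FoVer can be illustrated with a simplified, self-contained analogue: check each step of a solution programmatically and record a per-step correct/incorrect label. This sketch substitutes a plain Python evaluator for the formal tools (Z3, Isabelle), and the `lhs = rhs` step format is hypothetical:

```python
# Simplified analogue of FoVer-style automatic step annotation.
# Real FoVer uses formal verification tools; here a plain arithmetic
# evaluator plays that role. The step format is an assumption.

def annotate_steps(steps: list[str]) -> list[bool]:
    """Return a correct/incorrect label for each 'lhs = rhs' step."""
    labels = []
    for step in steps:
        lhs, _, rhs = step.partition("=")
        try:
            # eval() stands in for a formal equality check (illustration only).
            labels.append(eval(lhs) == eval(rhs))
        except Exception:
            labels.append(False)  # an unparseable step counts as an error
    return labels

solution = ["2 + 3 = 5", "5 * 4 = 20", "20 - 7 = 12"]  # last step is wrong
print(annotate_steps(solution))  # → [True, True, False]
```

Pairing each step with such a label is exactly the shape of supervision a PRM is trained on.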

Key Capabilities

  • Automated Error Annotation: Utilizes formal verification to generate precise step-level error labels for LLM responses.
  • Cross-Task Transfer: Demonstrates the ability to transfer verification capabilities learned in formal logic and proof tasks to a broad range of other reasoning tasks, including mathematics, academic problems, and abstract reasoning.
  • Step-Level Feedback: Provides granular feedback on individual steps within an LLM's reasoning process, crucial for improving complex problem-solving.
  • High Context Length: Supports a context length of 32768 tokens, allowing for analysis of extensive reasoning chains.
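A minimal sketch of querying the model for a step-level judgment via Hugging Face transformers follows. The prompt layout, the trailing question, and the `" correct"`/`" incorrect"` label tokens are assumptions for illustration; consult the model card for the exact input format used in FoVer training.

```python
# Sketch: step-level scoring with transformers. Prompt format and label
# tokens are assumptions, not the documented FoVer format.
MODEL_ID = "ryokamoi/Llama-3.1-8B-FoVer-PRM-old"

def build_prompt(problem: str, steps: list[str]) -> str:
    """Assumed layout: the problem followed by numbered reasoning steps."""
    lines = [f"Problem: {problem}"]
    lines += [f"Step {i}: {s}" for i, s in enumerate(steps, start=1)]
    return "\n".join(lines)

def score_last_step(problem: str, steps: list[str]) -> float:
    """P(last step is correct); loads the 8B model, so expect to need a GPU.

    Imports are local so the lightweight helper above works without
    transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    prompt = build_prompt(problem, steps) + "\nIs the last step correct?"
    ids = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]
    # Compare assumed label tokens, renormalizing over just the two.
    good = tok.encode(" correct", add_special_tokens=False)[0]
    bad = tok.encode(" incorrect", add_special_tokens=False)[0]
    return torch.softmax(logits[[good, bad]], dim=-1)[0].item()
```

Calling `score_last_step` on each successive prefix of a solution yields the per-step feedback described above, at the cost of one forward pass per step.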

Good For

  • Training LLMs: Ideal for researchers and developers looking to train or fine-tune LLMs with robust step-level feedback for improved reasoning.
  • Evaluating Reasoning: Can serve as a baseline or reference verifier when benchmarking PRMs on formal logic and proof tasks.
  • Formal Verification Tasks: Particularly strong in verifying steps related to formal logic and mathematical proofs.
  • Enhancing LLM Reliability: Useful for applications requiring high reliability in LLM-generated reasoning, such as scientific discovery or complex problem-solving systems.
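One common way to turn step-level scores into higher reliability is best-of-N reranking: score every step of each candidate solution, aggregate with `min` (a reasoning chain is only as strong as its weakest step), and keep the best candidate. The aggregation rule and the stub scorer below are illustrative choices, not the documented FoVer recipe:

```python
# Best-of-N reranking over step-level scores. The stub scorer stands in
# for a real PRM call; min-aggregation is one common (assumed) choice.

def rerank(candidates: list[list[str]], score_step) -> list[str]:
    """Return the candidate whose weakest step scores highest."""
    return max(candidates, key=lambda steps: min(map(score_step, steps)))

# Stub scorer for demonstration: pretend shorter steps are more reliable.
stub = lambda step: 1.0 / (1 + len(step))
best = rerank([["a long dubious step"], ["ok", "fine"]], stub)
print(best)  # → ['ok', 'fine']
```

Swapping the stub for actual PRM scores gives an inference-time refinement loop without retraining the generator.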