ryokamoi/Llama-3.1-8B-FoVer-PRM-old: Formal Verification for LLM Reasoning
This model is an 8 billion parameter Process Reward Model (PRM) based on Llama 3.1, developed by Ryo Kamoi and the PSU NLP Group. It provides step-level feedback on reasoning generated by large language models (LLMs), which can serve as a reward signal in reinforcement learning or guide inference-time refinement. The model is trained with FoVer, a novel approach that synthesizes PRM training data by using formal verification tools such as Z3 and Isabelle to automatically annotate step-level errors, without manual annotation.
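To illustrate the principle behind FoVer's automated annotation: a formal verifier can label a reasoning step "correct" or "incorrect" by checking whether its conclusion follows from its premises. The toy checker below uses a brute-force truth table over propositional formulas as a simplified stand-in for the actual Z3/Isabelle pipeline; all function names here are illustrative, not from the FoVer codebase.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    return (not p) or q

def label_step(premises, conclusion, variables) -> str:
    """Label a propositional inference step: 'correct' if the conclusion
    holds under every assignment satisfying all premises, else 'incorrect'.
    (A real verifier like Z3 does this symbolically, not by enumeration.)"""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return "incorrect"  # found a countermodel for this step
    return "correct"

# Valid step: from (A -> B) and A, conclude B (modus ponens).
label_ok = label_step(
    premises=[lambda e: implies(e["A"], e["B"]), lambda e: e["A"]],
    conclusion=lambda e: e["B"],
    variables=["A", "B"],
)

# Invalid step: from (A -> B) and B, conclude A (affirming the consequent).
label_bad = label_step(
    premises=[lambda e: implies(e["A"], e["B"]), lambda e: e["B"]],
    conclusion=lambda e: e["A"],
    variables=["A", "B"],
)
print(label_ok, label_bad)  # correct incorrect
```

Labels produced this way can be attached to individual steps of an LLM's response, yielding step-level training data with no human annotation.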
Key Capabilities
- Automated Error Annotation: Utilizes formal verification to generate precise step-level error labels for LLM responses.
- Cross-Task Transfer: Demonstrates the ability to transfer verification capabilities learned in formal logic and proof tasks to a broad range of other reasoning tasks, including mathematics, academic problems, and abstract reasoning.
- Step-Level Feedback: Provides granular feedback on individual steps within an LLM's reasoning process, crucial for improving complex problem-solving.
- High Context Length: Supports a context length of 32768 tokens, allowing for analysis of extensive reasoning chains.
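A minimal sketch of querying the model for step-level feedback. The step-marker layout in `format_prm_input` is a hypothetical format, not the template the model was trained with; consult the FoVer repository for the exact prompt format, and note that the model call itself (commented out, since it downloads an 8B checkpoint) uses the standard Transformers API.

```python
def format_prm_input(problem: str, steps: list[str]) -> str:
    """Build a single verification prompt from a problem and its solution
    steps. Hypothetical layout; the real FoVer template may differ."""
    lines = [f"Problem: {problem}", "Solution:"]
    lines += [f"Step {i}: {s}" for i, s in enumerate(steps, start=1)]
    return "\n".join(lines)

prompt = format_prm_input(
    "If x + 2 = 5, what is x?",
    ["Subtract 2 from both sides.", "x = 3."],
)
print(prompt)

# Hedged model call (standard Transformers usage; uncomment to run):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("ryokamoi/Llama-3.1-8B-FoVer-PRM-old")
# model = AutoModelForCausalLM.from_pretrained(
#     "ryokamoi/Llama-3.1-8B-FoVer-PRM-old", device_map="auto")
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# outputs = model.generate(**inputs, max_new_tokens=64)
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 32768-token context leaves ample room for long multi-step solutions in a single prompt.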
Good For
- Training LLMs: Ideal for researchers and developers looking to train or fine-tune LLMs with robust step-level feedback for improved reasoning.
- Evaluating Reasoning: Can serve as a baseline when benchmarking PRMs on formal logic and proof verification tasks.
- Formal Verification Tasks: Particularly strong in verifying steps related to formal logic and mathematical proofs.
- Enhancing LLM Reliability: Useful for applications requiring high reliability in LLM-generated reasoning, such as scientific discovery or complex problem-solving systems.