xiaolesu/OsmosisProofling-GRPO-NT
The xiaolesu/OsmosisProofling-GRPO-NT model is an experimental checkpoint released as part of the "Data Overlap as a Post-Training Hyperparameter for Autoformalization" research. This variant is a GRPO-only (Group Relative Policy Optimization) Qwen3-8B model, trained directly from the base model without Supervised Fine-Tuning (SFT) priming. It is intended for autoformalization research, specifically for studying how data overlap behaves as a post-training hyperparameter.
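The card treats overlap between SFT and GRPO training data as a tunable hyperparameter, but it does not say how that overlap is constructed or measured. As an illustration only, the sketch below assumes that "overlap" means the fraction of GRPO prompts that also appeared in the SFT set. `build_grpo_prompts` and `overlap_fraction` are hypothetical helpers and are not code from the paper repository.

```python
import random
from typing import Hashable, List, Sequence


def build_grpo_prompts(
    sft_pool: Sequence[Hashable],
    fresh_pool: Sequence[Hashable],
    n: int,
    overlap: float,
    seed: int = 0,
) -> List[Hashable]:
    """Hypothetical helper: draw n GRPO prompts, with round(n * overlap)
    reused from the SFT pool and the rest from a pool disjoint from SFT."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the split is reproducible
    n_overlap = round(n * overlap)
    n_fresh = n - n_overlap
    # random.sample raises ValueError if a pool is too small for the request.
    picked = rng.sample(list(sft_pool), n_overlap) + rng.sample(list(fresh_pool), n_fresh)
    rng.shuffle(picked)  # interleave reused and fresh prompts
    return picked


def overlap_fraction(sft_prompts: Sequence[Hashable], grpo_prompts: Sequence[Hashable]) -> float:
    """Assumed metric: share of GRPO prompts that were also seen during SFT."""
    if not grpo_prompts:
        return 0.0
    seen = set(sft_prompts)
    return sum(p in seen for p in grpo_prompts) / len(grpo_prompts)
```

For example, drawing 8 prompts with `overlap=0.25` reuses 2 SFT prompts and takes 6 fresh ones, so `overlap_fraction` on the result is 0.25.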
OsmosisProofling-GRPO-NT: An Experimental Autoformalization Model
This model, xiaolesu/OsmosisProofling-GRPO-NT, is an experimental checkpoint developed as part of the research described in the paper "SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization" by Xiaole Su, Kasey Zhang, and Andy Lyu. It is fine-tuned from Qwen3-8B.
Key Characteristics
- GRPO-only Training: This model was trained using only Group Relative Policy Optimization (GRPO), skipping the usual Supervised Fine-Tuning (SFT) priming stage.
- Base Model Foundation: It is trained directly from the base Qwen3-8B model, with Qwen3's "thinking" mode disabled as an experimental control.
- Research Focus: The primary purpose of this model is to investigate the effects of data overlap as a post-training hyperparameter within the context of autoformalization.
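GRPO, as introduced in DeepSeekMath, avoids training a separate value model (critic). Instead it samples several completions per prompt and scores each one relative to the rest of its group. The sketch below shows only that group-relative advantage step, under standard assumptions. The reward function, group size, and normalization details used for this checkpoint are not given in the card, and `group_relative_advantages` is an illustrative name rather than an API from any training library.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize each completion's reward against its own sampling group.

    Advantage_i = (r_i - mean(r)) / (std(r) + eps), where r holds the rewards
    of all completions sampled for the same prompt. This uses the population
    standard deviation; some implementations use the sample (n - 1) version.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # If every completion gets the same reward, std is 0 and all advantages
    # are 0, so that prompt contributes no policy-gradient signal.
    return [(r - mean) / (std + eps) for r in rewards]
```

With a hypothetical binary reward (for example, 1.0 if a generated formal statement passes a checker, else 0.0), a group scored `[1.0, 0.0, 0.0, 1.0]` yields advantages very close to `[1.0, -1.0, -1.0, 1.0]`. Implementations that use the sample standard deviation produce slightly smaller magnitudes for the same rewards.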
Use Cases
- Autoformalization Research: Intended for researchers studying how training methodology and data characteristics affect autoformalization.
- Experimental Baseline: Serves as a specific experimental artifact for replicating and extending the findings presented in the associated research paper.
For comprehensive details, including results and all related artifacts, refer to the paper repository.