xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-Overlap
The xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-Overlap model is an experimental checkpoint built on Qwen3-8B and developed by Xiaole Su, Kasey Zhang, and Andy Lyu. It represents the SFT+GRPO condition with 100% data overlap, in which GRPO reuses the Supervised Fine-Tuning (SFT) data in full. The checkpoint serves as a control condition in research on autoformalization that treats SFT-GRPO data overlap as a post-training hyperparameter.
Model Overview
The xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-Overlap is an experimental model checkpoint developed by Xiaole Su, Kasey Zhang, and Andy Lyu. It is based on Qwen3-8B and is one of the variants used in research on SFT-GRPO data overlap as a post-training hyperparameter for autoformalization. This version represents the SFT+GRPO condition with 100% overlap: it serves as a control in which the Group Relative Policy Optimization (GRPO) stage reuses the entire Supervised Fine-Tuning (SFT) dataset.
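For readers unfamiliar with GRPO: as introduced in DeepSeekMath, it samples a group of completions for each prompt and scores every completion relative to the rest of its group, instead of relying on a learned value model. The sketch below shows only that generic group-relative advantage step. It is not the authors' training code; the binary rewards, the use of population standard deviation (some implementations use the sample standard deviation), and the `eps` term are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each completion's advantage is its reward minus the group mean, divided by
    the group standard deviation. `eps` avoids division by zero when every
    completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical binary rewards for four candidate formalizations of one statement.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Completions that beat their group average receive a positive advantage and the rest a negative one, so a group where every sample gets the same reward contributes no learning signal.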
Key Characteristics
- Experimental Checkpoint: Part of a research study on data overlap in autoformalization.
- Base Model: Built upon the Qwen3-8B architecture.
- Training Method: Combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO).
- Data Overlap: Features 100% data overlap, meaning GRPO reuses all SFT data.
- Purpose: Acts as a control condition to analyze the effects of data overlap on model performance in autoformalization tasks.
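To make the overlap setting concrete, the sketch below treats overlap as the fraction of the GRPO training pool drawn from the SFT data, so this checkpoint corresponds to an overlap of 1.0. This definition, the prompt-level granularity, and the helper names `overlap_fraction` and `build_grpo_pool` are hypothetical illustrations, not the study's actual data pipeline; it also assumes the "fresh" prompts do not appear in the SFT set.

```python
import random


def overlap_fraction(sft_prompts, grpo_prompts):
    """Fraction of GRPO prompts that also appear in the SFT prompt set."""
    if not grpo_prompts:
        return 0.0
    sft_set = set(sft_prompts)
    shared = sum(1 for p in grpo_prompts if p in sft_set)
    return shared / len(grpo_prompts)


def build_grpo_pool(sft_prompts, fresh_prompts, overlap, size, seed=0):
    """Hypothetical helper: sample a GRPO prompt pool with a target overlap ratio.

    `overlap` of 1.0 reuses only SFT prompts (this checkpoint's condition);
    0.0 uses only prompts unseen during SFT.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed for a reproducible pool
    n_reused = round(overlap * size)
    n_fresh = size - n_reused
    reused = rng.sample(list(sft_prompts), n_reused)
    fresh = rng.sample(list(fresh_prompts), n_fresh)
    return reused + fresh


# Toy prompt identifiers standing in for autoformalization problems.
sft = [f"sft_{i}" for i in range(100)]
fresh = [f"new_{i}" for i in range(100)]

pool = build_grpo_pool(sft, fresh, overlap=1.0, size=50)
print(overlap_fraction(sft, pool))  # 1.0: every GRPO prompt was seen during SFT
```

Varying `overlap` between 0.0 and 1.0 with everything else fixed is one way to treat overlap as a tunable hyperparameter, with the 1.0 setting acting as the full-reuse control.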
Research Context
This model is directly associated with the paper "SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization" by Xiaole Su, Kasey Zhang, and Andy Lyu, available on arXiv. The paper's repository provides further details, results, and related artifacts for this experimental work.