xiaolesu/OsmosisProofling-GRPO-NT

Task: Text generation | Concurrency cost: 1 | Model size: 8B | Quantization: FP8 | Context length: 32k | Published: Apr 3, 2026 | Architecture: Transformer

The xiaolesu/OsmosisProofling-GRPO-NT model is an experimental checkpoint from the research paper "SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization". This variant is a Qwen3-8B model trained only with GRPO (Group Relative Policy Optimization), applied directly to the base model without Supervised Fine-Tuning (SFT) priming. It is intended for autoformalization research, specifically for studying how data overlap behaves as a post-training hyperparameter.


OsmosisProofling-GRPO-NT: An Experimental Autoformalization Model

This model, xiaolesu/OsmosisProofling-GRPO-NT, is an experimental checkpoint developed for the research described in the paper "SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization" by Xiaole Su, Kasey Zhang, and Andy Lyu. It is a post-trained variant of Qwen3-8B.

Key Characteristics

  • GRPO-only Training: This model was trained using only Group Relative Policy Optimization (GRPO), skipping the usual Supervised Fine-Tuning (SFT) priming stage.
  • Base Model Foundation: It is trained directly from the base Qwen3-8B model, with Qwen3's "thinking" mode disabled as an experimental control.
  • Research Focus: The primary purpose of this model is to investigate the effects of data overlap as a post-training hyperparameter within the context of autoformalization.
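For readers unfamiliar with GRPO, its defining step is that each completion's advantage is computed relative to a group of completions sampled for the same prompt, instead of from a learned value model. The sketch below shows only that normalization step. It assumes a hypothetical binary reward (1.0 if a generated formalization is accepted, for example because it compiles, 0.0 otherwise); the card does not state the reward function, group size, or other GRPO hyperparameters used for this checkpoint, and the clipped policy-ratio objective and KL penalty are omitted. The function name `group_relative_advantages` is illustrative, not taken from the authors' code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against its own group.

    `rewards` holds one scalar reward per completion sampled for the SAME prompt.
    Uses the population standard deviation; real implementations differ on this
    detail (some use the sample standard deviation instead).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for 4 sampled formalizations of one statement:
    # 1.0 = accepted (e.g. compiles), 0.0 = rejected. This reward is an assumption.
    group = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(group))  # roughly [1, -1, -1, 1]
```

A group in which every completion receives the same reward yields all-zero advantages, so it contributes no policy-gradient signal for that prompt.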

Use Cases

  • Autoformalization Research: Intended for researchers studying how training methodology and data characteristics affect autoformalization.
  • Experimental Baseline: Serves as a specific experimental artifact for replicating and extending the findings presented in the associated research paper.

For comprehensive details, including results and all related artifacts, refer to the paper repository.