xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-No-Overlap
The xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-No-Overlap model is an experimental checkpoint based on Qwen3-8B, developed by Xiaole Su. This variant focuses on autoformalization, specifically utilizing Supervised Fine-Tuning (SFT) and Gradient-based Reward Policy Optimization (GRPO) with a 0% data overlap between the SFT and GRPO datasets. It is designed for tasks requiring formalization of natural language, representing the best-performing condition from its associated research.
Loading preview...
Overview
This model, xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-No-Overlap, is an experimental checkpoint derived from the Qwen3-8B architecture. It was developed by Xiaole Su as part of research into "Data Overlap as a Post-Training Hyperparameter for Autoformalization." The model represents the best-performing condition from these experiments, specifically the variant where Supervised Fine-Tuning (SFT) and Gradient-based Reward Policy Optimization (GRPO) data are fully disjoint (0% overlap).
Key Capabilities
- Autoformalization: Designed to convert natural language into formal specifications or proofs.
- Experimental Setup: Represents a specific configuration (SFT+GRPO with no data overlap) identified as optimal in the associated research.
- Qwen3-8B Base: Built upon the Qwen3-8B model, with specific modifications for autoformalization tasks.
Good For
- Research in Autoformalization: Ideal for researchers exploring the impact of data overlap in post-training for formalization tasks.
- Benchmarking: Can be used as a baseline or comparison point for other autoformalization models, particularly those using SFT and GRPO techniques.
- Understanding Data Overlap: Provides a concrete example of a model trained under specific data overlap conditions, as detailed in the accompanying paper.