xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-No-Overlap

Task: Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 8, 2026 | Architecture: Transformer | Status: Warm

The xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-No-Overlap model is an experimental checkpoint based on Qwen3-8B, developed by Xiaole Su. This variant targets autoformalization and was trained with Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO), with 0% data overlap between the SFT and GRPO datasets. It is designed for tasks that require formalizing natural language, and it represents the best-performing condition in its associated research.


Overview

This model, xiaolesu/OsmosisProofling-SFT-NT-GRPO-NT-No-Overlap, is an experimental checkpoint derived from Qwen3-8B. It was developed by Xiaole Su as part of research into "Data Overlap as a Post-Training Hyperparameter for Autoformalization." The model represents the best-performing condition from these experiments: the variant in which the Supervised Fine-Tuning (SFT) data and the Group Relative Policy Optimization (GRPO) data are fully disjoint (0% overlap).
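To make the idea of data overlap as a hyperparameter concrete, the following is a minimal sketch of how two training sets with a chosen overlap could be drawn from one pool of example IDs. It is not the procedure used in the research. The function names `split_with_overlap` and `overlap_fraction` are hypothetical, and defining overlap as the fraction of GRPO examples that also appear in the SFT set is an assumption; the accompanying paper may define it differently.

```python
import random


def split_with_overlap(ids, sft_size, grpo_size, overlap, seed=0):
    """Draw SFT and GRPO example IDs from one pool of unique IDs.

    `overlap` is the fraction of GRPO examples that are also in the SFT set
    (an assumed definition, used here only for illustration).
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")

    n_shared = round(overlap * grpo_size)  # GRPO examples reused from SFT
    n_fresh = grpo_size - n_shared         # GRPO examples unseen during SFT
    if n_shared > sft_size:
        raise ValueError("not enough SFT examples to share")

    pool = list(ids)
    if sft_size + n_fresh > len(pool):
        raise ValueError("pool is too small for the requested sizes")

    rng = random.Random(seed)
    rng.shuffle(pool)

    sft = pool[:sft_size]
    fresh = pool[sft_size:sft_size + n_fresh]
    shared = rng.sample(sft, n_shared)

    grpo = shared + fresh
    rng.shuffle(grpo)
    return sft, grpo


def overlap_fraction(sft, grpo):
    """Fraction of GRPO examples that also appear in the SFT set."""
    sft_set = set(sft)
    return sum(1 for x in grpo if x in sft_set) / len(grpo)


if __name__ == "__main__":
    # The "No-Overlap" condition corresponds to overlap=0.0.
    sft_ids, grpo_ids = split_with_overlap(range(1000), 400, 200, overlap=0.0)
    print(overlap_fraction(sft_ids, grpo_ids))
```

With `overlap=0.0` the two sets are disjoint, which mirrors the condition this checkpoint represents; other values would correspond to the partially or fully overlapping conditions such a study could compare against.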

Key Capabilities

  • Autoformalization: Designed to convert natural language into formal specifications or proofs.
  • Experimental Setup: Represents a specific configuration (SFT+GRPO with no data overlap) identified as optimal in the associated research.
  • Qwen3-8B Base: Built upon the Qwen3-8B model and post-trained with SFT and GRPO for autoformalization tasks.
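As an illustration of what autoformalization means in practice, the sketch below pairs a natural-language statement with one possible formal counterpart. Lean 4 with Mathlib is assumed purely for illustration; this card does not name the target formal language, and the snippet is not an output of the model.

```lean
import Mathlib

-- Natural language: "The sum of two even integers is even."
-- A possible formalization of the statement. The proof is left as `sorry`
-- because autoformalization concerns translating the statement itself.
theorem sum_of_evens_is_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  sorry
```

The theorem name `sum_of_evens_is_even` is hypothetical. A faithful formalization has to preserve the quantifiers, types, and hypotheses of the informal sentence, which is what makes the task harder than it first appears.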

Good For

  • Research in Autoformalization: Ideal for researchers exploring the impact of data overlap in post-training for formalization tasks.
  • Benchmarking: Can be used as a baseline or comparison point for other autoformalization models, particularly those using SFT and GRPO techniques.
  • Understanding Data Overlap: Provides a concrete example of a model trained under specific data overlap conditions, as detailed in the accompanying paper.