Name: xiaolesu/OsmosisProofling-GRPO-NT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: xiaolesu

OsmosisProofling-GRPO-NT: An Experimental Autoformalization Model

This model, xiaolesu/OsmosisProofling-GRPO-NT, represents an experimental checkpoint developed as part of the research detailed in the paper "SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization" by Xiaole Su, Kasey Zhang, and Andy Lyu. It is a specialized variant of the Qwen3-8B architecture.

Key Characteristics

GRPO-only Training: This model was trained using only Gradient Regularized Policy Optimization (GRPO), bypassing initial Supervised Fine-Tuning (SFT) priming.
Base Model Foundation: It is built directly upon a base Qwen3-8B model, with its 'thinking' capabilities disabled for specific experimental control.
Research Focus: The primary purpose of this model is to investigate the effects of data overlap as a post-training hyperparameter within the context of autoformalization.

Use Cases

Autoformalization Research: Ideal for researchers studying the impact of training methodologies and data characteristics on the autoformalization process.
Experimental Baseline: Serves as a specific experimental artifact for replicating and extending the findings presented in the associated research paper.

For comprehensive details, including results and all related artifacts, refer to the paper repository.

Overview

OsmosisProofling-GRPO-NT: An Experimental Autoformalization Model

Key Characteristics

Use Cases

Full Model Card (README)