SlowGuess/ABForge-Qwen3-8B-Task1-RL
SlowGuess/ABForge-Qwen3-8B-Task1-RL is an 8-billion-parameter Qwen3-based model fine-tuned with GRPO directly from Qwen/Qwen3-8B. It is designed for Task 1 of the ABForge pipeline: generating ablation objectives for research papers. Given the ablation-free context of a research paper, the model proposes candidate ablation objectives, each pairing a Target Module with a Research Question.
ABForge-Qwen3-8B-Task1-RL: Ablation Objective Generation
This model, developed by SlowGuess, is an 8-billion-parameter Qwen3-based language model fine-tuned for Task 1: Ablation Objective Generation within the ABForge post-training pipeline. Unlike related models that begin with a supervised warm-start, this checkpoint was trained with GRPO (Group Relative Policy Optimization) directly from Qwen/Qwen3-8B, optimizing a fixed rubric-based reward.
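GRPO (Group Relative Policy Optimization) estimates each completion's advantage from the rewards of a group of completions sampled for the same prompt, so no learned value model is needed. The Python sketch below shows only that standard normalization step; it is not the ABForge training code, and the reward values are invented placeholders standing in for rubric scores. Implementations differ on details such as population versus sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). This sketch
    uses the population standard deviation; other implementations may not.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Placeholder rubric scores for four sampled completions of one paper context.
example_rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(example_rewards)
print([round(a, 3) for a in advantages])
```

Completions scored above the group mean receive positive advantages and are reinforced. A group in which every completion gets the same rubric score produces all-zero advantages and therefore contributes no learning signal.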
Key Capabilities
- Ablation Objective Proposal: Given the ablation-free context of a research paper, the model generates candidate ablation objectives.
- Structured Output: Each proposed objective consists of a Target Module (the component to ablate) and a corresponding Research Question it aims to answer.
- Specialized Training: Trained on train/RL_task1_30K.jsonl from the SlowGuess/abforge-data dataset, which is derived from CC-licensed research papers.
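Downstream tools usually need the Target Module and Research Question pairs as structured records. The sketch below assumes, hypothetically, that the model emits each objective as a "Target Module:" line followed by a "Research Question:" line; the real output format is not documented here, so the parser should be adapted to what the model actually produces. The AblationObjective class, the parse_objectives function, and the sample text are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class AblationObjective:
    target_module: str      # the component to ablate
    research_question: str  # the question the ablation is meant to answer


def parse_objectives(text: str) -> list[AblationObjective]:
    """Pair each 'Target Module:' line with the next 'Research Question:' line.

    Assumes a hypothetical line-based layout. Leading bullet characters are
    ignored, and a question with no preceding module is skipped.
    """
    objectives: list[AblationObjective] = []
    module = None
    for raw in text.splitlines():
        line = raw.strip().lstrip("-*").strip()
        if line.lower().startswith("target module:"):
            module = line.split(":", 1)[1].strip()
        elif line.lower().startswith("research question:") and module is not None:
            question = line.split(":", 1)[1].strip()
            objectives.append(AblationObjective(module, question))
            module = None
    return objectives


# Made-up generation used only to exercise the parser.
sample = """Target Module: cross-attention fusion layer
Research Question: Does fusing modalities via cross-attention outperform simple concatenation?

Target Module: curriculum sampling schedule
Research Question: How much of the final accuracy depends on the curriculum order?"""

for obj in parse_objectives(sample):
    print(obj.target_module, "->", obj.research_question)
```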
Use Cases
- Research Paper Analysis: Assisting researchers in identifying potential ablation studies for their work.
- Automated Experiment Design: Generating structured ablation objectives to guide experimental setups.
- Academic Tooling: Serving as a component in larger systems for scientific discovery and analysis.
Evaluation can be reproduced with the SlowGuess/Abforge_1 code, which uses a Claude judge to score the model's predictions against the held-out AblationBench split.
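The actual metric and judge prompt are defined in the SlowGuess/Abforge_1 code and are not described here. Purely to illustrate how pairwise judge verdicts might be aggregated, the hypothetical sketch below computes precision- and recall-style rates, with a case-insensitive string comparison standing in for the Claude judge. The function names and the metric choice are assumptions, not the repository's method.

```python
from typing import Callable

# Hypothetical judge signature: True if a predicted objective matches a gold one.
# In the real pipeline this decision comes from a Claude judge; here it is a stub.
Judge = Callable[[str, str], bool]


def exact_judge(pred: str, gold: str) -> bool:
    """Stand-in for the LLM judge: case-insensitive exact string match."""
    return pred.strip().lower() == gold.strip().lower()


def match_rates(predictions: list[str], gold: list[str], judge: Judge) -> dict[str, float]:
    """Fraction of predictions matching some gold item, and vice versa."""
    matched_preds = sum(any(judge(p, g) for g in gold) for p in predictions)
    matched_gold = sum(any(judge(p, g) for p in predictions) for g in gold)
    return {
        "precision": matched_preds / len(predictions) if predictions else 0.0,
        "recall": matched_gold / len(gold) if gold else 0.0,
    }


# Illustrative inputs only, not AblationBench data.
preds = ["Cross-attention fusion layer", "dropout"]
gold_items = ["cross-attention fusion layer", "curriculum sampling schedule", "positional encoding"]
print(match_rates(preds, gold_items, exact_judge))
```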