SlowGuess/ABForge-Qwen3-8B-Task2-RL
The SlowGuess/ABForge-Qwen3-8B-Task2-RL is an 8 billion parameter Qwen3-based language model developed by SlowGuess, fine-tuned with GRPO directly from Qwen/Qwen3-8B. This model specializes in generating detailed ablation experiment design plans (objective, setup, variants, protocols, metrics) based on a paper's context and a specified goal. It is specifically optimized for Task 2 within the ABForge pipeline, which focuses on paper-grounded ablation design.
Loading preview...
ABForge-Qwen3-8B-Task2-RL Overview
This model, developed by SlowGuess, is an 8 billion parameter Qwen3-based language model specifically designed for Task 2: Ablation Plan Generation within the ABForge framework. Unlike its supervised counterparts, this checkpoint is trained using GRPO (Gradient-based Policy Optimization) directly from Qwen/Qwen3-8B, without a supervised warm-start, optimizing for a fixed rubric-based reward.
Key Capabilities
- Ablation Experiment Design: Generates comprehensive ablation experiment plans, including objectives, setup details, variant definitions, fixed protocols, and evaluation metrics.
- Contextual Understanding: Processes a paper's context and a specific goal to formulate relevant ablation designs.
- Reinforcement Learning Fine-tuning: Utilizes GRPO on the
train/RL_task2_30K.jsonldataset fromSlowGuess/abforge-data, which is derived from CC-licensed research papers.
Good For
- Researchers and developers needing automated assistance in designing ablation studies for scientific papers.
- Generating structured and detailed experiment plans for model components or methodologies.
- Integration into research pipelines requiring systematic ablation design based on textual input.
Evaluation of this model can be reproduced using the SlowGuess/Abforge_1 code, which includes scripts for generating predictions on the AblationBench dataset and scoring them against a Claude-judged rubric.