SlowGuess/ABForge-Qwen3-8B-Task1-RL

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 11, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

SlowGuess/ABForge-Qwen3-8B-Task1-RL is an 8-billion-parameter Qwen3-based model, fine-tuned with GRPO directly from Qwen/Qwen3-8B. It is designed for Task 1 of the ABForge pipeline: generating ablation objectives for research papers. Given the ablation-free context of a research paper, the model proposes candidate ablation objectives, each pairing a Target Module with a Research Question.


ABForge-Qwen3-8B-Task1-RL: Ablation Objective Generation

This model, developed by SlowGuess, is an 8-billion-parameter Qwen3-based language model fine-tuned for Task 1: Ablation Objective Generation within the ABForge post-training pipeline. Unlike other related models, this checkpoint was trained with GRPO (Group Relative Policy Optimization) directly from Qwen/Qwen3-8B, without an initial supervised warm-start, optimizing a fixed rubric-based reward.
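To illustrate the group-relative idea behind GRPO, the sketch below normalizes the rewards of several completions sampled for the same prompt into advantages. It is a minimal illustration, not the ABForge training code: the function name, the `eps` constant, the use of the population standard deviation, and the example rubric scores are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so completions
    are scored relative to their siblings rather than by a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric scores for four completions sampled from one prompt.
rubric_scores = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rubric_scores)
```

Completions that score above the group mean receive positive advantages and those below receive negative ones; a group with identical rewards yields all-zero advantages and therefore no learning signal.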

Key Capabilities

  • Ablation Objective Proposal: Given the ablation-free context of a research paper, the model generates candidate ablation objectives.
  • Structured Output: Each proposed objective consists of a Target Module (the component to ablate) and a corresponding Research Question it aims to answer.
  • Specialized Training: Trained on the train/RL_task1_30K.jsonl dataset from SlowGuess/abforge-data, which is derived from CC-licensed research papers.
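The pairing of a Target Module with a Research Question can be represented as a small record type. The sketch below is illustrative only: this card does not specify the model's exact output layout, so the `Target Module: ... | Research Question: ...` line format, the `AblationObjective` class, the `parse_objectives` helper, and the sample text are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class AblationObjective:
    target_module: str      # the component to ablate
    research_question: str  # the question the ablation aims to answer


# Assumed line format; adjust the pattern to the model's actual output.
_PATTERN = re.compile(
    r"Target Module:\s*(?P<module>.+?)\s*\|\s*Research Question:\s*(?P<question>.+)"
)


def parse_objectives(text):
    """Collect every line that matches the assumed format; skip the rest."""
    objectives = []
    for line in text.splitlines():
        match = _PATTERN.search(line)
        if match:
            objectives.append(
                AblationObjective(match["module"].strip(), match["question"].strip())
            )
    return objectives


# Hypothetical generated text, including one line that does not match.
sample = (
    "Target Module: retrieval reranker | Research Question: Does reranking improve answer accuracy?\n"
    "Some unrelated line\n"
    "Target Module: auxiliary contrastive loss | Research Question: How much does the loss contribute?"
)
objectives = parse_objectives(sample)
```

Keeping the two fields separate makes it straightforward to deduplicate objectives by target module or to pass them to a downstream scoring step.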

Use Cases

  • Research Paper Analysis: Assisting researchers in identifying potential ablation studies for their work.
  • Automated Experiment Design: Generating structured ablation objectives to guide experimental setups.
  • Academic Tooling: Serving as a component in larger systems for scientific discovery and analysis.

The evaluation of this model can be reproduced with the SlowGuess/Abforge_1 code, which scores predictions against the held-out AblationBench split using a Claude judge.