SlowGuess/ABForge-Qwen3-8B-Task2-SFT

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: Jun 11, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

The SlowGuess/ABForge-Qwen3-8B-Task2-SFT model is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B to generate ablation experiment design plans. Given a paper's context and a specific goal, it produces a detailed, controlled ablation plan covering the objective, setup, variants, fixed protocols, and metrics. It is part of the ABForge post-training pipeline, which focuses on paper-grounded ablation design.


Model Overview

SlowGuess/ABForge-Qwen3-8B-Task2-SFT is an 8 billion parameter language model, supervised fine-tuned (SFT) from the Qwen/Qwen3-8B base model. It is a specialized component of the ABForge post-training pipeline, which focuses on paper-grounded ablation design.

Key Capabilities

  • Ablation Plan Generation: The model's primary function is to generate detailed and controlled ablation experiment design plans. This includes defining the objective, experimental setup, variants, fixed protocols, and metrics for evaluation.
  • Contextual Understanding: It processes a given research paper's context and a specific goal to formulate relevant ablation plans.
  • Specialized Fine-tuning: The model was fine-tuned on sft_task2_37019.jsonl from the SlowGuess/abforge-data dataset, which is derived from CC-licensed research papers, so its training examples are grounded in academic research writing.
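The input and output described above can be sketched in a few lines of Python. This is a minimal illustration, not the template used during training: the `AblationPlan` class, the `build_prompt` function, and the prompt wording are all hypothetical names and text introduced here. Only the two inputs (paper context, goal) and the five plan sections come from the model description.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AblationPlan:
    """Hypothetical container mirroring the five plan sections named above."""

    objective: str
    setup: str
    variants: list[str] = field(default_factory=list)
    fixed_protocols: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


def build_prompt(paper_context: str, goal: str) -> str:
    """Assemble an illustrative prompt (assumed wording, not the real template)."""
    return (
        "Paper context:\n"
        f"{paper_context.strip()}\n\n"
        "Ablation goal:\n"
        f"{goal.strip()}\n\n"
        "Write an ablation plan with these sections: "
        "objective, setup, variants, fixed protocols, metrics."
    )
```

A container like this can help check a generated plan for completeness: `missing_sections()` lists any of the five sections that were left empty after parsing the model's output.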

Evaluation and Related Models

The model is evaluated on the held-out AblationBench split (eval/ablationbench_200.jsonl) of the SlowGuess/abforge-data dataset. Users can reproduce the AblationBench evaluation with the SlowGuess/Abforge_1 code, which includes scripts for generating predictions and for scoring them against a Claude-judged rubric.
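For readers unfamiliar with rubric-based scoring over a JSONL file, the sketch below shows one plausible way to aggregate per-example judgments. It is not the SlowGuess/Abforge_1 scoring script: the record layout, the `judgments` field, and the scoring rule (fraction of rubric criteria marked satisfied, averaged over examples) are assumptions made for illustration.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def rubric_score(judgments):
    """Fraction of rubric criteria marked satisfied (assumed scoring rule)."""
    if not judgments:
        return 0.0
    return sum(1 for j in judgments if j) / len(judgments)


def mean_score(records, key="judgments"):
    """Average the per-example rubric scores; `key` is a hypothetical field name."""
    scores = [rubric_score(record[key]) for record in records]
    return sum(scores) / len(scores) if scores else 0.0


# Two made-up records standing in for judged predictions.
sample = (
    '{"id": "a", "judgments": [true, false, true, true]}\n'
    "\n"
    '{"id": "b", "judgments": [false, false]}\n'
)
records = list(read_jsonl(io.StringIO(sample)))
overall = mean_score(records)
```

Passing an open file handle instead of `io.StringIO` works the same way, since `read_jsonl` only iterates over lines.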

This model targets "Task 2: Ablation Plan Generation" within the ABForge framework. It is the SFT variant, distinct from the related models SlowGuess/ABForge-Qwen3-8B-Task2 and SlowGuess/ABForge-Qwen3-8B-Task2-RL.