SlowGuess/ABForge-Qwen3-8B-Task2-SFT
The SlowGuess/ABForge-Qwen3-8B-Task2-SFT model is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B, specifically designed for generating ablation experiment design plans. This model excels at producing detailed, controlled ablation plans (objective, setup, variants, fixed protocols, and metrics) based on a given paper's context and a specific goal. It is part of the ABForge post-training pipeline, focusing on paper-grounded ablation design.
Loading preview...
Model Overview
SlowGuess/ABForge-Qwen3-8B-Task2-SFT is an 8 billion parameter language model, supervised fine-tuned (SFT) from the Qwen/Qwen3-8B base model. It is a specialized component of the ABForge post-training pipeline, which focuses on paper-grounded ablation design.
Key Capabilities
- Ablation Plan Generation: The model's primary function is to generate detailed and controlled ablation experiment design plans. This includes defining the objective, experimental setup, variants, fixed protocols, and metrics for evaluation.
- Contextual Understanding: It processes a given research paper's context and a specific goal to formulate relevant ablation plans.
- Specialized Fine-tuning: The model was fine-tuned on
sft_task2_37019.jsonlfrom theSlowGuess/abforge-datadataset, which is derived from CC-licensed research papers, ensuring its relevance to academic and research contexts.
Evaluation and Related Models
Evaluation is performed using the held-out AblationBench split (eval/ablationbench_200.jsonl) of the same dataset. Users can reproduce the AblationBench evaluation using the provided SlowGuess/Abforge_1 code, which includes scripts for generating predictions and scoring them against a Claude-judged rubric.
This model is specifically for "Task 2: Ablation Plan Generation" within the ABForge framework, distinguishing it from other related models like SlowGuess/ABForge-Qwen3-8B-Task2 and SlowGuess/ABForge-Qwen3-8B-Task2-RL.