cs-552-2026-centralesupechec/general_knowledge_model
The cs-552-2026-centralesupechec/general_knowledge_model is a 1.7 billion parameter language model, post-trained from Qwen/Qwen3-1.7B by CentraleSupéchec for the EPFL CS-552 Modern NLP course. It specializes in closed-book multiple-choice QA and uses Rejection Fine-Tuning (RFT) with an answer-only loss to sharpen answer commitment and answer formatting while leaving the pretrained reasoning intact. It targets general knowledge benchmarks such as MMLU and GPQA, and reports improved pass@1 when the final answer is conditioned on the model's internal reasoning traces.
Overview
The general_knowledge_model is a 1.7 billion parameter language model developed by CentraleSupéchec for the EPFL CS-552 Modern NLP course. It is a post-trained version of the Qwen/Qwen3-1.7B base model, designed for closed-book multiple-choice Question Answering (QA) tasks. Its training keeps the base model's internal reasoning intact and concentrates the learning signal on committing to a precise, correctly formatted answer.
Key Capabilities and Training
- Rejection Fine-Tuning (RFT): The model was trained using a STaR-style self-distillation approach, leveraging self-generated correct reasoning traces from questions the base model initially failed.
- Answer-Only Loss: A LoRA adapter was fine-tuned with cross-entropy loss masked exclusively to the `\boxed{}` answer span. This technique allows the model's internal `<think>` reasoning block to condition the forward pass without receiving direct gradients, preserving its pretrained reasoning while sharpening answer commitment and output formatting.
- Strict Output Format: The model enforces a `\boxed{LETTER}` output format and utilizes a 16,384-token reasoning budget, which is crucial for preventing format failures due to truncated reasoning.
- Evaluation: Achieves approximately 0.74 `pass@1` on a 650-question MMLU sweep and 0.59 on an internal 100-question expert set.
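The answer-only masking described above can be illustrated with a small sketch. It is not the authors' training code: the helper names (`answer_only_labels`, `toy_tokenize`), the whitespace tokenizer, and the choice to keep the whole `\boxed{...}` span (rather than only the letter) are assumptions made for illustration. The only borrowed convention is the `-100` label value, which PyTorch's cross-entropy loss ignores by default.

```python
import re

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default

# Matches a \boxed{...} span with no nested braces (assumed answer format).
BOXED_RE = re.compile(r"\\boxed\{[^{}]*\}")


def answer_only_labels(text, input_ids, offsets):
    """Keep labels only for tokens overlapping the last \\boxed{...} span.

    `offsets` holds one (start, end) character span per token, as a fast
    tokenizer could provide. Every other position gets IGNORE_INDEX, so the
    reasoning tokens still condition the prediction but add no loss term.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    matches = list(BOXED_RE.finditer(text))
    if not matches:
        return labels  # no boxed answer: the example contributes no loss
    start, end = matches[-1].span()
    for i, (tok_start, tok_end) in enumerate(offsets):
        if tok_start < end and tok_end > start:  # token overlaps the span
            labels[i] = input_ids[i]
    return labels


def toy_tokenize(text):
    """Hypothetical whitespace tokenizer standing in for the real one."""
    offsets = [m.span() for m in re.finditer(r"\S+", text)]
    input_ids = list(range(1, len(offsets) + 1))  # placeholder token ids
    return input_ids, offsets


text = "<think> Paris is the capital. </think> \\boxed{B}"
input_ids, offsets = toy_tokenize(text)
labels = answer_only_labels(text, input_ids, offsets)
print(labels)
```

In this toy example only the final token, which carries the boxed answer, keeps its label. With a real subword tokenizer the boxed span would typically cover several tokens.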
When to Use This Model
This model is particularly well-suited for:
- Closed-book multiple-choice QA: Its specialized training makes it effective for tasks requiring precise answers from internal knowledge.
- Applications requiring structured output: The enforced `\boxed{LETTER}` format ensures consistent and parseable responses.
- Research in reasoning and fine-tuning techniques: Demonstrates an effective application of Rejection Fine-Tuning for enhancing QA performance.
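Because responses end in `\boxed{LETTER}`, downstream code can extract the choice with a short parser. The sketch below is an assumption about how such parsing and single-sample scoring might look, not the project's evaluation harness. The function names, the A to Z letter range, and the rule that an unclosed `<think>` block counts as a format failure are all hypothetical.

```python
import re

# A single capital letter inside \boxed{...}; surrounding whitespace is tolerated.
BOXED_LETTER_RE = re.compile(r"\\boxed\{\s*([A-Z])\s*\}")


def extract_choice(completion):
    """Return the last boxed letter after the reasoning block, or None."""
    # Reasoning that was cut off before </think> is treated as a format failure.
    if "<think>" in completion and "</think>" not in completion:
        return None
    answer_part = completion.rsplit("</think>", 1)[-1]
    matches = BOXED_LETTER_RE.findall(answer_part)
    return matches[-1] if matches else None


def pass_at_1(completions, gold):
    """Fraction of questions whose single sampled completion is correct."""
    correct = sum(extract_choice(c) == g for c, g in zip(completions, gold))
    return correct / len(gold)


completions = [
    "<think>Eliminate A and B.</think>\n\\boxed{C}",  # well formed
    "<think>Maybe \\boxed{A}, but let me check",      # truncated reasoning
    "The answer is B.",                               # missing boxed format
]
gold = ["C", "A", "B"]
print([extract_choice(c) for c in completions])
print(pass_at_1(completions, gold))
```

Counting unparseable completions as wrong shows why a truncated reasoning block lowers the score even when the model's reasoning was heading toward the right letter.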