cs-552-2026-centralesupechec/general_knowledge_model

TEXT GENERATION
Concurrency Cost: 1
Model Size: 2B
Quant: BF16
Ctx Length: 32k
Tool Calling: Supported
Published: May 6, 2026
License: apache-2.0
Architecture: Transformer
Open Weights
Cold

The cs-552-2026-centralesupechec/general_knowledge_model is a 1.7 billion parameter language model, post-trained from Qwen/Qwen3-1.7B by CentraleSupéchec for the EPFL CS-552 (Modern NLP) course. It specializes in closed-book multiple-choice QA and was trained with Rejection Fine-Tuning (RFT) using an answer-only loss, which sharpens answer commitment and output formatting while leaving the pretrained reasoning intact. It targets general knowledge benchmarks such as MMLU and GPQA, and is scored by pass@1 on the final answer, which the model produces after conditioning on its own internal reasoning trace.


Overview

The general_knowledge_model is a 1.7 billion parameter language model developed by CentraleSupéchec for the EPFL CS-552 (Modern NLP) course. It is a post-trained version of the Qwen/Qwen3-1.7B base model, designed for closed-book multiple-choice Question Answering (QA). Training does not supervise the model's internal reasoning directly; it focuses on making the model commit to a precise, consistently formatted answer after reasoning.

Key Capabilities and Training

  • Rejection Fine-Tuning (RFT): The model was trained with a STaR-style self-distillation approach. The training data consists of self-generated reasoning traces that reach the correct answer, collected on questions the base model initially got wrong.
  • Answer-Only Loss: A LoRA adapter was fine-tuned with cross-entropy loss masked exclusively to the \boxed{} answer span. This technique allows the model's internal <think> reasoning block to condition the forward pass without receiving direct gradients, preserving its pretrained reasoning while sharpening answer commitment and output formatting.
  • Strict Output Format: The model is trained to end its response with \boxed{LETTER} and uses a 16,384-token reasoning budget. The large budget matters because reasoning that is truncated before the answer is emitted counts as a format failure.
  • Evaluation: Achieves approximately 0.74 pass@1 on a 650-question MMLU sweep and 0.59 on an internal 100-question expert set.
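
The answer-only loss described above can be illustrated with a label mask. The following is a minimal sketch, not the authors' training code: it assumes each token comes with a character offset range (as fast tokenizers can provide), and it uses -100 as the ignored label because that is the default ignore_index of PyTorch's CrossEntropyLoss. The names answer_only_labels and toy_tokenize are hypothetical, and the whitespace tokenizer only stands in for a real one.

```python
import re

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100
BOXED = re.compile(r"\\boxed\{[^{}]*\}")


def answer_only_labels(text, input_ids, offsets):
    # offsets[i] is the (start, end) character range of token i in `text`.
    matches = list(BOXED.finditer(text))
    if not matches:
        # No boxed answer: nothing contributes to the loss.
        return [IGNORE_INDEX] * len(input_ids)
    start, end = matches[-1].span()  # supervise only the final boxed span
    labels = []
    for token_id, (tok_start, tok_end) in zip(input_ids, offsets):
        overlaps = tok_start < end and tok_end > start
        labels.append(token_id if overlaps else IGNORE_INDEX)
    return labels


def toy_tokenize(text):
    # Hypothetical stand-in for a real tokenizer: one token per
    # whitespace-separated chunk, with made-up ids starting at 100.
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    ids = list(range(100, 100 + len(spans)))
    return ids, spans


example = "<think> B fits best </think> \\boxed{B}"
ids, spans = toy_tokenize(example)
labels = answer_only_labels(example, ids, spans)
print(labels)  # [-100, -100, -100, -100, -100, 105]
```

The reasoning tokens stay in the input, so they still condition the prediction of the answer, but only the boxed span contributes to the cross-entropy.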

When to Use This Model

This model is particularly well-suited for:

  • Closed-book multiple-choice QA: Its specialized training makes it effective for tasks requiring precise answers from internal knowledge.
  • Applications requiring structured output: The enforced \boxed{LETTER} format ensures consistent and parseable responses.
  • Research in reasoning and fine-tuning techniques: Demonstrates an effective application of Rejection Fine-Tuning for enhancing QA performance.
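
Because the answer arrives in a fixed \boxed{LETTER} form, downstream code can extract it with a small parser and score it against gold labels. The sketch below is an assumption about how one might do this, not the evaluation harness behind the reported numbers: extract_choice and pass_at_1 are hypothetical names, the parser takes the last boxed letter in the completion, and a missing or malformed answer is counted as wrong. With one sample per question, pass@1 reduces to plain accuracy.

```python
import re

# Matches \boxed{X} where X is a single letter, allowing inner whitespace.
BOXED_LETTER = re.compile(r"\\boxed\{\s*([A-Za-z])\s*\}")


def extract_choice(completion):
    # Use the last boxed letter so that any boxed text inside the
    # reasoning block does not shadow the final answer.
    found = BOXED_LETTER.findall(completion)
    return found[-1].upper() if found else None


def pass_at_1(completions, gold):
    # One completion per question; a missing answer (None) never matches.
    hits = sum(extract_choice(c) == g for c, g in zip(completions, gold))
    return hits / len(gold)


completions = [
    "<think>Option B matches the definition.</think> \\boxed{B}",
    "<think>Probably C.</think> \\boxed{C}",
    "<think>The reasoning was cut off before an answer",
]
gold = ["B", "A", "D"]
score = pass_at_1(completions, gold)
```

The third completion shows the truncation case: with no boxed answer the parser returns None, so the question is scored as a miss rather than raising an error.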