Overview

The general_knowledge_model is a 1.7-billion-parameter language model developed by CentraleSupéchec for the EPFL CS-552 (Modern NLP) course. It is a post-trained version of the Qwen/Qwen3-1.7B base model, designed for closed-book multiple-choice question answering (QA). Its training aims to preserve the model's internal reasoning while making it commit to a precisely formatted answer.

Key Capabilities and Training

Rejection Fine-Tuning (RFT): The model was trained with a STaR-style self-distillation approach, using correct reasoning traces that it generated itself for questions the base model initially failed.
Answer-Only Loss: A LoRA adapter was fine-tuned with cross-entropy loss restricted to the \boxed{} answer span. The model's internal <think> reasoning block still conditions the answer in the forward pass but receives no direct gradient, which preserves the pretrained reasoning while sharpening answer commitment and output formatting.
Strict Output Format: The model answers in a \boxed{LETTER} format and uses a 16,384-token reasoning budget, which is crucial for preventing format failures caused by truncated reasoning.
Evaluation: The model achieves approximately 0.74 pass@1 on a 650-question MMLU sweep and 0.59 on an internal 100-question expert set.
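
The answer-only loss described above can be illustrated with a small label-masking sketch. This is not the authors' training code: the function name answer_only_labels, the toy whitespace tokenizer, and the token ids are all hypothetical, and the only outside fact relied on is that -100 is the default label value ignored by PyTorch's cross-entropy loss. The idea shown is that every token outside the \boxed{} span gets the ignore label, so only the answer span would contribute to the loss.

```python
import re

# -100 is the default ignore_index of PyTorch's cross-entropy loss;
# tokens carrying this label contribute nothing to the loss.
IGNORE_INDEX = -100

# Matches a literal \boxed{...} span (assumed answer format).
BOXED = re.compile(r"\\boxed\{[^{}]*\}")


def answer_only_labels(text, token_ids, offsets):
    """Keep labels only for tokens overlapping the last boxed answer span.

    offsets[i] is the (start, end) character range of token i in text.
    Hypothetical helper for illustration only.
    """
    matches = list(BOXED.finditer(text))
    if not matches:
        # No answer span found: nothing is trained on.
        return [IGNORE_INDEX] * len(token_ids)
    start, end = matches[-1].span()
    labels = []
    for tok, (s, e) in zip(token_ids, offsets):
        overlaps = s < end and e > start
        labels.append(tok if overlaps else IGNORE_INDEX)
    return labels


# Toy example: a whitespace "tokenizer" with made-up ids.
text = "<think>two plus two</think> \\boxed{B}"
pieces = list(re.finditer(r"\S+", text))
offsets = [m.span() for m in pieces]
token_ids = [10 + i for i in range(len(pieces))]

labels = answer_only_labels(text, token_ids, offsets)
```

In this toy case the three reasoning tokens are masked and only the final token, which covers \boxed{B}, keeps its label. A real setup would take the character offsets from the tokenizer rather than from a whitespace split.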

When to Use This Model

This model is particularly well-suited for:

Closed-book multiple-choice QA: Its specialized training makes it effective for tasks requiring precise answers from internal knowledge.
Applications requiring structured output: The enforced \boxed{LETTER} format ensures consistent and parseable responses.
Research in reasoning and fine-tuning techniques: The model demonstrates an application of Rejection Fine-Tuning to improve QA performance.
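
As a rough illustration of why the \boxed{LETTER} format is easy to consume downstream, the following sketch pulls the answer letter out of a completion. The function name extract_answer and the sample completions are hypothetical, and the regular expression assumes a single uppercase letter inside the braces; it is not taken from the model's own evaluation code.

```python
import re
from typing import Optional

# Assumed answer format: \boxed{X} with a single uppercase letter.
BOXED_LETTER = re.compile(r"\\boxed\{\s*([A-Z])\s*\}")


def extract_answer(completion: str) -> Optional[str]:
    """Return the letter of the last boxed answer, or None if absent.

    Taking the last match skips any boxed letters that might appear
    earlier inside the reasoning block.
    """
    matches = BOXED_LETTER.findall(completion)
    return matches[-1] if matches else None


sample = "<think>Option B looks wrong, C fits.</think>\n\\boxed{C}"
answer = extract_answer(sample)
truncated = extract_answer("<think>still reasoning when the budget ran out")
```

A completion with no boxed letter, such as one cut off mid-reasoning, yields None here, so format failures can be counted separately from wrong answers.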
