Overview

This model, developed by the MOMY team for the EPFL CS-552 Modern NLP course, is a 1.7-billion-parameter language model built on Qwen/Qwen3-1.7B. It targets general-knowledge question answering across domains such as science, history, geography, and world affairs.

Key Capabilities

Reasoning-focused: The model first reasons step by step inside <think>...</think> tags, then gives a final answer.
Verifiable Answers: Final answers are enclosed in a \boxed{} environment, facilitating automated extraction and verification.
Question Answering: Capable of answering both multiple-choice (e.g., \boxed{B}) and short open-ended factual questions (e.g., \boxed{Paris}).
Training Method: Trained with GRPO (Group Relative Policy Optimization), a form of reinforcement learning with verifiable rewards (RLVR), using LoRA adapters. On the Course CI (knowledge) benchmark this raises pass@1 from 0.25 for the base model to 0.44.
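
The output format and the idea of a verifiable reward can be illustrated with a minimal sketch. The function names, the exact-match reward, and the standard-deviation normalization below are assumptions made for illustration; the model card does not specify the team's actual reward function or GRPO configuration.

```python
import re
from statistics import mean, pstdev

# Patterns for the two parts of the output format described above.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(completion):
    """Return the content of the last boxed answer outside the reasoning, or None."""
    # Drop the reasoning span so a boxed value inside <think> is not picked up.
    visible = THINK_RE.sub("", completion)
    matches = BOXED_RE.findall(visible)
    return matches[-1].strip() if matches else None


def reward(completion, gold):
    """Hypothetical verifiable reward: 1.0 on a case-insensitive exact match."""
    answer = extract_answer(completion)
    if answer is not None and answer.lower() == gold.lower():
        return 1.0
    return 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    sample = "<think>The capital of France is Paris.</think>\\boxed{Paris}"
    print(extract_answer(sample))
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the answer sits in a fixed \boxed{} slot, the same extraction step works for both multiple-choice letters and short factual answers, which is what makes an automatic reward possible.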

Training and Performance

The model was trained on a curated dataset including MMLU-Pro (graduate-level multiple-choice) and TriviaQA (open-domain factual QA), totaling 8,100 training examples. Evaluation shows a pass@1 score of 0.44 on the Course CI (knowledge) benchmark, compared with 0.25 for the Qwen3-1.7B base model.
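
To make the reported scores concrete, here is a small sketch of how pass@1 can be computed and what the gap between the two numbers amounts to. It assumes one sampled answer per question, graded as correct or incorrect; the function name is hypothetical and the benchmark's actual evaluation harness is not described in this card.

```python
def pass_at_1(is_correct):
    """Fraction of questions whose single sampled answer was graded correct."""
    if not is_correct:
        raise ValueError("need at least one graded question")
    return sum(is_correct) / len(is_correct)


# Scores reported above for the base model and the trained model.
base_score = 0.25
tuned_score = 0.44

absolute_gain = tuned_score - base_score    # about 0.19
relative_gain = absolute_gain / base_score  # about 0.76, i.e. roughly 76%

if __name__ == "__main__":
    print(pass_at_1([True, False, True, True]))
    print(round(absolute_gain, 2), round(relative_gain, 2))
```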

Limitations

Limited Factual Coverage: Due to its 1.7B parameter size, performance on highly specialized or long-tail knowledge is unreliable.
Option-Matching Miscalibration: A common failure mode in which the reasoning is correct, but when the computed answer is not among the listed options the model selects an option that does not match it.
Knowledge Bias: Knowledge is skewed towards English-language and Western-centric sources from its training data.
