cs-552-2026-momy/general_knowledge_model

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: May 11, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

cs-552-2026-momy/general_knowledge_model is a 1.7 billion parameter language model based on Qwen3, developed by the MOMY team for the EPFL CS-552 course. It is post-trained with GRPO using verifiable rewards (RLVR) for general-knowledge question answering. The model reasons step by step within <think>...</think> tags and gives its final answer in \boxed{} format, which makes it suitable for automated answer extraction and verification in science, history, and geography domains.

Overview

This model, developed by the MOMY team for the EPFL CS-552 Modern NLP course, is a 1.7 billion parameter language model based on Qwen/Qwen3-1.7B. It is specifically designed for general-knowledge question answering across domains like science, history, geography, and world affairs.

Key Capabilities

  • Reasoning-focused: Reasons step by step within <think>...</think> tags before giving a final answer.
  • Verifiable Answers: Final answers are enclosed in a \boxed{} environment, facilitating automated extraction and verification.
  • Question Answering: Capable of answering both multiple-choice (e.g., \boxed{B}) and short open-ended factual questions (e.g., \boxed{Paris}).
  • Training Method: Trained with GRPO (Group Relative Policy Optimization) using verifiable rewards (RLVR) and LoRA adapters; pass@1 on the Course CI (knowledge) benchmark rises from 0.25 for the base model to 0.44.
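
The output format described above lends itself to simple post-processing. The following is a minimal sketch of answer extraction, not the team's own tooling: the function name `extract_boxed` is hypothetical, and it assumes that the reasoning block should be discarded and that the last \boxed{} in the remaining text is the final answer.

```python
import re
from typing import Optional

# Matches a complete reasoning block; DOTALL lets it span several lines.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
BOX_OPEN = "\\boxed{"


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} outside <think> tags, or None."""
    # Assumption: anything boxed inside the reasoning is not the final answer.
    visible = THINK_RE.sub("", text)
    start = visible.rfind(BOX_OPEN)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward and track brace depth so nested braces are kept intact.
    for ch in visible[start + len(BOX_OPEN):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces, e.g. a truncated generation


if __name__ == "__main__":
    sample = "<think>The capital of France is Paris.</think>\n\\boxed{Paris}"
    print(extract_boxed(sample))
```

If a generation is cut off before the closing </think> tag, the regular expression does not match and any boxed text inside the unfinished reasoning would be returned, so callers may want to treat truncated outputs separately.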

Training and Performance

The model was trained on a curated dataset of 8,100 examples drawn from sources including MMLU-Pro (graduate-level multiple-choice) and TriviaQA (open-domain factual QA). On the Course CI (knowledge) benchmark it reaches a pass@1 score of 0.44, compared with 0.25 for the Qwen3-1.7B base model.
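
To make the terms above concrete, here is a rough sketch of a verifiable reward, the group-relative advantage that GRPO derives from it, and a pass@1 score. The model card does not describe the actual reward function or the Course CI scoring, so everything here is an assumption for illustration: the helper names, the exact-match rule after lowercasing, and the use of the population standard deviation (implementations differ on this detail).

```python
import re
import statistics
from typing import List, Optional


def boxed_answer(completion: str) -> Optional[str]:
    """Content of the last \\boxed{...} with no nested braces, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (assumed matching rule)."""
    return " ".join(answer.lower().split())


def reward(completion: str, gold: str) -> float:
    """Verifiable reward: 1.0 if the boxed answer matches the reference, else 0.0."""
    pred = boxed_answer(completion)
    return 1.0 if pred is not None and normalize(pred) == normalize(gold) else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def pass_at_1(completions: List[str], golds: List[str]) -> float:
    """Fraction of questions answered correctly with one sample each."""
    scores = [reward(c, g) for c, g in zip(completions, golds)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    outputs = ["<think>...</think> \\boxed{Paris}", "\\boxed{B}", "no box", "\\boxed{c}"]
    references = ["paris", "C", "A", "C"]
    print(pass_at_1(outputs, references))
    print(group_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every sample in a group receives the same reward, all advantages are zero, so that prompt contributes no learning signal for the step.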

Limitations

  • Limited Factual Coverage: Due to its 1.7B parameter size, performance on highly specialized or long-tail knowledge is unreliable.
  • Option-Matching Miscalibration: On multiple-choice questions, the model's reasoning can be correct yet end in a mismatched option when the answer it computed is not explicitly listed among the choices.
  • Knowledge Bias: Knowledge is skewed towards English-language and Western-centric sources from its training data.