cs-552-2026-catma/general_knowledge_model
The General Knowledge Model, developed by Tuan Dang Nguyen for the CS-552 Modern NLP course project, is a merged post-trained checkpoint based on Qwen/Qwen3-1.7B. This 1.7 billion parameter model is specifically optimized for closed-book multiple-choice general knowledge evaluation. It is designed to answer questions by providing a single option letter within a LaTeX boxed expression, making it highly specialized for benchmark-style assessments.
Loading preview...
General Knowledge Model Overview
This model is a specialized 1.7 billion parameter language model, developed by Tuan Dang Nguyen for the CS-552 Modern NLP course project. It is a post-trained checkpoint based on Qwen/Qwen3-1.7B, specifically engineered for closed-book multiple-choice general knowledge evaluation.
Key Capabilities & Training
The model's primary function is to answer multiple-choice questions by outputting a single letter within a LaTeX boxed expression (e.g., \boxed{B}). Its training involved several stages:
- Supervised fine-tuning on diverse general-knowledge multiple-choice datasets.
- Refinements using MMLU-Pro and variable option-count examples.
- Integration of Quartz v1 SFT for initial strong performance.
- Stage 5 merge-aware DPO (Direct Preference Optimization) using the model's own incorrect answers for refinement, leading to improved local diagnostic results.
Evaluation & Performance
Evaluations showed consistent performance on both local diagnostic sets and a 10-example public validation snapshot. The final uploaded model achieved 249/290 on local diagnostics and maintained a 0.4900 hidden-CI score, tying the best SFT anchor. DPO improved local diagnostic results and robustness, though it did not surpass the best hidden-CI SFT score.
Usage Notes & Limitations
This model is a fully merged checkpoint, loadable with standard transformers tooling. It is highly specialized for its intended task and not designed as a general chat assistant or a reliable factual oracle outside of benchmark settings. Users should provide closed-book multiple-choice questions and expect a \boxed{LETTER} output format for optimal compatibility with its evaluation pipeline.