cs-552-2026-catma/general_knowledge_model

TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 11, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The General Knowledge Model, developed by Tuan Dang Nguyen for the CS-552 Modern NLP course project, is a merged post-trained checkpoint based on Qwen/Qwen3-1.7B. This 1.7 billion parameter model is specifically optimized for closed-book multiple-choice general knowledge evaluation. It is designed to answer questions by providing a single option letter within a LaTeX boxed expression, making it highly specialized for benchmark-style assessments.

Loading preview...

General Knowledge Model Overview

This model is a specialized 1.7 billion parameter language model, developed by Tuan Dang Nguyen for the CS-552 Modern NLP course project. It is a post-trained checkpoint based on Qwen/Qwen3-1.7B, specifically engineered for closed-book multiple-choice general knowledge evaluation.

Key Capabilities & Training

The model's primary function is to answer multiple-choice questions by outputting a single letter within a LaTeX boxed expression (e.g., \boxed{B}). Its training involved several stages:

  • Supervised fine-tuning on diverse general-knowledge multiple-choice datasets.
  • Refinements using MMLU-Pro and variable option-count examples.
  • Integration of Quartz v1 SFT for initial strong performance.
  • Stage 5 merge-aware DPO (Direct Preference Optimization) using the model's own incorrect answers for refinement, leading to improved local diagnostic results.

Evaluation & Performance

Evaluations showed consistent performance on both local diagnostic sets and a 10-example public validation snapshot. The final uploaded model achieved 249/290 on local diagnostics and maintained a 0.4900 hidden-CI score, tying the best SFT anchor. DPO improved local diagnostic results and robustness, though it did not surpass the best hidden-CI SFT score.

Usage Notes & Limitations

This model is a fully merged checkpoint, loadable with standard transformers tooling. It is highly specialized for its intended task and not designed as a general chat assistant or a reliable factual oracle outside of benchmark settings. Users should provide closed-book multiple-choice questions and expect a \boxed{LETTER} output format for optimal compatibility with its evaluation pipeline.