cs-552-2026-aaty/general_knowledge_model

TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 8, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The general_knowledge_model by cs-552-2026-aaty is a supervised fine-tuned Qwen3-1.7B model, optimized for answering closed-book factual and reasoning questions across sciences, humanities, and geography. This 1.7 billion parameter model excels at multiple-choice questions, providing reasoning before boxing the final answer. It is specifically designed for general knowledge tasks, operating in a forced thinking mode to enhance response quality.

Loading preview...

Model Overview

The general_knowledge_model is a specialized language model developed by the AATY team for the CS-552 MNLP course at EPFL. It is built upon the Qwen/Qwen3-1.7B base model, which has undergone supervised fine-tuning to excel in general knowledge domains.

Key Capabilities

  • Closed-book QA: Designed to answer factual and reasoning questions without external information.
  • Domain Expertise: Proficient in subjects spanning sciences, humanities, and geography.
  • Multiple-choice Format: Optimized for multiple-choice questions, supporting 2 to 20 options.
  • Reasoning Output: Emits a detailed reasoning block (<think>...</think>) before providing the final answer, which is wrapped in \boxed{...}.
  • Thinking Mode: The model is configured to always operate in a "thinking mode" via its chat template, ensuring a structured reasoning process for every query.

Training and Usage

The model was fine-tuned using a LoRA adapter on cs-552-2026-aaty/sft_mixture, a chat-formatted dataset derived from public QA and knowledge sources. It is provided as vLLM-loadable safetensors, including a config.json, generation_config.json, and a tokenizer chat_template. Developers can easily integrate it using the transformers library for tasks requiring robust general knowledge inference.