cs-552-2026-aaty/general_knowledge_model
The general_knowledge_model by cs-552-2026-aaty is a supervised fine-tuned Qwen3-1.7B model, optimized for answering closed-book factual and reasoning questions across sciences, humanities, and geography. This 1.7 billion parameter model excels at multiple-choice questions, providing reasoning before boxing the final answer. It is specifically designed for general knowledge tasks, operating in a forced thinking mode to enhance response quality.
Loading preview...
Model Overview
The general_knowledge_model is a specialized language model developed by the AATY team for the CS-552 MNLP course at EPFL. It is built upon the Qwen/Qwen3-1.7B base model, which has undergone supervised fine-tuning to excel in general knowledge domains.
Key Capabilities
- Closed-book QA: Designed to answer factual and reasoning questions without external information.
- Domain Expertise: Proficient in subjects spanning sciences, humanities, and geography.
- Multiple-choice Format: Optimized for multiple-choice questions, supporting 2 to 20 options.
- Reasoning Output: Emits a detailed reasoning block (
<think>...</think>) before providing the final answer, which is wrapped in\boxed{...}. - Thinking Mode: The model is configured to always operate in a "thinking mode" via its chat template, ensuring a structured reasoning process for every query.
Training and Usage
The model was fine-tuned using a LoRA adapter on cs-552-2026-aaty/sft_mixture, a chat-formatted dataset derived from public QA and knowledge sources. It is provided as vLLM-loadable safetensors, including a config.json, generation_config.json, and a tokenizer chat_template. Developers can easily integrate it using the transformers library for tasks requiring robust general knowledge inference.