cs-552-2026-aaty/group_model

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 2B
  • Quant: BF16
  • Context length: 32k
  • Tool calling: Supported
  • Published: May 8, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open weights

cs-552-2026-aaty/group_model is a 1.7-billion-parameter language model developed by team AATY for the CS-552 MNLP course and based on Qwen/Qwen3-1.7B. It was post-trained with supervised fine-tuning (SFT) followed by GRPO, using reward functions designed for math and reasoning objectives. It targets a broad range of tasks, including math, general knowledge, safety, and multilinguality. Its "thinking mode" makes the model emit a reasoning block before giving a final, boxed answer.


Overview

cs-552-2026-aaty/group_model is a 1.7-billion-parameter language model developed by team AATY for the CS-552 MNLP course at EPFL. It is built on the Qwen/Qwen3-1.7B base model and went through two stages of post-training. The first stage was supervised fine-tuning (SFT) with a LoRA adapter. The second was GRPO (Group Relative Policy Optimization), initialized from the SFT checkpoint. The GRPO stage used reward functions tailored to math and reasoning objectives.
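GRPO, as commonly formulated, does not use a learned value function. Instead it samples a group of completions for each prompt and normalizes every completion's reward against the group's mean and standard deviation. The minimal sketch below shows only that advantage step. It assumes scalar rewards have already been computed, for example a hypothetical binary reward of 1.0 for a correct boxed answer and 0.0 otherwise. It is not the team's training code, and the epsilon value and the choice of population standard deviation are illustrative assumptions; implementations differ on both.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Assumption: population standard deviation plus a small epsilon, so a group
    with identical rewards yields all-zero advantages (no learning signal).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of 4 completions for one math prompt: two correct, two wrong.
example = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(example)  # correct answers get positive advantages, wrong ones negative
```

Correct completions get positive advantages and incorrect ones get negative advantages. A group whose completions are all right or all wrong contributes no gradient signal.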

Key Capabilities

  • Multi-domain proficiency: Evaluated in four domains. Math uses free-form answers scored with pass@8. General knowledge, safety, and multilinguality use multiple-choice answers scored with pass@1.
  • Structured output: Uses a "thinking mode" in which it generates a <think>...</think> reasoning block and then wraps its final answer in \boxed{...}. The chat template enforces this output structure.
  • Answer format flexibility: Supports both free-form answers (e.g., \boxed{101}) and multiple-choice answers (e.g., \boxed{B}), reflecting its diverse training mix.
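The output format and the pass@k metric above suggest a simple scoring loop. The sketch below splits a completion at </think> and extracts the last \boxed{...} while handling nested braces. It then applies the widely used unbiased pass@k estimator. Several parts are assumptions rather than the course's actual harness: the helper names are hypothetical, answers are compared by exact string match, and the opening <think> tag is treated as optional. Real math grading often checks symbolic equivalence instead of exact matching.

```python
from math import comb
from typing import List, Optional, Tuple


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j].strip()
    return None  # unbalanced braces: treat as no answer


def parse_completion(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a completion into (reasoning, boxed answer).

    Assumption: the answer is taken only from text after </think>; the opening
    <think> tag may have been emitted by the chat template, so it is optional.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return None, extract_boxed(text)  # no reasoning block found
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, extract_boxed(tail)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def score_samples(completions: List[str], reference: str, k: int) -> float:
    """Hypothetical helper: exact-match each boxed answer, then estimate pass@k."""
    preds = [parse_completion(c)[1] for c in completions]
    correct = sum(p is not None and p == reference.strip() for p in preds)
    return pass_at_k(len(completions), correct, k)


print(parse_completion("<think>2 + 2 = 4</think> So the answer is \\boxed{4}."))
print(score_samples(["<think>a</think>\\boxed{101}", "<think>b</think>\\boxed{7}"], "101", k=1))
```

The same parser works for multiple-choice outputs such as \boxed{B}. With n = k = 8, pass@8 reduces to whether any of the eight samples is correct.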

Good for

  • Applications requiring explicit reasoning steps before a final answer, particularly in mathematical or logical problem-solving.
  • Tasks that benefit from a model trained and evaluated across a diverse set of domains including general knowledge, safety, and multilingual understanding.
  • Use cases where a small model (1.7B parameters), fine-tuned with a LoRA adapter, can provide competitive performance on specific academic benchmarks, especially those involving structured output and reasoning.