cs-552-2026-claude-bots/group_model

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Tool Calling: Supported · Published: May 15, 2026 · Architecture: Transformer · Cold

cs-552-2026-claude-bots/group_model is a merged language model based on Qwen/Qwen3-1.7B, created with the TIES merge method. It integrates capabilities from math, multilingual, and general knowledge specialist models. The merge prioritizes general knowledge and safety while preserving mathematical reasoning and multilingual encoding. It is intended for applications that need general knowledge, mathematical reasoning, and multilingual ability in a single model.


Overview

This model, cs-552-2026-claude-bots/group_model, is a merged language model built on the Qwen/Qwen3-1.7B base using the TIES (TrIm, Elect Sign & Merge) method. It combines three specialist models: a math specialist, a multilingual specialist, and a general knowledge specialist. The merge configuration was tuned for balanced performance across these domains, and it addresses issues from previous iterations involving safety and task vector normalization.
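
For readers unfamiliar with TIES, its three steps (trim each task vector, elect a sign per parameter, merge only the agreeing values) can be sketched as below. This is a minimal pure-Python illustration on flat lists of floats, not the tooling or configuration used to build this model. The function names `trim` and `ties_merge`, the toy parameter values, and the weighted normalization are all assumptions made for the example.

```python
def trim(task_vector, density):
    """Keep the `density` fraction of entries with largest magnitude; zero the rest."""
    k = int(round(density * len(task_vector)))
    if k <= 0:
        return [0.0] * len(task_vector)
    # Indices of the k largest-magnitude entries (ties broken by position).
    order = sorted(range(len(task_vector)),
                   key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]


def ties_merge(base, specialists, densities, weights):
    """Simplified TIES merge of several fine-tuned parameter lists into `base`."""
    # Step 0: task vectors are the differences from the shared base.
    # Step 1 (trim): sparsify each task vector by magnitude.
    trimmed = [
        trim([p - b for p, b in zip(params, base)], density)
        for params, density in zip(specialists, densities)
    ]
    merged = []
    for i, b in enumerate(base):
        contributions = [(w * tv[i], w) for tv, w in zip(trimmed, weights)]
        total = sum(c for c, _ in contributions)
        if total == 0:
            merged.append(b)  # no surviving update for this parameter
            continue
        # Step 2 (elect sign): the sign with the larger total mass wins.
        sign = 1.0 if total > 0 else -1.0
        # Step 3 (merge): weighted mean over specialists that agree with the sign.
        agreeing = [(c, w) for c, w in contributions if c * sign > 0]
        delta = sum(c for c, _ in agreeing) / sum(w for _, w in agreeing)
        merged.append(b + delta)
    return merged


# Toy example with made-up values (assumption, not real model weights).
base = [1.0, 1.0, 1.0, 1.0]
math_model = [2.0, 0.0, 1.25, 1.0]
general_model = [3.0, 3.0, 1.0, 1.5]
result = ties_merge(base, [math_model, general_model],
                    densities=[0.5, 0.5], weights=[1.0, 1.0])
print(result)
```

In the toy run, the two specialists disagree on the sign of the second parameter, so only the update with the winning sign is kept there. Small updates removed by trimming leave the base value unchanged.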

Key Capabilities

  • Balanced General Knowledge: Prioritizes general knowledge and safety, with the general knowledge model dominating attention and MLP layers.
  • Preserved Mathematical Reasoning: Maintains strong mathematical accuracy, as the math specialist's MLP density is kept high.
  • Robust Multilingual Support: Ensures multilingual capability through high embedding density for language encoding.
  • Optimized Merge Strategy: Uses the TIES method with per-specialist parameter weights and density filtering to limit interference between specialists and avoid performance degradation.
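
The per-specialist weighting and density filtering described above could be organized as a lookup keyed by module type. The sketch below is hypothetical: the numeric values are placeholders and not the published merge configuration, and the helper `settings_for` is an illustrative name. The module substrings (`self_attn`, `mlp`, `embed_tokens`) follow the parameter naming used by Qwen-style checkpoints in Hugging Face Transformers.

```python
# Placeholder values for illustration only; NOT the actual merge configuration.
DEFAULT = {"weight": 0.3, "density": 0.5}

PLACEHOLDER_SETTINGS = {
    # General knowledge specialist dominates attention and MLP layers (higher weight).
    "general": {
        "self_attn": {"weight": 0.6, "density": 0.5},
        "mlp": {"weight": 0.6, "density": 0.5},
    },
    # Math specialist keeps a high MLP density.
    "math": {
        "mlp": {"weight": 0.3, "density": 0.9},
    },
    # Multilingual specialist keeps a high embedding density.
    "multilingual": {
        "embed_tokens": {"weight": 0.3, "density": 0.9},
    },
}


def settings_for(specialist, param_name):
    """Return the weight/density for one specialist and one parameter name."""
    for module_key, settings in PLACEHOLDER_SETTINGS.get(specialist, {}).items():
        if module_key in param_name:
            return settings
    return DEFAULT


print(settings_for("math", "model.layers.0.mlp.gate_proj.weight"))
print(settings_for("multilingual", "model.embed_tokens.weight"))
```

Keeping the settings in one table makes it easy to see which specialist is favored in which part of the network, and to adjust one entry without touching the others.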

Good for

  • Applications requiring a versatile model with strong general knowledge and safety features.
  • Tasks that benefit from a combination of mathematical reasoning and broad factual understanding.
  • Multilingual text processing where accurate language encoding is crucial.
  • Developers seeking a merged model that carefully balances multiple specialized capabilities without significant trade-offs.