Xerv-AI/Ada

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 1.5B
  • Quantization: BF16
  • Context Length: 32k
  • Published: Apr 25, 2026
  • License: apache-2.0
  • Architecture: Transformer

Xerv-AI/Ada is a 1.5-billion-parameter multi-modal mathematical generalist Small Language Model (SLM) developed by Xerv-AI, based on the Qwen2.5-Math-1.5B architecture with a 32,768-token context length. It is optimized for graduate-level STEM reasoning, logical deduction, and mathematical proofs, while maintaining general conversational instruction-following. By balancing specialized mathematical prowess with general utility, it addresses the catastrophic forgetting common in math-heavy fine-tunes. It is designed for high-speed inference and efficient deployment on low-VRAM consumer hardware.


Xerv-AI/Ada: The Multi-Modal Mathematical Generalist SLM

Xerv-AI/Ada is an ultra-lightweight, high-speed, and highly optimized 1.5-billion-parameter Small Language Model (SLM) derived from the Qwen2.5-Math-1.5B architecture. Developed by Xerv-AI, it bridges the gap between advanced mathematical reasoning and standard conversational utility, mitigating the "catastrophic forgetting" problem common in math-heavy fine-tunes. Ada was engineered on a dual-distribution training dataset to act as both a rigorous STEM assistant and a general-purpose chat model.

Key Capabilities & Strengths

  • Balanced Generalization: Transitions seamlessly between casual conversation and intensive analytical problem-solving without format-induced hallucinations.
  • Advanced STEM Reasoning: Optimized to generate detailed, multi-step logical proofs in advanced algebra, calculus, topology, and physics.
  • Hardware Optimized for Edge Deployment: Designed for maximum inference throughput on low-VRAM consumer hardware (e.g., a 16GB NVIDIA T4, Mac M-series, edge devices) using 4-bit quantization; see the loading sketch after this list.
  • Clean Formatting: Native understanding of structural formatting, reliably outputting readable Markdown and clearly structured reasoning steps.
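For reference, a minimal 4-bit loading and inference sketch using Hugging Face Transformers with bitsandbytes is shown below. The Hub id Xerv-AI/Ada is taken from this card; the example prompt and generation settings are illustrative assumptions.

```python
# Minimal 4-bit inference sketch (assumes bitsandbytes is installed and
# that the model is published on the Hub as "Xerv-AI/Ada").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Xerv-AI/Ada"  # Hub id assumed from this card

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the BF16 base weights
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # fits comfortably on a single 16GB GPU in 4-bit
)

# Assumes the fine-tune ships a chat template (standard for Qwen2.5 derivatives).
messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```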

Training Methodology

Ada was trained using Supervised Fine-Tuning (SFT) with QLoRA via Unsloth, leveraging a 50/50 blend of two distinct datasets: the proprietary Xerv-AI/GRAD for deep reasoning and strict formatting, and a refined subset of yahma/alpaca-cleaned for conversational flow and general instruction-following. This dual-distribution blending prevents domain overfitting and catastrophic forgetting.
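A sketch of this recipe, under stated assumptions, is given below: Xerv-AI/GRAD is proprietary (the name here is a placeholder load), and the LoRA rank, hyperparameters, and shared "text" column are illustrative choices not documented by Xerv-AI.

```python
# Illustrative QLoRA SFT sketch in the spirit of the described recipe.
# Note: trl's SFTTrainer API shown here matches older releases; newer
# versions move several of these options into SFTConfig.
from unsloth import FastLanguageModel
from datasets import load_dataset, interleave_datasets
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Math-1.5B",
    max_seq_length=32768,
    load_in_4bit=True,  # QLoRA: 4-bit frozen base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16,  # assumed adapter settings
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# 50/50 blend of the two distributions to limit catastrophic forgetting.
# Assumes both datasets expose a pre-formatted "text" column.
grad = load_dataset("Xerv-AI/GRAD", split="train")        # proprietary; placeholder
chat = load_dataset("yahma/alpaca-cleaned", split="train")
blended = interleave_datasets([grad, chat], probabilities=[0.5, 0.5], seed=42)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=blended,
    dataset_text_field="text",
    max_seq_length=32768,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        output_dir="ada-sft",
    ),
)
trainer.train()
```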

Performance Summary

  • GSM8K: 40.00%
  • MATH: 60.00%
  • MATH-Hard: 50.00%
  • GRAD (Xerv-AI's proprietary benchmark): 40.00%
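These figures can be cross-checked with EleutherAI's lm-evaluation-harness. A minimal sketch follows; the Hub id and five-shot setting are assumptions, and exact prompting details affect scores.

```python
# Re-running a public benchmark with lm-evaluation-harness (pip install lm-eval).
# Hub id "Xerv-AI/Ada" is assumed; the card does not specify its scoring setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Xerv-AI/Ada,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,  # a common GSM8K setting
)
print(results["results"]["gsm8k"])
```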

Limitations

  • Arithmetic Hallucinations: Can occasionally make minor arithmetic errors within multi-page proofs; raw calculations should always be verified (see the sketch after this list).
  • Language Constraint: Optimized exclusively for English text and standard mathematical notation.
  • World Knowledge: Lacks the broad encyclopedic knowledge of larger models.
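Given the arithmetic caveat above, one lightweight verification pattern is to re-check numeric claims extracted from a proof with a computer algebra system such as SymPy. A minimal illustration, where the expression and asserted value are placeholders:

```python
# Sanity-check a model-produced arithmetic claim with SymPy
# (an illustrative pattern, not part of the model itself).
from sympy import sympify

claim = "1/3 + 1/6"    # expression extracted from the model's proof
stated_result = "1/2"  # value the model asserted

assert sympify(claim).equals(sympify(stated_result)), "model arithmetic is wrong"
print("verified:", sympify(claim))
```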