Xerv-AI/Ada

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 25, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

Xerv-AI/Ada is a 1.5 billion parameter Small Language Model (SLM) derived from the Qwen2.5-Math-1.5B architecture, developed by Xerv-AI. It is optimized for graduate-level STEM reasoning, logical deduction, and mathematical proofs, while also maintaining general conversational instruction-following capabilities. This model is designed for high-speed inference on low-VRAM consumer hardware, effectively bridging specialized mathematical tasks with standard utility without catastrophic forgetting.

Loading preview...

Xerv-AI/Ada: Multi-Modal Mathematical Generalist SLM

Xerv-AI/Ada is a 1.5 billion parameter Small Language Model (SLM) built upon the Qwen2.5-Math-1.5B architecture. It uniquely addresses the "catastrophic forgetting" problem common in math-heavy fine-tunes by balancing advanced STEM reasoning with general conversational utility.

Core Capabilities & Strengths

  • Balanced Generalization: Seamlessly handles both casual conversation and complex analytical problem-solving.
  • Advanced STEM Reasoning: Generates detailed, multi-step logical proofs in advanced algebra, calculus, topology, and physics.
  • Hardware Optimized: Designed for maximum inference throughput on low-VRAM consumer hardware (e.g., 16GB NVIDIA T4, Mac M-series) using 4-bit quantization.
  • Impeccable Formatting: Understands structural formatting for highly readable markdown and structured logic steps.

Training Methodology

Ada was fine-tuned using Supervised Fine-Tuning (SFT) with QLoRA via Unsloth, targeting attention mechanisms. It utilized a carefully balanced 50/50 blend of two distinct datasets:

  • Xerv-AI/GRAD: ~1.93k rows of proprietary synthetic graduate and research-level mathematical proofs (average 8,000 characters) to instill deep reasoning and strict formatting.
  • yahma/alpaca-cleaned: ~2.00k rows of a refined Alpaca subset for conversational flow, roleplay, and basic Q&A.

Performance Summary

  • GSM8K: 40.00%
  • MATH: 60.00%
  • MATH-Hard: 50.00%
  • GRAD: 40.00%

Limitations

  • Arithmetic Hallucinations: May occasionally make minor arithmetic errors within multi-page proofs; raw calculations should be verified.
  • Language Constraint: Optimized exclusively for English text and standard mathematical notation.
  • Prompt Sensitivity: Performs best when math queries explicitly ask for "proof," "step-by-step breakdown," or "logical analysis."
  • World Knowledge: Lacks the broad encyclopedic knowledge of larger models.