Xerv-AI/Ada
Xerv-AI/Ada is a 1.5 billion parameter Small Language Model (SLM) derived from the Qwen2.5-Math-1.5B architecture, developed by Xerv-AI. It is optimized for graduate-level STEM reasoning, logical deduction, and mathematical proofs, while also maintaining general conversational instruction-following capabilities. This model is designed for high-speed inference on low-VRAM consumer hardware, effectively bridging specialized mathematical tasks with standard utility without catastrophic forgetting.
Loading preview...
Xerv-AI/Ada: Multi-Modal Mathematical Generalist SLM
Xerv-AI/Ada is a 1.5 billion parameter Small Language Model (SLM) built upon the Qwen2.5-Math-1.5B architecture. It uniquely addresses the "catastrophic forgetting" problem common in math-heavy fine-tunes by balancing advanced STEM reasoning with general conversational utility.
Core Capabilities & Strengths
- Balanced Generalization: Seamlessly handles both casual conversation and complex analytical problem-solving.
- Advanced STEM Reasoning: Generates detailed, multi-step logical proofs in advanced algebra, calculus, topology, and physics.
- Hardware Optimized: Designed for maximum inference throughput on low-VRAM consumer hardware (e.g., 16GB NVIDIA T4, Mac M-series) using 4-bit quantization.
- Impeccable Formatting: Understands structural formatting for highly readable markdown and structured logic steps.
Training Methodology
Ada was fine-tuned using Supervised Fine-Tuning (SFT) with QLoRA via Unsloth, targeting attention mechanisms. It utilized a carefully balanced 50/50 blend of two distinct datasets:
- Xerv-AI/GRAD: ~1.93k rows of proprietary synthetic graduate and research-level mathematical proofs (average 8,000 characters) to instill deep reasoning and strict formatting.
- yahma/alpaca-cleaned: ~2.00k rows of a refined Alpaca subset for conversational flow, roleplay, and basic Q&A.
Performance Summary
- GSM8K: 40.00%
- MATH: 60.00%
- MATH-Hard: 50.00%
- GRAD: 40.00%
Limitations
- Arithmetic Hallucinations: May occasionally make minor arithmetic errors within multi-page proofs; raw calculations should be verified.
- Language Constraint: Optimized exclusively for English text and standard mathematical notation.
- Prompt Sensitivity: Performs best when math queries explicitly ask for "proof," "step-by-step breakdown," or "logical analysis."
- World Knowledge: Lacks the broad encyclopedic knowledge of larger models.