Xerv-AI/Ada
Xerv-AI/Ada is a 1.5-billion-parameter multi-modal mathematical generalist Small Language Model (SLM) developed by Xerv-AI, based on the Qwen2.5-Math-1.5B architecture with a 32,768-token context length. It is optimized for graduate-level STEM reasoning, logical deduction, and mathematical proofs, while retaining general conversational instruction-following. By balancing specialized mathematical ability with general utility, the model addresses the catastrophic forgetting common in math-heavy fine-tunes, and it is designed for high-speed inference and efficient deployment on low-VRAM consumer hardware.
Xerv-AI/Ada: The Multi-Modal Mathematical Generalist SLM
Xerv-AI/Ada is an ultra-lightweight, high-speed, and highly optimized 1.5 billion parameter Small Language Model (SLM) derived from the Qwen2.5-Math-1.5B architecture. Developed by Xerv-AI, it uniquely bridges the gap between advanced mathematical reasoning and standard conversational utility, solving the "catastrophic forgetting" problem common in math-heavy fine-tunes. Ada was meticulously engineered using a dual-distribution training dataset to act as both a rigorous STEM assistant and a general-purpose chat model.
Key Capabilities & Strengths
- Balanced Generalization: Transitions seamlessly between casual conversation and intensive analytical problem-solving, without forcing rigid proof-style formatting onto conversational replies or hallucinating structure where none is needed.
- Advanced STEM Reasoning: Optimized to generate detailed, multi-step logical proofs in advanced algebra, calculus, topology, and physics.
- Hardware Optimized for Edge Deployment: Designed for maximum inference throughput on low-VRAM consumer hardware (e.g., 16GB NVIDIA T4, Mac M-series, edge devices) using 4-bit quantization.
- Impeccable Formatting: Native understanding of structural formatting, easily outputting highly readable markdown and structured logic steps.
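The memory savings behind the 4-bit deployment claim come from standard absmax quantization arithmetic. The following is a minimal pure-Python round-trip sketch of the idea, illustrative only; real deployments use optimized kernels (e.g. bitsandbytes NF4), not this scalar code:

```python
def quantize_4bit(weights):
    """Map floats to 4-bit codes (0..15) via absmax scaling."""
    scale = max(abs(w) for w in weights) or 1.0
    # 7 steps on each side of zero -> signed codes -7..7, stored biased by +7
    return [round(w / scale * 7) + 7 for w in weights], scale

def dequantize_4bit(codes, scale):
    """Invert the mapping; each weight is off by at most half a step."""
    return [(c - 7) / 7 * scale for c in codes]

weights = [0.42, -0.13, 0.91, -0.77, 0.05]
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)
# Each code fits in 4 bits, so storage drops ~4x vs. fp16 (~8x vs. fp32),
# at the cost of a small per-weight reconstruction error.
```

The trade-off is visible directly: `restored` differs from `weights` by at most half a quantization step (`scale / 14`), which is the error budget a 1.5B model can typically absorb with little quality loss.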
Training Methodology
Ada was trained using Supervised Fine-Tuning (SFT) with QLoRA via Unsloth, leveraging a 50/50 blend of two distinct datasets: the proprietary Xerv-AI/GRAD for deep reasoning and strict formatting, and a refined subset of yahma/alpaca-cleaned for conversational flow and general instruction-following. This dual-distribution blending prevents domain overfitting and catastrophic forgetting.
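The 50/50 blend described above can be sketched as a per-example source sampler. This is a plain-Python illustration of the dual-distribution idea, not the actual pipeline; the dataset contents and record shapes below are placeholders:

```python
import random

def blend_datasets(math_ds, chat_ds, n_samples, seed=0):
    """Draw each training example from one of two source distributions
    with equal probability, so neither domain dominates the SFT mix."""
    rng = random.Random(seed)
    blended = []
    for _ in range(n_samples):
        source = math_ds if rng.random() < 0.5 else chat_ds
        blended.append(rng.choice(source))
    return blended

# Placeholder stand-ins for Xerv-AI/GRAD and yahma/alpaca-cleaned records
math_ds = [{"src": "GRAD", "text": f"proof {i}"} for i in range(100)]
chat_ds = [{"src": "alpaca", "text": f"chat {i}"} for i in range(100)]

mix = blend_datasets(math_ds, chat_ds, 10_000)
frac_math = sum(e["src"] == "GRAD" for e in mix) / len(mix)
# frac_math lands near 0.5 by construction
```

Sampling per example (rather than concatenating the two corpora) keeps the domain ratio stable across every training step, which is what guards against the math distribution crowding out conversational behavior late in training.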
Performance Summary
- GSM8K: 40.00%
- MATH: 60.00%
- MATH-Hard: 50.00%
- GRAD: 40.00%
Limitations
- Arithmetic Hallucinations: May make occasional minor arithmetic errors within long multi-step proofs; raw calculations should always be verified.
- Language Constraint: Optimized exclusively for English text and standard mathematical notation.
- World Knowledge: Lacks the broad encyclopedic knowledge of larger models.