libvm/mm-cand-aim_on_task_arithmetic

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 19, 2026Architecture:Transformer Warm

The libvm/mm-cand-aim_on_task_arithmetic model is an 8 billion parameter language model with a 32768 token context length, created by libvm. It was developed using a task arithmetic merge of Qwen3-8B-Base, Qwen3-8B, OpenDataArena/Qwen3-8B-ODA-Math-460k, and mlabonne/Qwen3-8B-abliterated, followed by AIM application. This model is specifically optimized for arithmetic and mathematical reasoning tasks, leveraging diverse training data including mathematical datasets.

Loading preview...

Model Overview

The libvm/mm-cand-aim_on_task_arithmetic is an 8 billion parameter language model with a 32768 token context window, developed by libvm. Its creation involved a multi-stage process, beginning with a task arithmetic merge of several base models from the Qwen3-8B family, specifically Qwen/Qwen3-8B-Base, Qwen/Qwen3-8B, OpenDataArena/Qwen3-8B-ODA-Math-460k, and mlabonne/Qwen3-8B-abliterated. This merging technique, known as task arithmetic, combines the strengths of its constituent models.

Following the initial merge, the model underwent further refinement through the application of AIM (Adaptive Instance Normalization) using a diverse set of calibration examples. These examples were sourced from HuggingFaceFW/fineweb-edu (a 10 billion token sample), allenai/WildChat, open-web-math/open-web-math, and allenai/wildjailbreak (training split). This targeted training approach aims to enhance its performance on specific tasks.

Key Capabilities

  • Arithmetic and Mathematical Reasoning: The model's lineage, particularly the inclusion of OpenDataArena/Qwen3-8B-ODA-Math-460k and training on open-web-math/open-web-math, suggests a strong focus on numerical and mathematical problem-solving.
  • Merged Architecture Benefits: The task arithmetic merging method allows for the combination of different specialized models, potentially leading to a more robust and versatile model than its individual components.
  • Extensive Context Window: A 32768 token context length enables the model to process and understand longer inputs, which is beneficial for complex problem descriptions or multi-step reasoning tasks.

Good For

  • Mathematical Applications: Ideal for use cases requiring precise arithmetic calculations, algebraic problem-solving, or other quantitative tasks.
  • Reasoning-Intensive Workloads: Suitable for scenarios where logical deduction and step-by-step reasoning are critical.
  • Research and Development: Provides a strong base for further fine-tuning on specific mathematical or reasoning-focused datasets.