Momin-Aldahdouh/MominoMoE-v4

TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 13, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

MominoMoE-v4 by Momin-Aldahdouh is a 0.8 billion parameter Mixture-of-Experts (MoE) language model, fine-tuned from MominoMoE-v3 (based on Qwen3-0.6B). This model underwent a full fine-tuning process, updating all 596 million parameters, rather than using LoRA. It was trained on a coding-first dataset across 15 categories, making it suitable for code-related tasks.

Loading preview...

MominoMoE-v4 Overview

Momin-Aldahdouh's MominoMoE-v4 is a 0.8 billion parameter language model, representing a full fine-tune of the MominoMoE-v3 merged weights, which are based on the Qwen3-0.6B architecture. Unlike LoRA, this iteration involved updating all 596 million parameters directly, indicating a comprehensive modification of the model's weights.

Training Details

The model was trained on a specialized dataset comprising 80,000 training examples and 8,000 validation examples. This dataset is notable for its "coding-first" approach, categorized into 15 distinct areas. The training process spanned 10,000 steps over 4 epochs, utilizing a learning rate of 2e-5 with a cosine schedule and bf16 precision. It achieved a final training loss of 0.1523.

Key Characteristics

  • Architecture: Mixture-of-Experts (MoE) based on Qwen3-0.6B.
  • Parameter Count: 0.8 billion parameters (596 million updated).
  • Training Method: Full fine-tuning, not LoRA.
  • Dataset Focus: Coding-first, with 15 categories.

Potential Use Cases

Given its coding-first training, MominoMoE-v4 is likely well-suited for:

  • Code generation and completion.
  • Code explanation and analysis.
  • Programming-related question answering.
  • Tasks requiring understanding of various coding paradigms.