Momin-Aldahdouh/MominoMoE-v4
MominoMoE-v4 by Momin-Aldahdouh is a 0.8 billion parameter Mixture-of-Experts (MoE) language model, fine-tuned from MominoMoE-v3 (based on Qwen3-0.6B). This model underwent a full fine-tuning process, updating all 596 million parameters, rather than using LoRA. It was trained on a coding-first dataset across 15 categories, making it suitable for code-related tasks.
Loading preview...
MominoMoE-v4 Overview
Momin-Aldahdouh's MominoMoE-v4 is a 0.8 billion parameter language model, representing a full fine-tune of the MominoMoE-v3 merged weights, which are based on the Qwen3-0.6B architecture. Unlike LoRA, this iteration involved updating all 596 million parameters directly, indicating a comprehensive modification of the model's weights.
Training Details
The model was trained on a specialized dataset comprising 80,000 training examples and 8,000 validation examples. This dataset is notable for its "coding-first" approach, categorized into 15 distinct areas. The training process spanned 10,000 steps over 4 epochs, utilizing a learning rate of 2e-5 with a cosine schedule and bf16 precision. It achieved a final training loss of 0.1523.
Key Characteristics
- Architecture: Mixture-of-Experts (MoE) based on Qwen3-0.6B.
- Parameter Count: 0.8 billion parameters (596 million updated).
- Training Method: Full fine-tuning, not LoRA.
- Dataset Focus: Coding-first, with 15 categories.
Potential Use Cases
Given its coding-first training, MominoMoE-v4 is likely well-suited for:
- Code generation and completion.
- Code explanation and analysis.
- Programming-related question answering.
- Tasks requiring understanding of various coding paradigms.