TheBeagle-v2beta-32B-MGS: An Experimental Qwen-based Model
fblgit's TheBeagle-v2beta-32B-MGS is an experimental 32.8-billion-parameter language model built on the Qwen architecture. This version introduces a novel "MGS" (Many-Geeks-Searching) technique, a regularization method that operates differently from the established UNA algorithm while remaining compatible with it. The model was deliberately trained for a single epoch, reflecting the developer's view that one epoch of training is sufficient.
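For orientation, the model should load like any other Qwen-family causal LM through Hugging Face transformers. The snippet below is a minimal inference sketch, assuming the repository id fblgit/TheBeagle-v2beta-32B-MGS and a standard chat template; it is not code from the model card.

```python
# Minimal inference sketch for a Qwen-family chat model.
# Assumptions (not confirmed by the model card): the Hugging Face repo id
# is "fblgit/TheBeagle-v2beta-32B-MGS" and the tokenizer ships a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/TheBeagle-v2beta-32B-MGS"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32.8B params: bf16 halves memory vs. fp32
    device_map="auto",           # spread layers across available devices
)

messages = [{"role": "user", "content": "Briefly explain what regularization does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that 32.8B parameters in bf16 occupy roughly 65 GB for the weights alone, so device_map="auto" with multiple GPUs or CPU offload is the practical route on most hardware.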
Key Characteristics & Training:
- Architecture: Based on the Qwen model family.
- Parameter Count: 32.8 billion parameters.
- Context Length: Supports a substantial 131072 tokens.
- Training Data: Utilized the Magpie-Align/Magpie-Pro-300K-Filtered dataset, praised for its quality and size.
- Training Method: Trained for only one epoch using Axolotl, with a learning rate of 8e-05 and a total batch size of 64 (a hedged hyperparameter sketch follows this list).
- MGS Technique: Incorporates a unique regularization approach, distinct from UNA, with the cryptic hint "1+1 is 2, and 1+1 is not 3" suggesting a focus on fundamental logical consistency.
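The run itself was configured in Axolotl, which uses YAML, so the following is only a hedged Python sketch that restates the reported hyperparameters through transformers' TrainingArguments: one epoch, learning rate 8e-05, total batch size 64. The per-device/gradient-accumulation split and the precision flag are assumptions, not values from the model card.

```python
# Illustrative only: the actual run used an Axolotl YAML config, not this code.
# Only num_train_epochs=1, learning_rate=8e-05, and the total batch size of 64
# come from the model card; the per-device/accumulation split is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="thebeagle-v2beta-32b-mgs",  # hypothetical output path
    num_train_epochs=1,                     # the deliberate 1-epoch regime
    learning_rate=8e-05,                    # reported learning rate
    per_device_train_batch_size=4,          # assumed split: 4 x 16 = 64 total
    gradient_accumulation_steps=16,         # on a single device
    bf16=True,                              # assumed precision
)
```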
Performance & Licensing:
- Evaluation Loss: Achieved a validation loss of 0.5378 after one epoch, outperforming a baseline model (see the loss-computation sketch at the end of this section).
- Leaderboard Results: Preliminary evaluations on the Open LLM Leaderboard show an average score of 40.29, including IFEval (0-Shot) at 45.03 and BBH (3-Shot) at 58.07.
- Licensing: Adheres to Qwen's licensing terms, with an additional requirement for derivatives to include "Beagle" or "MGS" in their model names for tracking purposes.
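As context for the evaluation-loss figure above: a causal LM's validation loss is the mean per-token cross-entropy on held-out data, so 0.5378 corresponds to a perplexity of exp(0.5378) ≈ 1.71. Below is a minimal sketch of how such a loss is computed with transformers; the repository id and the sample text are assumptions.

```python
# Sketch: computing a causal-LM validation loss (mean per-token cross-entropy).
# The repo id is an assumption; the sample text stands in for the real
# validation split of Magpie-Align/Magpie-Pro-300K-Filtered.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/TheBeagle-v2beta-32B-MGS"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

text = "A held-out validation example would go here."
enc = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # With labels=input_ids, transformers shifts the labels internally and
    # returns the mean cross-entropy over the predicted tokens.
    loss = model(**enc, labels=enc["input_ids"]).loss.item()

print(f"loss={loss:.4f}  perplexity={math.exp(loss):.2f}")
```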