TheBeagle-v2beta-32B-MGS: An Experimental Qwen-based Model
fblgit's TheBeagle-v2beta-32B-MGS is an experimental 32.8-billion-parameter language model built on the Qwen architecture. This version introduces a novel "MGS" (Many-Geeks-Searching) technique, a regularization method that operates differently from the established UNA algorithm while remaining compatible with it. The model was deliberately trained for a single epoch, reflecting the developer's view that one epoch of training is sufficient.
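For orientation, the model should load like any other Qwen-family causal LM through Hugging Face transformers. The snippet below is a minimal inference sketch, assuming the repository id fblgit/TheBeagle-v2beta-32B-MGS and a standard chat template; it is not code from the model card.

```python
# Minimal inference sketch for a Qwen-family chat model.
# Assumptions (not confirmed by the model card): the Hugging Face repo id
# is "fblgit/TheBeagle-v2beta-32B-MGS" and the tokenizer ships a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/TheBeagle-v2beta-32B-MGS"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32.8B params: bf16 halves memory vs. fp32
    device_map="auto",           # spread layers across available devices
)

messages = [{"role": "user", "content": "Briefly explain what regularization does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that 32.8B parameters in bf16 occupy roughly 65 GB for the weights alone, so device_map="auto" with multiple GPUs or CPU offload is the practical route on most hardware.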
Key Characteristics & Training:
- Architecture: Based on the Qwen model family.
- Parameter Count: 32.8 billion parameters.
- Context Length: Supports a substantial 131072 tokens.
- Training Data: Utilized the Magpie-Align/Magpie-Pro-300K-Filtered dataset, praised for its quality and size.
- Training Method: Trained for only one epoch using Axolotl, with a learning rate of 8e-05 and a total batch size of 64 (a hedged hyperparameter sketch follows this list).
- MGS Technique: Incorporates a unique regularization approach, distinct from UNA, with the cryptic hint "1+1 is 2, and 1+1 is not 3" suggesting a focus on fundamental logical consistency.
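The run itself was configured in Axolotl, which uses YAML, so the following is only a hedged Python sketch that restates the reported hyperparameters through transformers' TrainingArguments: one epoch, learning rate 8e-05, total batch size 64. The per-device/gradient-accumulation split and the precision flag are assumptions, not values from the model card.

```python
# Illustrative only: the actual run used an Axolotl YAML config, not this code.
# Only num_train_epochs=1, learning_rate=8e-05, and the total batch size of 64
# come from the model card; the per-device/accumulation split is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="thebeagle-v2beta-32b-mgs",  # hypothetical output path
    num_train_epochs=1,                     # the deliberate 1-epoch regime
    learning_rate=8e-05,                    # reported learning rate
    per_device_train_batch_size=4,          # assumed split: 4 x 16 = 64 total
    gradient_accumulation_steps=16,         # on a single device
    bf16=True,                              # assumed precision
)
```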
Performance & Licensing:
- Evaluation Loss: Achieved a validation loss of 0.5378 after one epoch, outperforming a baseline model (see the loss-computation sketch at the end of this section).
- Leaderboard Results: Preliminary evaluations on the Open LLM Leaderboard show an average score of 40.29, including IFEval (0-Shot) at 45.03 and BBH (3-Shot) at 58.07.
- Licensing: Adheres to Qwen's licensing terms, with an additional requirement for derivatives to include "Beagle" or "MGS" in their model names for tracking purposes.
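As context for the evaluation-loss figure above: a causal LM's validation loss is the mean per-token cross-entropy on held-out data, so 0.5378 corresponds to a perplexity of exp(0.5378) ≈ 1.71. Below is a minimal sketch of how such a loss is computed with transformers; the repository id and the sample text are assumptions.

```python
# Sketch: computing a causal-LM validation loss (mean per-token cross-entropy).
# The repo id is an assumption; the sample text stands in for the real
# validation split of Magpie-Align/Magpie-Pro-300K-Filtered.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/TheBeagle-v2beta-32B-MGS"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

text = "A held-out validation example would go here."
enc = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # With labels=input_ids, transformers shifts the labels internally and
    # returns the mean cross-entropy over the predicted tokens.
    loss = model(**enc, labels=enc["input_ids"]).loss.item()

print(f"loss={loss:.4f}  perplexity={math.exp(loss):.2f}")
```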