fblgit/TheBeagle-v2beta-32B-MGS

Hugging Face
Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Oct 20, 2024 · License: qwen · Architecture: Transformer

TheBeagle-v2beta-32B-MGS is a 32.8 billion parameter experimental language model developed by fblgit on the Qwen architecture. It incorporates an "MGS" (Many-Geeks-Searching) technique, a regularization method distinct from the UNA algorithm. Trained for a single epoch on the Magpie-Align/Magpie-Pro-300K-Filtered dataset, it explores novel training methodologies while aiming for competitive performance with a 131,072-token context length.


TheBeagle-v2beta-32B-MGS: An Experimental Qwen-based Model

fblgit's TheBeagle-v2beta-32B-MGS is a 32.8 billion parameter experimental language model built on the Qwen architecture. This version introduces a novel "MGS" (Many-Geeks-Searching) technique, a regularization method that operates differently from the established UNA algorithm while remaining compatible with it. The model was trained for a single epoch, a deliberate choice reflecting the developer's view that one epoch of training is sufficient.
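Qwen-family instruct models typically expect prompts in the ChatML format. As a minimal sketch, assuming TheBeagle inherits Qwen's ChatML template (the model's own tokenizer config is authoritative), a prompt could be assembled like this:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML-style prompt as used by Qwen-family models.

    Assumption: TheBeagle-v2beta-32B-MGS inherits Qwen's ChatML
    special tokens; verify against the model's chat template.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("You are a helpful assistant.", "What is 1+1?")
print(prompt)
```

In practice, `tokenizer.apply_chat_template` from `transformers` would render this automatically from the model's bundled template.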

Key Characteristics & Training:

  • Architecture: Based on the Qwen model family.
  • Parameter Count: 32.8 billion parameters.
  • Context Length: Supports a substantial 131072 tokens.
  • Training Data: Utilized the Magpie-Align/Magpie-Pro-300K-Filtered dataset, praised for its quality and size.
  • Training Method: Trained for only one epoch using Axolotl, with a learning rate of 8e-05 and a total batch size of 64.
  • MGS Technique: Incorporates a unique regularization approach, distinct from UNA, with the cryptic hint "1+1 is 2, and 1+1 is not 3" suggesting a focus on fundamental logical consistency.

Performance & Licensing:

  • Evaluation Loss: Achieved a validation loss of 0.5378 after one epoch, outperforming a baseline model.
  • Leaderboard Results: Preliminary evaluations on the Open LLM Leaderboard show an average score of 40.29, with specific scores like IFEval (0-Shot) at 45.03 and BBH (3-Shot) at 58.07.
  • Licensing: Adheres to Qwen's licensing terms, with an additional requirement for derivatives to include "Beagle" or "MGS" in their model names for tracking purposes.
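The naming clause can be checked mechanically. A hypothetical helper (the case-insensitive matching is an assumption; the actual license text governs):

```python
def satisfies_naming_clause(model_name: str) -> bool:
    """Check the stated requirement that derivative model names
    include "Beagle" or "MGS".

    Assumption: matching is case-insensitive; consult the actual
    license terms for the authoritative rule.
    """
    name = model_name.lower()
    return "beagle" in name or "mgs" in name

print(satisfies_naming_clause("MyBeagle-7B-Chat"))  # → True
print(satisfies_naming_clause("CoolModel-7B"))      # → False
```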