fblgit/cybertron-v4-qw7B-MGS

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32K · Published: Oct 29, 2024 · License: qwen · Architecture: Transformer

fblgit/cybertron-v4-qw7B-MGS is a 7.6-billion-parameter causal language model developed by fblgit, based on the Qwen2.5 architecture with a 131,072-token context length. The model applies a novel 'MGS' approach intended to mitigate corpora forgetfulness, which distinguishes it from comparable LLMs. It was trained with Supervised Fine-Tuning (SFT) on the Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1 dataset, making it suitable for general language understanding and generation tasks that require robust performance in its size class.


Overview

fblgit/cybertron-v4-qw7B-MGS is a 7.6-billion-parameter language model built on the Qwen2.5 architecture, featuring an extensive context length of 131,072 tokens. Developed by fblgit, the model incorporates the author's 'MGS' approach, described as a strategy for tackling corpora forgetfulness, improving its ability to retain and use previously learned information. It was trained with Supervised Fine-Tuning (SFT) on the Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1 dataset.
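
A minimal inference sketch using the Hugging Face transformers library, assuming the standard chat interface that Qwen2.5-based models typically expose; the prompt and sampling parameters here are illustrative assumptions, not taken from the model card:

```python
# Minimal inference sketch (assumes the standard transformers chat interface
# used by Qwen2.5-based models; generation settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/cybertron-v4-qw7B-MGS"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)
)

messages = [
    {"role": "user", "content": "Summarize the idea of catastrophic forgetting."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```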

Key Capabilities & Performance

  • MGS Approach: Integrates a novel 'MGS' strategy, detailed in an associated arXiv paper, aimed at mitigating corpora forgetfulness.
  • Strong Performance: Achieved an average score of 31.21 on the Open LLM Leaderboard, with notable results including 62.64 on IFEval (0-Shot) and 38.59 on MMLU-PRO (5-shot).
  • Training: Underwent a single epoch of SFT with specific hyperparameters, including a total training batch size of 128 and the Adam optimizer; a configuration sketch follows this list.
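
A hypothetical reproduction sketch using TRL's SFTTrainer. Only the dataset, the single epoch, the effective batch size of 128, and the Adam-family optimizer come from the model card; the base model, per-device/accumulation split, and learning rate are labeled assumptions:

```python
# Hypothetical SFT configuration sketch with TRL; only epochs=1, the dataset,
# an effective batch of 128, and an Adam optimizer are stated in the card.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1", split="train")

config = SFTConfig(
    output_dir="cybertron-v4-sft",
    num_train_epochs=1,                # stated: a single epoch of SFT
    per_device_train_batch_size=8,     # assumption: 8 x 16 accumulation = 128
    gradient_accumulation_steps=16,
    optim="adamw_torch",               # Adam-family optimizer, as stated
    learning_rate=1e-5,                # assumption, not from the card
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumption: the Qwen2.5 7B base
    args=config,
    train_dataset=dataset,
)
trainer.train()
```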

Good For

  • General Language Tasks: Built on Qwen2.5 and refined with SFT, it suits a broad range of natural language processing applications.
  • Applications Requiring Robust Information Retention: The 'MGS' approach suggests potential benefits for use cases where models typically struggle with forgetting previously learned information.
  • Benchmarking and Research: Given its competitive performance on the Open LLM Leaderboard, it is a strong candidate for comparative studies and further research into model fine-tuning and forgetfulness mitigation; a brief evaluation sketch follows.
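
A brief evaluation sketch using EleutherAI's lm-evaluation-harness Python API. The task names and batch size are assumptions (the Open LLM Leaderboard runs its own task variants and settings), so scores from this sketch may not match the leaderboard exactly:

```python
# Hypothetical evaluation sketch with lm-evaluation-harness; task names
# and batch size are assumptions, not taken from the model card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/cybertron-v4-qw7B-MGS,dtype=bfloat16",
    tasks=["leaderboard_ifeval", "leaderboard_mmlu_pro"],
    batch_size=8,
)
print(results["results"])
```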