Overview
fblgit/cybertron-v4-qw7B-MGS is a 7.6-billion-parameter language model built on the Qwen2.5 architecture, with a context length of 131,072 tokens. Developed by fblgit, the model incorporates a proprietary 'MGS' approach, described as a strategy for tackling corpora forgetfulness, i.e. improving the model's ability to retain and use previously learned information. It was trained with Supervised Fine-Tuning (SFT) on the Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1 dataset.
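Since the checkpoint follows the standard Qwen2.5 / Hugging Face layout, it can presumably be loaded with the usual `transformers` API. The sketch below is illustrative, not taken from the model card; argument choices such as `torch_dtype="auto"` and `device_map="auto"` are assumptions:

```python
MODEL_ID = "fblgit/cybertron-v4-qw7B-MGS"

def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and weights; requires `transformers` (and `accelerate`
    for device_map). Import is deferred so the sketch is cheap to inspect."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the checkpoint's native precision
        device_map="auto",    # shard across available GPUs, if any
    )
    return tokenizer, model
```

Generation would then proceed as with any Qwen2.5 chat model, e.g. via the tokenizer's chat template and `model.generate`.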
Key Capabilities & Performance
- MGS Approach: Integrates a novel 'MGS' strategy, detailed in an associated arXiv paper, aimed at mitigating corpora forgetfulness.
- Strong Performance: Achieved an average score of 31.21 on the Open LLM Leaderboard, with notable results including 62.64 on IFEval (0-Shot) and 38.59 on MMLU-PRO (5-shot).
- Training: Underwent a single epoch of SFT with specific hyperparameters, including a total training batch size of 128 and the Adam optimizer.
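The total batch size of 128 reported above is typically an effective value, composed of the per-device batch size, gradient-accumulation steps, and device count. The decomposition below is a hypothetical illustration; the specific factors are assumptions, not values from the model card:

```python
# Hypothetical decomposition of the reported effective batch size (128).
# These factor values are illustrative assumptions.
per_device_batch = 2   # examples per GPU per forward pass
grad_accum_steps = 8   # micro-batches accumulated before an optimizer step
num_gpus = 8           # data-parallel devices

effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 128
```

Any factorization with the same product yields the same effective batch size; the trade-off is memory per device versus wall-clock time per optimizer step.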
Good For
- General Language Tasks: Its Qwen2.5 foundation and SFT training make it suitable for a broad range of natural language processing applications.
- Applications Requiring Robust Information Retention: The 'MGS' approach suggests potential benefits for use cases where models typically struggle with forgetting previously learned information.
- Benchmarking and Research: Given its competitive performance on the Open LLM Leaderboard, it serves as a strong candidate for comparative studies and further research into model fine-tuning and forgetfulness mitigation.