Cybertron v4 UNA-MGS: Enhanced Qwen2.5 7B Model
Cybertron v4 UNA-MGS is a 7.6-billion-parameter language model built on the Qwen2.5 7B architecture, developed by fblgit. The model combines two techniques: MGS, a novel training approach, and UNA (Uniform Neural Alignment), applied at the MLP layers. Together these aim to improve the model's alignment and reduce benchmark contamination, as reflected in the results summarized below.
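For reference, here is a minimal inference sketch using Hugging Face transformers. The repository id fblgit/cybertron-v4-qw7B-UNAMGS is an assumption about where the weights are hosted, and the prompt is purely illustrative.

```python
# Minimal inference sketch with transformers; the repository id below is an
# assumption about where this model is hosted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/cybertron-v4-qw7B-UNAMGS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain Uniform Neural Alignment in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```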
Key Capabilities & Performance
- High Performance: Scored an average of 31.82 on the Open LLM Leaderboard, ranking it among the top uncontaminated 7-8B LLMs as of November 21, 2024.
- Reduced Contamination: Benchmarked against Qwen2.5-7B-Instruct and Homer-v0.5, Cybertron v4 shows comparable or slightly lower contamination scores on MATH benchmarks, indicating effective alignment.
- Specialized Training: Fine-tuned for one epoch on the Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1 dataset, with both MGS and UNA integrated into the Supervised Fine-Tuning (SFT) process (see the sketch after this list).
- Context Length: Supports a context length of 131,072 tokens.
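The Specialized Training bullet describes a single-epoch SFT run on the Magpie dataset. Below is a minimal sketch of such a run using TRL's SFTTrainer. The MGS and UNA modifications are not publicly documented, so this plain SFT loop omits them; the ShareGPT-style field names assumed for the dataset and the base checkpoint are unverified assumptions.

```python
# Minimal one-epoch SFT sketch with TRL. The MGS and UNA modifications applied
# by the author are not publicly documented and are omitted here; the
# "conversations"/"from"/"value" field names are assumptions about the
# dataset's ShareGPT-style layout.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1", split="train")

def to_messages(example):
    # Map ShareGPT roles onto the "messages" schema TRL consumes directly.
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    return {"messages": [
        {"role": role_map.get(turn["from"], "user"), "content": turn["value"]}
        for turn in example["conversations"]
    ]}

dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumed base checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="cybertron-v4-sft", num_train_epochs=1),
)
trainer.train()
```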
Good For
- Applications requiring robust alignment: The UNA and MGS techniques are designed to improve the coherence and logical soundness of the model's responses.
- Use cases sensitive to benchmark contamination: The model's focus on reduced contamination makes it suitable for tasks where data leakage from benchmarks is a concern (see the contamination-check sketch after this list).
- Developers seeking a high-performing 7-8B model: Its strong leaderboard scores suggest it's a competitive option for various general-purpose language tasks within its size class.
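For readers unfamiliar with how benchmark contamination is typically measured, here is a hypothetical n-gram overlap check. This is an illustrative sketch of a common technique, not the author's actual methodology, and all names and data in it are invented.

```python
# Hypothetical n-gram overlap check, a common way to estimate benchmark
# contamination; illustrative only, not the author's methodology.
def ngrams(text, n=8):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(train_texts, benchmark_texts, n=8):
    """Fraction of benchmark examples sharing at least one n-gram with training data."""
    train_grams = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    hits = sum(1 for b in benchmark_texts if ngrams(b, n) & train_grams)
    return hits / len(benchmark_texts)

# Invented example: compare a toy training corpus against MATH-style problems.
train = ["solve for x in 2x + 3 = 7 step by step", "compute the derivative of x squared"]
bench = ["solve for x in 2x + 3 = 7", "what is the integral of sin x"]
print(contamination_rate(train, bench, n=4))  # 0.5: one of two items overlaps
```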