abideen/AlphaMonarch-daser
AlphaMonarch-daser is a 7-billion-parameter language model developed by abideen, fine-tuned using a combination of the LaserQLoRA and DoRA techniques. It is a DPO fine-tune of mlabonne/NeuralMonarch-7B on the argilla/OpenHermes2.5-dpo-binarized-alpha preference dataset. It outperforms AlphaMonarch-dora on the YALL leaderboard despite being trained on only half of the projections. The model is optimized for general language tasks, with DPO fine-tuning strengthening its conversational and instruction-following capabilities.
AlphaMonarch-daser Overview
AlphaMonarch-daser is a 7-billion-parameter language model developed by abideen, built on the foundation of mlabonne/NeuralMonarch-7B. It combines two fine-tuning techniques: LaserQLoRA and DoRA.
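The core idea behind DoRA can be illustrated with a toy sketch: a pretrained weight matrix is split into per-column magnitudes and a direction, the direction receives the fine-tuning update, and the result is renormalized and rescaled by learned magnitudes. This is a minimal pure-Python illustration, not the model's actual training code; the low-rank structure of the direction update is omitted for brevity.

```python
import math

def dora_reconstruct(W0, delta, m):
    """DoRA-style reparameterization (illustrative sketch).

    W0:    pretrained weight matrix (list of rows)
    delta: fine-tuning update applied to the direction (low-rank in
           practice; an arbitrary dense matrix here for simplicity)
    m:     learned per-column magnitude vector
    """
    rows, cols = len(W0), len(W0[0])
    # Directional component: pretrained weight plus the learned update.
    V = [[W0[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]
    # Per-column norms of the direction.
    norms = [math.sqrt(sum(V[i][j] ** 2 for i in range(rows))) for j in range(cols)]
    # Normalize each column, then rescale by the learned magnitudes.
    return [[m[j] * V[i][j] / norms[j] for j in range(cols)] for i in range(rows)]
```

With a zero update and magnitudes initialized from the pretrained column norms, the reconstruction recovers the original weights exactly, which is why DoRA can start training from a point equivalent to the base model.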
Key Characteristics & Training
- Fine-tuning Method: The model was fine-tuned using Direct Preference Optimization (DPO) on the argilla/OpenHermes2.5-dpo-binarized-alpha preference dataset.
- Efficiency: Notably, AlphaMonarch-daser outperformed its predecessor, AlphaMonarch-dora, despite being fine-tuned on only half of the projections.
- Training Steps: The model was trained for 1080 steps, with a learning rate of 5e-7 and a cosine learning-rate scheduler.
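The DPO objective used for this fine-tune can be sketched as follows. The beta value and the toy log-probabilities below are illustrative assumptions, not the card's actual hyperparameters.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Arguments are sequence log-probabilities under the policy (pi_*)
    and the frozen reference model (ref_*). beta controls how far the
    policy may drift from the reference; 0.1 here is illustrative.
    """
    # Reward margin: how much more the policy prefers the chosen answer,
    # relative to the reference model's preferences.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)): small when the policy cleanly prefers chosen.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference, the margin is zero and the loss is log 2; training pushes the loss down by widening the policy's preference for the chosen completion beyond the reference model's.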
Performance & Evaluation
AlphaMonarch-daser's performance has been evaluated on prominent leaderboards:
- YALL Leaderboard: It ranks above AlphaMonarch-dora, AlphaMonarch, and AlphaMonarch-laser.
- OpenLLM Bench: It performs competitively on this benchmark, ranking above AlphaMonarch-dora but below AlphaMonarch-laser and AlphaMonarch.
This model is suitable for general language generation and instruction-following tasks, benefiting from its DPO fine-tuning on a preference dataset.