SOLAR-10B-OrcaDPO-Jawade Overview
This model, developed by bhavinjawade, is an instruction-tuned version of the upstage/SOLAR-10.7B-Instruct-v1.0 base model, featuring 10.7 billion parameters. It was fine-tuned using Low-Rank Adaptation (LoRA) on the Intel/orca_dpo_pairs dataset, a collection of preference pairs for Direct Preference Optimization (DPO). The original SOLAR-10.7B paper notes that the upstream model's alignment stage also used the Intel ORCA DPO pairs.
Key Capabilities
- Enhanced Instruction Following: Optimized for understanding and responding to user instructions effectively.
- Improved Performance: Demonstrates slight (less than 1%) improvements on OpenLLM Leaderboard benchmarks compared to SOLAR-10.7B-Instruct, and significant improvements over the base SOLAR-10.7B model.
- Conversational AI: Suitable for chatbot applications, capable of generating coherent and contextually relevant responses (see the usage sketch below).
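For conversational use, the model can be loaded with the standard transformers generation API. The sketch below assumes the model is published on the Hugging Face Hub under the repo ID bhavinjawade/SOLAR-10B-OrcaDPO-Jawade and that it inherits the SOLAR-Instruct chat template; check the actual model page and adjust the repo ID and prompt format accordingly.

```python
# Minimal inference sketch; the repo ID below is an assumption based on the model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhavinjawade/SOLAR-10B-OrcaDPO-Jawade"  # assumed Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a ~10.7B model on a single GPU
    device_map="auto",
)

# Build a chat-style prompt via the tokenizer's chat template (assumed to follow SOLAR-Instruct).
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in two sentences."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```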
Training Details
The model leverages LoRA for efficient fine-tuning, building upon the robust architecture of the SOLAR-10.7B series. The use of the Intel/orca_dpo_pairs dataset specifically targets improved alignment and conversational quality.
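A hedged sketch of how such a LoRA + DPO run could be set up with peft and trl is shown below. The hyperparameters, LoRA target modules, and trl API details (which differ between library versions) are illustrative assumptions, not the author's published training script.

```python
# Illustrative LoRA + DPO setup; settings are assumptions, not the card author's actual recipe.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "upstage/SOLAR-10.7B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Intel/orca_dpo_pairs provides "system", "question", "chosen", "rejected" columns;
# DPOTrainer expects "prompt", "chosen", "rejected", so remap them here.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": row["system"] + "\n" + row["question"],
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    },
    remove_columns=dataset.column_names,
)

# LoRA keeps the 10.7B base weights frozen; only small low-rank adapter matrices are trained.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    output_dir="solar-10b-orca-dpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=1,
    beta=0.1,  # DPO temperature controlling deviation from the reference model
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions use tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```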
License
This model is released under the MIT License, permitting broad reuse, modification, and distribution for both private and commercial purposes.