bhavinjawade/SOLAR-10B-OrcaDPO-Jawade

Parameters: 10.7B · Tensor type: FP8 · Context length: 4096 · License: MIT

Overview

This model, developed by bhavinjawade, is an instruction-tuned version of the upstage/SOLAR-10.7B-Instruct-v1.0 base model with 10.7 billion parameters. It was fine-tuned with Low-Rank Adaptation (LoRA) on the Intel/orca_dpo_pairs dataset, a collection of Direct Preference Optimization (DPO) preference pairs. The original SOLAR-10.7B paper notes that its own alignment stage also used the Intel ORCA DPO pairs.

Key Capabilities

  • Enhanced Instruction Following: Tuned to understand and respond to user instructions effectively.
  • Improved Performance: Shows slight (under 1%) gains on Open LLM Leaderboard benchmarks over SOLAR-10.7B-Instruct, and significant gains over the base SOLAR-10.7B model.
  • Conversational AI: Suitable for chatbot applications, generating coherent and contextually relevant responses; see the inference sketch after this list.
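
For reference, the following is a minimal chat-style inference sketch using the transformers library. The repo id matches this card; the dtype, device placement, generation settings, and prompt are illustrative assumptions, and the chat template is expected to be inherited from the upstream SOLAR-10.7B-Instruct tokenizer.

```python
# Minimal inference sketch. Assumptions: fp16 weights fit on the available
# hardware and the tokenizer ships a chat template (as the upstream
# SOLAR-10.7B-Instruct tokenizer does).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhavinjawade/SOLAR-10B-OrcaDPO-Jawade"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```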

Training Details

The model leverages LoRA for efficient fine-tuning, building upon the robust architecture of the SOLAR-10.7B series. The use of the Intel/orca_dpo_pairs dataset specifically targets improved alignment and conversational quality.
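
The exact training script is not published on this card, but the recipe it describes (LoRA adapters optimized with DPO on Intel/orca_dpo_pairs) is commonly implemented with the peft and trl libraries. The sketch below is a hedged approximation under that assumption: the LoRA rank, alpha, target modules, and DPO beta are illustrative values rather than the author's settings, and trl argument names vary somewhat across releases.

```python
# Hedged DPO + LoRA training sketch; hyperparameters are assumptions,
# not the settings actually used for this model.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "upstage/SOLAR-10.7B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Intel/orca_dpo_pairs rows hold system/question/chosen/rejected fields;
# DPOTrainer expects prompt/chosen/rejected, so remap the question field.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

peft_config = LoraConfig(
    r=16,                                 # assumed LoRA rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed target projections
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="solar-orca-dpo", beta=0.1),  # assumed beta
    train_dataset=dataset,
    processing_class=tokenizer,  # called `tokenizer=` in older trl releases
    peft_config=peft_config,
)
trainer.train()
```

Passing peft_config lets trl wrap the base model in LoRA adapters and train only those low-rank matrices, which is what makes preference-tuning a 10.7B model tractable on modest hardware.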

License

This model is released under the MIT License, permitting broad reuse, modification, and distribution for both private and commercial purposes.