bhavinjawade/SOLAR-10B-OrcaDPO-Jawade

  • Parameters: 10.7B
  • Precision: FP8
  • Context length: 4096
  • Published: Jan 6, 2024
  • License: MIT
  • Hosted on Hugging Face

bhavinjawade/SOLAR-10B-OrcaDPO-Jawade is a 10.7-billion-parameter instruction-tuned causal language model, fine-tuned by bhavinjawade from Upstage's SOLAR-10.7B-Instruct-v1.0. It was trained with LoRA on the Intel/orca_dpo_pairs dataset and shows slight improvements on OpenLLM Leaderboard benchmarks over its base model. The model is intended for general instruction-following tasks and offers enhanced conversational capabilities.

Overview

This model, developed by bhavinjawade, is an instruction-tuned version of the upstage/SOLAR-10.7B-Instruct-v1.0 base model and has 10.7 billion parameters. It was fine-tuned with Low-Rank Adaptation (LoRA) on the Intel/orca_dpo_pairs dataset, which consists of Direct Preference Optimization (DPO) preference pairs. The original SOLAR-10.7B paper notes that the base model's alignment stage was also based on the Intel ORCA DPO pairs.
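
For a sense of what the preference data looks like, the pairs can be inspected directly. This is a minimal sketch; the column names (question, chosen, rejected) come from the Intel/orca_dpo_pairs dataset card and are an assumption here, not details stated by this model card:

```python
# Minimal sketch: peek at one DPO preference pair.
# Column names are taken from the Intel/orca_dpo_pairs dataset card
# and are an assumption, not stated by this model card.
from datasets import load_dataset

ds = load_dataset("Intel/orca_dpo_pairs", split="train")
row = ds[0]

print(row["question"][:200])   # the user prompt
print(row["chosen"][:200])     # the preferred response
print(row["rejected"][:200])   # the dispreferred response
```

Each row pairs one prompt with a preferred and a dispreferred answer, which is exactly the signal DPO optimizes against.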

Key Capabilities

  • Enhanced Instruction Following: Optimized for understanding and responding to user instructions effectively.
  • Improved Performance: Demonstrates slight (less than 1%) improvements on OpenLLM Leaderboard benchmarks compared to SOLAR 10.7B-Instruct, and significant improvements over the base SOLAR 10.7B model.
  • Conversational AI: Suitable for chatbot applications, capable of generating coherent, contextually relevant responses (see the usage sketch after this list).
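
A minimal usage sketch with the standard transformers API follows. The prompt, dtype, and generation settings are illustrative assumptions, not values specified by this card:

```python
# Minimal sketch: chat-style generation with transformers.
# dtype, device placement, and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhavinjawade/SOLAR-10B-OrcaDPO-Jawade"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# SOLAR-Instruct models ship a chat template, so apply_chat_template
# formats the conversation the way the model expects.
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```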

Training Details

The model leverages LoRA for efficient fine-tuning, building upon the robust architecture of the SOLAR-10.7B series. The use of the Intel/orca_dpo_pairs dataset specifically targets improved alignment and conversational quality.
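
As a rough illustration of that recipe, here is a hedged sketch of LoRA plus DPO training with peft and trl. The hyperparameters (LoRA rank, beta, batch size) are assumptions rather than the values used for this model, and the exact trl argument names vary between releases:

```python
# Hedged sketch of the LoRA + DPO recipe described above.
# Hyperparameters are illustrative assumptions, not the values used
# for this model; the trl API also shifts between versions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "upstage/SOLAR-10.7B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# Intel/orca_dpo_pairs ships (system, question, chosen, rejected);
# DPO training expects (prompt, chosen, rejected).
ds = load_dataset("Intel/orca_dpo_pairs", split="train")
ds = ds.map(lambda r: {"prompt": r["question"]},
            remove_columns=["system", "question"])

peft_config = LoraConfig(          # assumed rank/alpha, not documented on the card
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="solar-10b-orca-dpo-lora",
    beta=0.1,                      # DPO temperature, assumed
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model,
    args=args,
    train_dataset=ds,
    processing_class=tokenizer,    # named `tokenizer=` in older trl releases
    peft_config=peft_config,
)
trainer.train()
```

Passing peft_config keeps the full base weights frozen and trains only the low-rank adapters, which is what makes fine-tuning a 10.7B model on preference pairs tractable on modest hardware.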

License

This model is released under the MIT License, permitting broad reuse, modification, and distribution for both private and commercial purposes.