Eric111/SOLAR-10.7B-Instruct-v1.0-DPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:10.7BQuant:FP8Ctx Length:4kPublished:Mar 1, 2024License:apache-2.0Architecture:Transformer Open Weights Warm

Eric111/SOLAR-10.7B-Instruct-v1.0-DPO is a 10.7 billion parameter instruction-tuned language model, fine-tuned using DPO (Direct Preference Optimization). It is based on the upstage/SOLAR-10.7B-Instruct-v1.0 architecture and further optimized with the Intel/orca_dpo_pairs dataset. This model is designed for general instruction following tasks, leveraging its DPO fine-tuning for improved response quality and alignment.

Loading preview...

Model Overview

Eric111/SOLAR-10.7B-Instruct-v1.0-DPO is a 10.7 billion parameter instruction-tuned model. It is a DPO (Direct Preference Optimization) fine-tuned version of the base model upstage/SOLAR-10.7B-Instruct-v1.0. The fine-tuning process utilized the Intel/orca_dpo_pairs dataset, aiming to enhance the model's ability to follow instructions and generate preferred responses.

Key Characteristics

  • Parameter Count: 10.7 billion parameters, offering a balance between performance and computational efficiency.
  • Fine-tuning Method: Employs Direct Preference Optimization (DPO) for improved alignment with human preferences and instruction following.
  • Base Model: Built upon the upstage/SOLAR-10.7B-Instruct-v1.0 architecture.
  • Training Data: Fine-tuned using the Intel/orca_dpo_pairs dataset, which is designed to improve instruction-following capabilities.

Potential Use Cases

This model is suitable for a variety of general-purpose instruction-following tasks, including:

  • Generating text based on specific prompts.
  • Answering questions.
  • Summarization.
  • Creative writing assistance.

Limitations

The provided model card indicates that more information is needed regarding specific biases, risks, and limitations. Users should exercise caution and conduct their own evaluations for critical applications.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p