abhishekchohan/SOLAR-10.7B-Instruct-Forest-DPO-v1
Text generation · Model size: 10.7B · Quantization: FP8 · Context length: 4k · Published: Feb 15, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

abhishekchohan/SOLAR-10.7B-Instruct-Forest-DPO-v1 is a 10.7 billion parameter instruction-tuned language model developed by abhishekchohan, fine-tuned from upstage/SOLAR-10.7B-Instruct-v1.0. The model was refined with direct preference optimization (DPO), supports a 4096-token context window, and targets a broad range of natural language processing tasks, with a focus on generating human-like text responses.


Overview

abhishekchohan/SOLAR-10.7B-Instruct-Forest-DPO-v1 is a 10.7 billion parameter instruction-tuned language model. It is built upon the upstage/SOLAR-10.7B-Instruct-v1.0 base model and has been further refined using direct preference optimization (DPO). This fine-tuning process aims to align the model's outputs more closely with human preferences, enhancing its performance across various natural language processing tasks.

Key Capabilities

  • Instruction Following: Designed to accurately follow user instructions for text generation.
  • Natural Language Processing: Exhibits strong performance across a spectrum of NLP tasks.
  • Preference Alignment: Benefits from DPO fine-tuning, leading to more preferred and coherent responses.
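To get reliable instruction following, prompts should match the chat template the model was trained on. A minimal sketch, assuming this model inherits the "### User:" / "### Assistant:" template of the upstage/SOLAR-10.7B-Instruct-v1.0 base (the helper name and the optional "### System:" turn are illustrative; verify against the tokenizer's chat template before relying on them):

```python
from typing import Optional

def format_solar_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Wrap a user message in the SOLAR instruct turn format.

    Assumption: the fine-tune keeps the base model's template, where each
    turn is a "### Role:" header followed by the content and a blank line,
    and generation continues after "### Assistant:".
    """
    parts = []
    if system_message:
        parts.append(f"### System:\n{system_message}\n")
    parts.append(f"### User:\n{user_message}\n")
    parts.append("### Assistant:\n")  # the model completes from here
    return "\n".join(parts)
```

In practice, `tokenizer.apply_chat_template` on the downloaded checkpoint is the authoritative way to produce this string; the helper above just makes the expected shape explicit.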

Training Details

The model was fine-tuned with DPO on a mixture of preference datasets, including:

  • Intel/orca_dpo_pairs
  • nvidia/HelpSteer
  • jondurbin/truthy-dpo-v0.1

Good For

  • Generating human-like text based on prompts.
  • Applications requiring robust instruction-following capabilities.
  • Tasks where preference-aligned outputs are crucial.