etri-xainlp/llama2-13b-lima-sft-dpo

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4K · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

The etri-xainlp/llama2-13b-lima-sft-dpo model is a 13 billion parameter language model developed by the ETRI xainlp team, built on the Llama-2-13b-hf base architecture. It is fine-tuned in stages: supervised fine-tuning (SFT) on 650k instruction-following examples, LIMA-style SFT on 280k examples, and Direct Preference Optimization (DPO) on 90k user preference sets. It is designed to excel at instruction-following tasks, making it suitable for applications requiring precise, well-aligned responses.


Model Overview

The etri-xainlp/llama2-13b-lima-sft-dpo is a 13 billion parameter language model developed by the ETRI xainlp team. It is based on the robust meta-llama/Llama-2-13b-hf architecture, enhanced through a multi-stage fine-tuning process to improve its instruction-following capabilities and alignment with user preferences.

Key Capabilities

  • Instruction Following: The model has undergone extensive supervised fine-tuning (SFT) with a large dataset of 650,000 instruction-following examples, ensuring a strong foundation for understanding and executing commands.
  • LIMA-style SFT: Further refined with 280,000 LIMA-style instruction-following examples, which typically focus on high-quality, diverse instructions to improve generalization.
  • Preference Alignment (DPO): Utilizes Direct Preference Optimization (DPO) on 90,000 user preference sets, aligning the model's outputs more closely with human preferences and desired behaviors.
  • Text-in, Text-out: Designed for standard text-based input and output, making it versatile for various natural language processing tasks.
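As a standard text-in, text-out Llama-2 derivative, the model can be loaded with the Hugging Face transformers library. The sketch below is a minimal, hypothetical usage example; the `[INST]` prompt template is an assumption carried over from the Llama-2 chat convention and is not confirmed by the model card.

```python
# Hypothetical usage sketch for etri-xainlp/llama2-13b-lima-sft-dpo.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "etri-xainlp/llama2-13b-lima-sft-dpo"

def build_prompt(instruction: str) -> str:
    # Llama-2-style instruction wrapper; the exact template this
    # fine-tune expects is an assumption -- verify against the model card.
    return f"<s>[INST] {instruction} [/INST]"

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(
        build_prompt("Summarize DPO in one sentence."),
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Loading a 13B model in full precision requires roughly 26 GB of memory, so `device_map="auto"` (or a quantized load) is advisable on smaller GPUs.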

Good For

  • Applications requiring a model with strong instruction-following abilities.
  • Tasks where alignment with human preferences is crucial.
  • Developing chatbots, virtual assistants, or systems that need to generate precise and contextually relevant responses based on explicit instructions.

Popular Sampler Settings

The sampler parameters most commonly tuned by Featherless users for this model:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
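These parameters map directly onto a text-generation request body. A minimal sketch follows; the numeric values are illustrative placeholders only, not the actual configurations used by Featherless users.

```python
# Illustrative sampler configuration for a text-generation request.
# The values below are common defaults, NOT the actual user settings.
sampler_config = {
    "temperature": 0.7,         # randomness of token sampling
    "top_p": 0.9,               # nucleus sampling cutoff
    "top_k": 40,                # restrict sampling to the k most likely tokens
    "frequency_penalty": 0.0,   # penalize tokens by how often they appeared
    "presence_penalty": 0.0,    # penalize tokens that appeared at all
    "repetition_penalty": 1.1,  # multiplicative penalty on repeated tokens
    "min_p": 0.05,              # drop tokens below this fraction of the top probability
}

# Such a dict is typically passed as generation kwargs or merged into
# an API request payload alongside the prompt.
print(sorted(sampler_config))
```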