yunjae-won/mpq3_llama8b_sft_dpo_beta1e-1_step2304

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: Apr 6, 2026 | Architecture: Transformer | Cold

yunjae-won/mpq3_llama8b_sft_dpo_beta1e-1_step2304 is an 8-billion-parameter language model based on the Llama architecture. It has been fine-tuned with Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO). With an 8192-token context length, it is intended for general language generation tasks that benefit from preference-aligned outputs.

Overview

This model, yunjae-won/mpq3_llama8b_sft_dpo_beta1e-1_step2304, is an 8-billion-parameter language model built on the Llama architecture. It has undergone a two-stage fine-tuning process: Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) with a beta value of 1e-1, with this checkpoint taken at training step 2304. The model supports a context length of 8192 tokens.
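For reference, the beta value governs the strength of the implicit KL constraint in the standard DPO objective (Rafailov et al., 2023). The formula below is the published general form of that loss, not a detail taken from this checkpoint's training configuration:

```latex
% Standard DPO loss for policy \pi_\theta against a frozen reference \pi_{ref},
% over preference triples (prompt x, chosen y_w, rejected y_l).
% beta (here 1e-1) scales the implicit reward; a smaller beta permits
% larger deviation from the reference (SFT) model.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

A relatively small beta such as 1e-1 applies a weaker pull toward the SFT reference model, trading some of that constraint for a tighter fit to the preference data.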

Key Characteristics

  • Architecture: Llama-based, 8 billion parameters.
  • Fine-tuning: Utilizes Supervised Fine-Tuning (SFT) for initial instruction following and Direct Preference Optimization (DPO) for alignment with human preferences.
  • Context Length: Supports an 8192-token context window, allowing the model to process longer inputs and sustain coherence over extended responses (a usage sketch follows this list).
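
The sketch below shows one way to load and prompt the model with Hugging Face transformers. It assumes the repository ships standard Llama-format weights and a tokenizer with a chat template, and that bf16 weights fit on the available GPU; none of this is confirmed by the model card.

```python
# Minimal sketch: load the checkpoint with Hugging Face transformers and
# generate one response. Assumes standard Llama-format weights and a
# tokenizer chat template are present in the repo (not confirmed above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yunjae-won/mpq3_llama8b_sft_dpo_beta1e-1_step2304"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits on the target GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the idea behind DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,  # well within the 8192-token context window
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```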

Potential Use Cases

Given its Llama base and preference-alignment fine-tuning, this model is suited to general-purpose natural language generation tasks. The model card does not report benchmark results or specify intended applications, so users should run their own evaluations before adopting it for a particular use case.
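
One lightweight way to start such an evaluation is a manual spot-check over task-representative prompts, sketched below; the prompts, decoding settings, and pipeline usage are illustrative placeholders, not recommendations from the model card.

```python
# Hypothetical spot-check: generate responses to a few task-representative
# prompts and review them by hand. The prompt list and generation settings
# are illustrative placeholders.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="yunjae-won/mpq3_llama8b_sft_dpo_beta1e-1_step2304",
    device_map="auto",
)

probe_prompts = [
    "Explain the difference between SFT and DPO to a new ML engineer.",
    "Draft a polite email declining a meeting invitation.",
    "List three edge cases to test in a date-parsing function.",
]

for prompt in probe_prompts:
    out = generator(prompt, max_new_tokens=200, do_sample=False, return_full_text=False)
    print(f"PROMPT: {prompt}\nRESPONSE: {out[0]['generated_text']}\n")
```

If the repository does provide a chat template, formatting prompts with tokenizer.apply_chat_template (as in the earlier sketch) will usually match the model's fine-tuning format more closely than raw strings.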