5456es/selective_dpo_Llama-3.2-1B-Instruct_prune_0.7-sigmoid

TEXT GENERATIONConcurrency Cost:1Model Size:1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Sep 7, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

The 5456es/selective_dpo_Llama-3.2-1B-Instruct_prune_0.7-sigmoid model is a 1 billion parameter instruction-tuned language model, derived from Llama-3.2-1B-Instruct. It has been fine-tuned using Direct Preference Optimization (DPO) with a selective method and pruning applied during training. This model is designed for generating responses based on learned preferences, offering a compact solution for preference-aligned text generation tasks.

Loading preview...

Model Overview

This model, selective_dpo_Llama-3.2-1B-Instruct_prune_0.7-sigmoid, is a 1 billion parameter language model developed by 5456es. It is built upon the Llama-3.2-1B-Instruct base model and has been specifically fine-tuned using a selective Direct Preference Optimization (DPO) method. This approach incorporates preference data during training, aiming to align the model's outputs with desired human preferences.

Key Characteristics

  • Base Model: Llama-3.2-1B-Instruct, providing a strong foundation for instruction-following.
  • Training Method: Utilizes a selective DPO approach, which is a technique for fine-tuning models based on explicit preference data.
  • Pruning: Pruning was applied during the training process, which can lead to a more efficient model, though the exact pruning ratio is not specified.
  • Context Length: Supports a context length of 32768 tokens, allowing for processing longer inputs and generating more extensive outputs.

Potential Use Cases

  • Preference-aligned text generation: Ideal for scenarios where outputs need to conform to specific stylistic or content preferences.
  • Instruction-following tasks: Benefits from its Llama-3.2-1B-Instruct base, making it suitable for various instruction-based prompts.
  • Resource-constrained environments: Its 1 billion parameter size makes it a more efficient option compared to larger models, especially given the pruning applied.

Limitations

Users should be aware that this model inherits the limitations of its base Llama-3.2-1B-Instruct model and may introduce additional limitations due to the pruning process.