qgallouedec/online-dpo-qwen2-2

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Architecture: Transformer · Warm

The qgallouedec/online-dpo-qwen2-2 is a 0.5 billion parameter causal language model fine-tuned from Qwen/Qwen2-0.5B-Instruct. As the name indicates, it was trained with online DPO, using prompts from the trl-lib/ultrafeedback-prompt dataset to strengthen its instruction-following behavior. With a context length of 131072 tokens, it is suited to tasks that require extensive contextual understanding and precise responses to user prompts.


Model Overview

The qgallouedec/online-dpo-qwen2-2 is a 0.5 billion parameter language model based on the Qwen2 architecture. It was fine-tuned from Qwen/Qwen2-0.5B-Instruct on the trl-lib/ultrafeedback-prompt dataset.
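A minimal sketch of loading the model with the Hugging Face transformers library. The model and tokenizer IDs come from this card; the prompt text and generation settings are illustrative assumptions, not recommendations from the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qgallouedec/online-dpo-qwen2-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Qwen2-Instruct derivatives expect the chat template, not raw text.
messages = [{"role": "user", "content": "Summarize the benefits of code review."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

Passing `torch_dtype="auto"` lets transformers pick up the BF16 weights the card advertises instead of upcasting to FP32.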

Key Capabilities

  • Instruction Following: Enhanced ability to understand and execute instructions, owing to fine-tuning on prompts drawn from the UltraFeedback data.
  • Context Handling: Features a notable context length of 131072 tokens, allowing it to process and generate responses based on very long inputs.
  • Qwen2 Base: Inherits the foundational capabilities and architectural strengths of the Qwen2 family of models.

Good For

  • Applications requiring a compact yet capable model for instruction-based tasks.
  • Scenarios where processing extensive input contexts is crucial.
  • Developing conversational agents or systems that benefit from improved instruction adherence.
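For the conversational-agent use case above, a minimal multi-turn sketch using the transformers text-generation pipeline, which accepts a list of chat messages for chat-template models in recent transformers versions. The dialogue content and `max_new_tokens` value are illustrative assumptions.

```python
from transformers import pipeline

# The pipeline applies the model's chat template automatically when
# given a list of role/content messages.
pipe = pipeline("text-generation", model="qgallouedec/online-dpo-qwen2-2")

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is direct preference optimization?"},
]

# In recent transformers versions, "generated_text" holds the full
# message list, so the last entry is the assistant's reply.
result = pipe(history, max_new_tokens=64)
reply = result[0]["generated_text"][-1]["content"]

# Carry the reply forward so the next turn sees the full context.
history.append({"role": "assistant", "content": reply})
```

Because the model's long context window accommodates many turns, the whole `history` list can be resent each turn rather than truncating early exchanges.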