CEIA-RL/qwen3-4b-dw-lr-hf-dpo

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 2, 2026Architecture:Transformer Cold

CEIA-RL/qwen3-4b-dw-lr-hf-dpo is a fine-tuned Qwen3-4B model developed by CEIA-RL, based on cemig-temp/qwen3-4b-dw-lr. This model is specifically optimized for safety alignment in Portuguese (pt-BR) through Online DPO training on the CEIA-RL/Nemotron-SFT-Safety-pt-BR-Cleaned dataset. It is designed for generating safer and more aligned responses in Portuguese conversational AI applications.

Loading preview...

Model Overview

CEIA-RL/qwen3-4b-dw-lr-hf-dpo is a specialized language model developed by CEIA-RL, fine-tuned from the cemig-temp/qwen3-4b-dw-lr base model. Its primary differentiation lies in its safety alignment for the Portuguese language (pt-BR).

Key Capabilities

  • Safety Alignment: The model has undergone specific training to enhance its safety characteristics, particularly for Portuguese content.
  • Portuguese Language Focus: Optimized for generating responses in Brazilian Portuguese, leveraging the CEIA-RL/Nemotron-SFT-Safety-pt-BR-Cleaned dataset.
  • Online DPO Training: Utilizes the Online DPO (Direct Language Model Alignment from Online AI Feedback) method, a technique for aligning language models with human preferences.

Training Details

This model was trained using the TRL library, a framework for Transformer Reinforcement Learning. The training specifically employed the Online DPO method, as detailed in the paper "Direct Language Model Alignment from Online AI Feedback" (arXiv:2402.04792).

Use Cases

This model is particularly well-suited for applications requiring safe and aligned text generation in Portuguese, such as:

  • Chatbots and conversational AI systems targeting Portuguese-speaking users.
  • Content moderation tools for Portuguese text.
  • Generating responses where safety and alignment are critical considerations.