Model Overview
CEIA-RL/qwen3-4b-dw-lr-hf-dpo is a 4-billion-parameter language model fine-tuned from the cemig-temp/qwen3-4b-dw-lr base model. It was developed by CEIA-RL and trained with the TRL (Transformer Reinforcement Learning) library.
Key Differentiator: Online DPO Training
The primary distinguishing feature of this model is its training methodology: Online DPO, introduced in the paper "Direct Language Model Alignment from Online AI Feedback" (arXiv:2402.04792). Unlike offline DPO, which trains on a fixed dataset of pre-collected preference pairs, Online DPO samples responses from the model during training and obtains preference labels from an AI annotator on the fly. Aligning against this on-policy feedback can improve response quality and adherence to desired behaviors.
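For readers who want to reproduce a setup like this, the loop described above can be sketched with TRL's `OnlineDPOTrainer`. This is an illustrative assumption, not the authors' actual recipe: the judge, dataset, and hyperparameters below are placeholders, and only the base-model id comes from this card.

```python
# Hypothetical sketch of Online DPO fine-tuning with TRL's OnlineDPOTrainer.
# Assumptions: the PairRM judge, the prompt dataset, and all hyperparameters
# are illustrative; only BASE_MODEL is taken from the model card.

BASE_MODEL = "cemig-temp/qwen3-4b-dw-lr"  # base model named in the card


def main():
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import OnlineDPOConfig, OnlineDPOTrainer, PairRMJudge

    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

    # An AI judge ranks pairs of on-policy completions, replacing the
    # fixed preference dataset used by offline DPO.
    judge = PairRMJudge()

    # Prompt-only dataset (assumed): Online DPO generates completions itself.
    dataset = load_dataset("trl-lib/ultrafeedback-prompt", split="train")

    args = OnlineDPOConfig(output_dir="qwen3-4b-online-dpo", beta=0.1)
    trainer = OnlineDPOTrainer(
        model=model,
        judge=judge,
        args=args,
        train_dataset=dataset,
        processing_class=tokenizer,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

The key design difference from offline DPO is visible in the trainer arguments: a `judge` is passed instead of a dataset of chosen/rejected pairs, because preferences are labeled online as the policy generates.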
Capabilities
- General Text Generation: Capable of generating coherent and contextually relevant text based on user prompts.
- Conversational AI: Suitable for dialogue systems and interactive assistants that answer open-ended user questions.
- Extended Context: Supports a context length of 32768 tokens, allowing it to process and generate longer sequences of text while maintaining coherence.
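A minimal quick-start sketch using the Hugging Face transformers library is shown below. The model id comes from this card; the example question, generation settings, and helper function are assumptions for illustration.

```python
# Quick-start sketch (assumed usage): load CEIA-RL/qwen3-4b-dw-lr-hf-dpo
# with transformers and answer a user question via the chat template.

MODEL_ID = "CEIA-RL/qwen3-4b-dw-lr-hf-dpo"  # model id from the card


def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat-message format expected by
    tokenizer.apply_chat_template (hypothetical helper)."""
    return [{"role": "user", "content": question}]


def main():
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    messages = build_messages("What are the benefits of online AI feedback?")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Because the tokenizer's chat template handles role markers and the generation prompt, the caller only supplies plain message dicts.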
When to Use This Model
This model is particularly well-suited for applications requiring:
- Text generation where alignment with preferred response styles and behaviors is beneficial.
- Conversational agents and chatbots.
- Tasks that can leverage its 32768-token context window for more extensive interactions or document processing.