CEIA-RL/qwen3-4b-dw-lr-hf-dpo
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Apr 2, 2026 | Architecture: Transformer | Cold

CEIA-RL/qwen3-4b-dw-lr-hf-dpo is a 4-billion-parameter language model developed by CEIA-RL and fine-tuned from cemig-temp/qwen3-4b-dw-lr. It was trained with Online DPO, the method introduced in the paper "Direct Language Model Alignment from Online AI Feedback". The model targets conversational AI and general text generation. Its context length is 32,768 tokens, so the prompt and the generated output together must fit within that window.
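To make the training objective more concrete, the following is a minimal sketch of the standard per-pair DPO loss, which Online DPO applies to response pairs sampled from the current policy and ranked by a feedback model. The function name `dpo_loss`, the `beta` value of 0.1, and the example log-probabilities are illustrative assumptions; the model card does not state the hyperparameters, the feedback source, or the exact loss variant used for this checkpoint.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin))).

    Each argument is the summed log-probability of a full response under the
    policy being trained or under the frozen reference model.
    beta=0.1 is an assumed value, not one taken from the model card.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # Numerically stable -log(sigmoid(logits)) = log(1 + exp(-logits)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities for one preference pair.
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)
improved = dpo_loss(-10.0, -20.0, -12.0, -15.0)
print(f"policy equals reference: {neutral:.4f}")
print(f"policy favors chosen response: {improved:.4f}")
```

When the policy matches the reference, the loss equals log 2; it drops below that value as the policy raises the chosen response's likelihood relative to the rejected one, compared with the reference model.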
