CEIA-RL/qwen3-4b-dw-lr-dpo-offline-energy-GRPO
The CEIA-RL/qwen3-4b-dw-lr-dpo-offline-energy-GRPO is a 4 billion parameter language model based on the Qwen3 architecture, fine-tuned using Direct Preference Optimization (DPO) with offline energy-related data. This model is designed for applications requiring efficient and optimized responses, potentially in energy-aware or resource-constrained environments. Its DPO training suggests an emphasis on aligning outputs with specific human preferences or performance metrics. The model has a context length of 32768 tokens.
Loading preview...
Model Overview
This model, CEIA-RL/qwen3-4b-dw-lr-dpo-offline-energy-GRPO, is a 4 billion parameter language model built upon the Qwen3 architecture. It has been fine-tuned using Direct Preference Optimization (DPO) with a focus on offline energy-related data, as indicated by its name and the associated Energy-RAG-PENSAR project on Weights & Biases. The training process involved a step count of 320, and the judge model used for evaluation was gpt-oss 120b.
Key Characteristics
- Architecture: Qwen3-based, a 4 billion parameter model.
- Training Method: Utilizes Direct Preference Optimization (DPO) for fine-tuning.
- Data Focus: Trained with offline energy-related datasets, suggesting specialization in this domain.
- Context Length: Supports a substantial context window of 32768 tokens.
Potential Use Cases
Given its specialized training, this model is likely suitable for applications requiring:
- Energy-aware AI: Tasks related to energy consumption, optimization, or analysis.
- Resource-constrained environments: Its 4B parameter size makes it more efficient than larger models.
- Preference-aligned generation: Generating outputs that adhere to specific desired characteristics or preferences learned during DPO.