Overview
arbenilazi/dpo-mbpp-merged is a 7.6 billion parameter model based on Qwen2.5-Coder-7B-Instruct, fine-tuned for code generation tasks. It was trained with Direct Preference Optimization (DPO) on the google-research-datasets/mbpp dataset.
Key Characteristics
- Base Model: Qwen/Qwen2.5-Coder-7B-Instruct, known for its coding capabilities.
- Training Method: DPO using Hugging Face TRL's DPOTrainer, with a separate frozen reference model.
- Efficiency: Trained with 4-bit QLoRA (nf4 quantization with bf16 compute) for efficient fine-tuning.
- Deployment Ready: The LoRA adapters were merged into the base model using PEFT, producing a fully merged safetensors checkpoint in bf16 precision. No LoRA adapters or quantization are required at inference time, which simplifies deployment.
- Context Length: Supports a context length of 32,768 tokens.
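The "merged" property above can be sketched numerically: a LoRA adapter adds a low-rank update scaled by alpha/r on top of a frozen weight matrix, and merging folds that update into the base weight so inference needs only a single matmul. The sketch below uses random NumPy matrices and illustrative dimensions; it is not the actual PEFT implementation.

```python
import numpy as np

def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               r: int, alpha: int) -> np.ndarray:
    """Fold a LoRA update into the base weight: W' = W + (alpha/r) * B @ A."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 16, 32   # r=16, alpha=32 as in this card
W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))           # LoRA down-projection
B = rng.normal(size=(d_out, r))          # LoRA up-projection

W_merged = merge_lora(W, A, B, r, alpha)
x = rng.normal(size=d_in)

# After merging, one matmul replaces the base-plus-adapter two-path forward.
y_merged = W_merged @ x
y_two_path = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(y_merged, y_two_path)
```

Because the update is folded in ahead of time, the deployed checkpoint behaves like an ordinary dense model, which is why no PEFT or bitsandbytes dependency is needed at inference.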
Training Details
The model underwent 2 epochs of training with a batch size of 4 and a learning rate of 2e-5. LoRA configuration included target modules q_proj, k_proj, v_proj, o_proj with a rank (r) of 16 and alpha of 32.
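To make the DPO objective concrete: for each preference pair, the trainable policy is pushed to assign a larger log-probability margin to the chosen completion than to the rejected one, relative to the frozen reference model. The minimal sketch below shows the pairwise loss in pure Python; the beta=0.1 value is TRL's default, an assumption, not a setting stated on this card.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Pairwise DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each margin is the policy's log-prob minus the frozen reference model's
    log-prob for the same completion. beta=0.1 is TRL's default (an assumption).
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) computed stably as log1p(exp(-x)) for moderate x
    return math.log1p(math.exp(-logits)) if logits > -30 else -logits

# Policy identical to the reference: zero margin gap, loss = log(2).
neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)
assert abs(neutral - math.log(2)) < 1e-9

# Policy prefers the chosen completion more than the reference does: loss drops.
improving = dpo_loss(-9.0, -13.0, -10.0, -12.0)
assert improving < neutral
```

During training, these per-pair losses are averaged over each batch and backpropagated through the policy only; the reference model stays frozen, as noted under Key Characteristics.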
Good For
- Developers and researchers focused on improving code generation performance.
- Applications requiring a specialized model for programming-related tasks.
- Scenarios where a fully merged, ready-to-use bf16 model is preferred for ease of inference.