arbenilazi/dpo-mbpp-merged

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32K · Published: Feb 22, 2026 · Architecture: Transformer

The arbenilazi/dpo-mbpp-merged model is a 7.6-billion-parameter variant of Qwen2.5-Coder-7B-Instruct, fine-tuned with Direct Preference Optimization (DPO) on the MBPP dataset for code generation. It supports a 32K context length, was trained with 4-bit QLoRA, and is distributed as a fully merged bf16 checkpoint for direct inference.

Overview

arbenilazi/dpo-mbpp-merged is a 7.6 billion parameter model based on Qwen2.5-Coder-7B-Instruct, fine-tuned for code generation tasks using Direct Preference Optimization (DPO) on the google-research-datasets/mbpp dataset.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-Coder-7B-Instruct, known for its coding capabilities.
  • Training Method: DPO using Hugging Face TRL's DPOTrainer, with a separate frozen reference model.
  • Efficiency: Trained with 4-bit QLoRA (nf4 quantization with bf16 compute) for efficient fine-tuning.
  • Deployment Ready: The LoRA adapters were merged into the base model using PEFT, yielding a fully merged safetensors checkpoint in bf16 precision. No LoRA loading or quantization is required at inference time, which simplifies deployment (see the loading sketch after this list).
  • Context Length: Supports a context length of 32,768 tokens.
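
Because the checkpoint is fully merged, loading it needs nothing beyond standard transformers. A minimal sketch, assuming a recent transformers version; the prompt is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arbenilazi/dpo-mbpp-merged"

# Fully merged bf16 checkpoint: no PEFT adapter or 4-bit loading needed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Since no adapter loading is involved, serving the repo directly through vLLM or another standard inference stack should also work unmodified.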

Training Details

The model was trained for 2 epochs with a batch size of 4 and a learning rate of 2e-5. The LoRA configuration targeted the q_proj, k_proj, v_proj, and o_proj modules with rank (r) 16 and alpha 32; these settings map onto TRL roughly as sketched below.
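
The card does not include training code, but the listed settings correspond to a fairly standard TRL DPOTrainer setup. The following is a sketch under assumptions: argument names follow recent TRL/PEFT releases, and build_preference_pairs is a hypothetical placeholder, since the card does not say how prompt/chosen/rejected preference pairs were derived from MBPP.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"

# 4-bit QLoRA base: nf4 quantization with bf16 compute, as on the card.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA settings from the card: r=16, alpha=32, attention projections only.
model = get_peft_model(model, LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))

# Separate frozen reference model, as described on the card.
ref_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)

# Hypothetical helper: DPOTrainer expects "prompt"/"chosen"/"rejected"
# columns, and the pairing procedure over MBPP is not documented.
train_dataset = build_preference_pairs(
    load_dataset("google-research-datasets/mbpp", split="train")
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=DPOConfig(  # hyperparameters listed on the card
        output_dir="dpo-mbpp",
        num_train_epochs=2,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()

# Merge: save the adapter, reload the base in bf16, fold the adapter in,
# and save the fully merged checkpoint used for inference.
trainer.model.save_pretrained("dpo-mbpp-adapter")
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "dpo-mbpp-adapter").merge_and_unload()
merged.save_pretrained("dpo-mbpp-merged")
tokenizer.save_pretrained("dpo-mbpp-merged")
```

Note that the merge is performed against a bf16 reload of the base model rather than the 4-bit training copy, which matches the bf16 precision of the published checkpoint.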

Good For

  • Developers and researchers focused on improving code generation performance.
  • Applications requiring a specialized model for programming-related tasks.
  • Scenarios where a fully merged, ready-to-use bf16 model is preferred for ease of inference.