NovoCode/Phi-2-DPO

Text Generation · Model size: 3B · Quantization: BF16 · Context length: 2k · Published: Jan 28, 2024 · License: MIT · Architecture: Transformer

NovoCode/Phi-2-DPO is a 3-billion-parameter causal language model fine-tuned from Microsoft's Phi-2. It was trained on the Intel/orca_dpo_pairs dataset with a focus on instruction following and preference alignment. The model is optimized for generating responses to explicit instructions, making it suitable for conversational AI and task-oriented applications, and its 2048-token context length balances capability with efficient resource usage.


NovoCode/Phi-2-DPO: Instruction-Tuned Language Model

NovoCode/Phi-2-DPO is a 3-billion-parameter language model derived from Microsoft's Phi-2 architecture. It was fine-tuned on the Intel/orca_dpo_pairs dataset, a preference-pair dataset designed for Direct Preference Optimization (DPO) training, to improve instruction following and response quality.
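A minimal loading-and-generation sketch with the Hugging Face transformers library is shown below. The repo id comes from this card; the Instruct/Output prompt format follows the base Phi-2 convention and is an assumption here, so check the repository for the exact template.

```python
# Minimal inference sketch. Assumes the Phi-2 "Instruct:/Output:" prompt
# style; verify the exact template in the model repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovoCode/Phi-2-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

prompt = "Instruct: Explain DPO fine-tuning in two sentences.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, dropping the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```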

Key Capabilities

  • Instruction Following: Optimized to generate responses that align with explicit user instructions, leveraging the DPO training methodology.
  • Compact Size: At 3 billion parameters, it offers a balance between performance and computational efficiency, suitable for various deployment scenarios.
  • Context Length: Supports a sequence length of 2048 tokens, allowing it to process moderately sized inputs and maintain conversational context (see the truncation sketch after this list).
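Because the window is fixed at 2048 tokens, long conversations need to be trimmed before generation. The sketch below left-truncates so the most recent turns survive; it reuses the `tokenizer` from the loading example above, and `long_conversation_text` is a hypothetical variable holding the running transcript.

```python
# Keep inputs inside the 2048-token window, reserving room for the reply.
MAX_CTX = 2048
MAX_NEW_TOKENS = 256

# Left-truncation drops the oldest turns first when the transcript
# grows past the window.
tokenizer.truncation_side = "left"
inputs = tokenizer(
    long_conversation_text,  # hypothetical running-transcript variable
    return_tensors="pt",
    truncation=True,
    max_length=MAX_CTX - MAX_NEW_TOKENS,
)
```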

Training Details

The model was trained with a learning rate of 3e-06, a micro batch size of 2, and 2 epochs, using an Adam optimizer with a cosine learning-rate schedule and 100 warmup steps. The final validation loss was approximately 1.2999.
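For orientation, here is a hedged reconstruction of that recipe using the TRL library's DPOTrainer, not the authors' exact script. Argument names follow recent TRL/transformers releases (older TRL versions pass `tokenizer=` instead of `processing_class=`), and the column mapping assumes the system/question/chosen/rejected schema of Intel/orca_dpo_pairs.

```python
# A sketch of the stated hyperparameters with TRL's DPOTrainer.
# Verify argument names against the TRL version you have installed.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "microsoft/phi-2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Phi-2 ships without a pad token

# Map system/question/chosen/rejected columns to the
# prompt/chosen/rejected format DPOTrainer expects.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

def to_dpo_format(row):
    return {
        "prompt": f"{row['system']}\n{row['question']}".strip(),
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }

dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)

config = DPOConfig(
    output_dir="phi-2-dpo",
    learning_rate=3e-6,             # from the card
    per_device_train_batch_size=2,  # "micro batch size of 2"
    num_train_epochs=2,             # 2 epochs
    lr_scheduler_type="cosine",     # cosine schedule
    warmup_steps=100,               # 100 warmup steps
    optim="adamw_torch",            # Adam-family optimizer
)

trainer = DPOTrainer(
    model=model,                    # a frozen reference copy is created internally
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,     # older TRL versions: tokenizer=tokenizer
)
trainer.train()
```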

Good For

  • Conversational AI: Developing chatbots or virtual assistants that require precise instruction adherence.
  • Task-Oriented Applications: Scenarios where the model needs to perform specific tasks based on user prompts.
  • Research and Development: Experimenting with DPO-tuned models on a smaller, efficient architecture.