Overview
vilm/Quyen-Pro-v0.1 is a 14.2 billion parameter instruction-tuned large language model, part of the Quyen series developed by vilm. It is built upon the Qwen1.5 architecture and represents the 'Pro' variant within a family of six models ranging from 0.5B to 72B parameters.
Key Capabilities & Training
- Architecture: Based on the Qwen1.5 family, providing a robust foundation for language understanding and generation.
- Training Methodology: Utilizes Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to align its responses with human preferences and instructions.
- Diverse Training Data: Fine-tuned on a comprehensive dataset including well-known public datasets like OpenHermes-2.5 by Teknium, Capybara by LDJ, argilla/distilabel-capybara-dpo-7k-binarized by argilla, and orca_dpo_pairs by Intel. This is augmented with private data from Ontocord and BEE-spoke-data.
- Context Length: Supports a context window of 32768 tokens, allowing for processing and generating longer sequences of text.
- Prompt Format: Employs the ChatML prompt template, ensuring consistent and structured interaction for conversational AI applications.
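The ChatML template mentioned above wraps each turn in `<|im_start|>`/`<|im_end|>` tags. As a minimal sketch (the helper function and example messages are illustrative, not from the model card), a prompt can be assembled like this:

```python
# Minimal sketch of the ChatML prompt format used by the Quyen models.
# The <|im_start|>/<|im_end|> tags follow the standard ChatML convention
# of the Qwen1.5 family; the example messages are illustrative.

def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts as a ChatML string,
    ending with an open assistant turn for the model to complete."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"},
])
print(prompt)
```

In practice, `tokenizer.apply_chat_template` produces this format automatically, so manual string building is mainly useful for inspection and debugging.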
Usage Considerations
- General Purpose: Designed for a broad range of conversational and instructional tasks.
- Prompting: Users should adhere to the ChatML format for optimal performance; the model card provides examples for both direct string formatting and apply_chat_template.
- Benchmarks: Performance benchmarks are currently pending and will be updated by the developers.
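The apply_chat_template route can be sketched as follows with the Hugging Face transformers library. This is a hedged sketch, not the model card's official snippet: running it requires downloading the 14B weights, and the generation parameters are illustrative.

```python
# Illustrative sketch: chat with the model via the tokenizer's built-in
# ChatML template instead of manual string formatting. Requires the
# `transformers` library and enough memory for the 14B model.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize ChatML in one sentence."},
]

def generate_reply(messages, model_id="vilm/Quyen-Pro-v0.1", max_new_tokens=256):
    # Imported lazily so the module loads without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    # apply_chat_template renders the messages in the ChatML format the
    # model was fine-tuned on, appending the open assistant turn.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate_reply(messages))
```

Because the model supports a 32768-token context window, long conversations or documents can be passed in the same messages list without truncation in most cases.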