Overview
ajibawa-2023/OpenHermes-2.5-Code-290k-13B is a 13-billion-parameter language model fine-tuned from the Llama-2 architecture. It distinguishes itself through extensive training on a specialized dataset, OpenHermes-2.5-Code-290k, which merges the high-quality OpenHermes-2.5 dataset with a custom Code-290k-ShareGPT dataset. The combined corpus comprises roughly 1.29 million cleaned, synthetically generated instruction and chat samples in Vicuna/ShareGPT format.
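To make the Vicuna/ShareGPT format concrete, here is a minimal sketch of what one such training sample typically looks like. The field names (`id`, `conversations`, `from`, `value`) follow the common ShareGPT convention and are assumptions, not taken verbatim from the dataset card.

```python
# One ShareGPT-style sample: a list of alternating human/gpt turns.
# All field names and values here are illustrative assumptions.
sample = {
    "id": "example-0",
    "conversations": [
        {"from": "human", "value": "Write a Python function that reverses a string."},
        {"from": "gpt", "value": "def reverse(s):\n    return s[::-1]"},
    ],
}

# Quick structural check: roles should alternate, starting with the human.
roles = [turn["from"] for turn in sample["conversations"]]
assert roles == ["human", "gpt"]
```

Datasets in this layout are easy to validate and to convert into model-specific prompt templates at training time.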
Key Capabilities
- Enhanced Code Generation: Significantly improved capabilities in generating code across various programming languages, often accompanied by explanations.
- General Text Generation: Proficient in tasks such as blogging, story generation, and question-answering.
- Instruction Following: Designed to follow instructions effectively, leveraging its Vicuna/ShareGPT-formatted training.
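Since the model was trained on Vicuna/ShareGPT-formatted conversations, inference prompts generally need to follow the same template. Below is a minimal sketch of assembling a single-turn Vicuna-style prompt; the system preamble is the commonly used Vicuna v1.1 wording and should be treated as an assumption rather than this model card's official template.

```python
def build_vicuna_prompt(user_message: str) -> str:
    """Assemble a single-turn Vicuna-style prompt.

    The system line below is the widely used Vicuna v1.1 preamble;
    it is an assumption, not confirmed by the model card.
    """
    system = (
        "A chat between a curious user and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite "
        "answers to the user's questions."
    )
    return f"{system} USER: {user_message} ASSISTANT:"

prompt = build_vicuna_prompt("Explain list comprehensions in Python.")
```

The generated string would then be tokenized and passed to the model; the response is everything the model emits after the trailing `ASSISTANT:` marker.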
Training Details
The model was fully fine-tuned for two epochs, which took 21 days on 4 x A100 80GB GPUs using the FastChat and DeepSpeed codebases. On the Open LLM Leaderboard it achieves an average score of 63.33, including 57.34 on the AI2 Reasoning Challenge (ARC) and 58.30 on GSM8k, indicating solid performance across a range of benchmarks for a 13B model.
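A full fine-tune of a 13B model on 4 x A100 80GB GPUs typically relies on DeepSpeed's ZeRO sharding to fit optimizer state and gradients in memory. The configuration below is an illustrative sketch in that spirit; the actual settings used for this model are not published here, so every value (ZeRO stage, batch sizes, bf16, offload) is an assumption.

```python
# Illustrative DeepSpeed-style configuration for a full fine-tune on
# 4 GPUs. All values are assumptions for demonstration, not the
# settings actually used to train this model.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard optimizer state, gradients, and parameters
        "offload_optimizer": {"device": "cpu"},
        "overlap_comm": True,
    },
    "gradient_clipping": 1.0,
}

# Effective global batch size across the 4 GPUs mentioned above.
num_gpus = 4
global_batch = (
    deepspeed_config["train_micro_batch_size_per_gpu"]
    * deepspeed_config["gradient_accumulation_steps"]
    * num_gpus
)
# With these example values: 4 * 8 * 4 = 128 samples per optimizer step.
```

In practice such a dictionary is serialized to JSON and passed to the training launcher via DeepSpeed's `--deepspeed` flag.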
Good For
- Developers requiring a robust model for code generation and explanation.
- Applications needing a versatile language model for creative writing, content generation, and conversational AI.
- Users looking for a Llama-2 based model with a strong emphasis on coding capabilities.