Model Overview
uukuguy/speechless-code-mistral-orca-7b-v1.0 is a 7-billion-parameter language model built on the Open-Orca/Mistral-7B-OpenOrca base. It is fine-tuned specifically to improve reasoning, planning, and coding capabilities.
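A minimal usage sketch with the Hugging Face transformers library (the card does not specify a loading recipe, so the generation settings below are illustrative; the plain-text prompt format is also an assumption, since the expected chat template is not stated here):

```python
def build_prompt(instruction: str) -> str:
    """Format a plain instruction as a single prompt string.

    The exact prompt template this model expects is not stated on this
    card; treat this bare format as an assumption.
    """
    return instruction.strip() + "\n"


def generate(instruction: str, max_new_tokens: int = 256) -> str:
    """Greedy generation via transformers (requires `transformers`,
    `torch`, and enough memory for a 7B model)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "uukuguy/speechless-code-mistral-orca-7b-v1.0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For example, `generate("Write a Python function that reverses a string.")` would return the decoded completion as a single string.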
Key Capabilities & Training
This model was fine-tuned on a comprehensive dataset of over 200,000 samples, meticulously curated from various sources to bolster its core strengths:
- Coding and Reasoning: Includes filtered coding-conversation and reasoning categories from jondurbin/airoboros-2.2 and WizardLM/WizardLM_evol_instruct_V2_196k.
- Instruction Following: Incorporates the 'cot' category from Open-Orca's 1M GPT4 dataset and the entirety of garage-bAInd/Open-Platypus.
- Python Code Generation: Enhanced with TokenBender/python_eval_instruct_51k samples in which Python code appears in the output, plus the Spider text-to-SQL dataset for database interaction tasks.
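The "Python is present in the output" filter described above can be sketched as follows. The actual curation heuristic is not documented on this card, so the patterns below are an assumption for illustration:

```python
import re

# Illustrative filter: keep samples whose output contains Python code.
# The real heuristic used for TokenBender/python_eval_instruct_51k is not
# documented here; these patterns are assumptions for demonstration.
PYTHON_HINTS = [
    re.compile(r"\bdef \w+\s*\("),       # function definitions
    re.compile(r"\b(import|from) \w+"),  # import statements
    re.compile(r"\bclass \w+\s*[:(]"),   # class definitions
]


def looks_like_python(text: str) -> bool:
    return any(p.search(text) for p in PYTHON_HINTS)


samples = [
    {"instruction": "Reverse a string in Python.",
     "output": "def reverse(s):\n    return s[::-1]"},
    {"instruction": "What colour is the sky?",
     "output": "The sky is usually blue."},
]

kept = [s for s in samples if looks_like_python(s["output"])]
print(len(kept))  # 1 sample retained
```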
Performance Highlights
- HumanEval: Achieves a score of 47.561 on the HumanEval Python benchmark, demonstrating strong code generation capabilities.
- Open LLM Leaderboard: Shows competitive performance with an average score of 62.92 across various benchmarks, including ARC (59.64), HellaSwag (82.25), and MMLU (61.33).
Good For
- Code Generation: Excels in generating Python code and handling coding-related conversations.
- Reasoning Tasks: Improved ability to tackle complex problems requiring logical deduction and planning.
- Instruction Following: Designed to follow intricate instructions effectively, particularly in technical domains.