OpenChat 3.5 1210: A Specialized 7B Language Model
OpenChat 3.5 1210 is a 7 billion parameter instruction-tuned model developed by OpenChat, designed to excel in coding and mathematical reasoning tasks. It features a 4096-token context window and introduces two distinct operational modes: a "GPT4 Correct" mode for general tasks and coding, and a "Math Correct" mode specifically tailored for mathematical problem-solving.
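The two modes are selected purely through the conversation template. A minimal sketch of prompt construction, assuming the turn format documented in the OpenChat model card (`GPT4 Correct User:` / `GPT4 Correct Assistant:` turns separated by `<|end_of_turn|>`, with `Math Correct` substituted for the math mode) — verify against the tokenizer's built-in chat template before relying on it:

```python
def build_prompt(messages, mode="GPT4 Correct"):
    """Format a conversation using OpenChat's turn-based template.

    mode: "GPT4 Correct" for general/coding tasks, "Math Correct" for
    mathematical reasoning (template names assumed from the model card).
    messages: list of {"role": "user" | "assistant", "content": str}.
    """
    parts = []
    for msg in messages:
        role = "User" if msg["role"] == "user" else "Assistant"
        parts.append(f"{mode} {role}: {msg['content']}<|end_of_turn|>")
    # Trailing assistant tag cues the model to generate its reply
    parts.append(f"{mode} Assistant:")
    return "".join(parts)

math_prompt = build_prompt(
    [{"role": "user", "content": "10.3 - 7988.8133 = ?"}],
    mode="Math Correct",
)
# → "Math Correct User: 10.3 - 7988.8133 = ?<|end_of_turn|>Math Correct Assistant:"
```

Switching `mode` is the only difference between the two prompts; the same tokenizer and weights serve both.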
Key Capabilities
- Enhanced Coding Performance: Achieves a 15-point improvement on coding benchmarks over its predecessor, OpenChat 3.5, and surpasses ChatGPT (March) on HumanEval.
- Strong Mathematical Reasoning: Demonstrates superior performance in mathematical tasks (MATH, GSM8K) compared to models like Grok-0 and Grok-1.
- Evaluator Support: Includes experimental capabilities for acting as an evaluator, utilizing a prompt structure similar to Prometheus for assessing response quality.
- Optimized Deployment: Recommended for high-throughput serving with vLLM, which exposes an OpenAI-compatible ChatCompletion API.
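Because the recommended server speaks the OpenAI ChatCompletion protocol, a client only needs to build a standard request body and POST it. A sketch under assumptions: the endpoint URL, port, and model name below are placeholders — check your vLLM server's startup output and the OpenChat serving docs for the actual values:

```python
import json

# Hypothetical local endpoint; vLLM's OpenAI-compatible server listens on
# port 8000 by default, but confirm against your own deployment.
API_URL = "http://localhost:8000/v1/chat/completions"

def make_request_body(user_message, model="openchat_3.5"):
    """Build an OpenAI ChatCompletion-style request body.

    The model name and sampling parameters are illustrative assumptions,
    not values mandated by OpenChat.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.5,
        "max_tokens": 512,
    }

body = make_request_body("Write a function that reverses a string.")
payload = json.dumps(body)  # serialized JSON, ready to POST
# To send (requires a running server):
#   requests.post(API_URL, json=body).json()["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client library can be pointed at the same endpoint by overriding its base URL.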
Benchmarks and Performance
OpenChat 3.5 1210 shows competitive performance across a range of benchmarks:
- HumanEval: Scores 68.9, outperforming ChatGPT (March) and Grok-1.
- MATH: Achieves 28.9, surpassing Grok-0 and Grok-1.
- GSM8K: Scores 77.3, matching OpenChat-3.5 and outperforming ChatGPT (March) and Grok-1.
- Overall: Achieves an average score of 63.8, positioning it as a strong contender among 7B models.
Limitations
Like its foundation model, OpenChat 3.5 1210 may struggle with complex reasoning, hallucinate non-existent information, and generate harmful or biased content. Users should implement additional safety measures for sensitive applications.