codemateai/CodeMate-v0.1
CodeMate-v0.1 is a 34-billion-parameter intelligent programming assistant developed by CodeMate, fine-tuned on a proprietary 1.8-billion-token dataset of programming problems and solutions. The model excels at generating high-quality code across multiple languages, including Python, C/C++, and TypeScript. Trained with Flash Attention 2 and supporting a 32,768-token context length, it is designed to assist developers with coding tasks.
CodeMate-v0.1: An Intelligent Programming Assistant
CodeMate-v0.1, developed by CodeMate, is a 34-billion-parameter language model designed to serve as an intelligent programming assistant, generating high-quality code solutions for a wide range of programming problems.
Key Capabilities & Training
- Specialized Training Data: The model was fine-tuned exclusively on a proprietary dataset of 1.8 billion tokens of high-quality programming problems and their solutions. This dataset was manually curated and is internal to CodeMate.
- Training Efficiency: Fine-tuning used Flash Attention 2 and ran for 15 hours on 40 A100-80GB GPUs, with a sequence length of 8096 tokens during training.
- Multilingual Code Proficiency: CodeMate-v0.1 demonstrates proficiency across multiple programming languages, including Python, C/C++, TypeScript, Java, and others.
- Prompt Format: It accepts prompts formatted in the Alpaca/Vicuna instruction style.
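The Alpaca-style prompt format mentioned above can be sketched as follows. The template wording and the repository id `codemateai/CodeMate-v0.1` are assumptions based on the common Alpaca convention, not an official specification from CodeMate:

```python
# Minimal sketch of an Alpaca-style instruction prompt (assumed template,
# not official CodeMate documentation).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca instruction format."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

prompt = build_prompt("Write a Python function that reverses a string.")

# With Hugging Face transformers, the prompt would then be tokenized and
# passed to generate(), e.g. (commented out to avoid downloading 34B weights):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("codemateai/CodeMate-v0.1")
# model = AutoModelForCausalLM.from_pretrained("codemateai/CodeMate-v0.1")
# out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=256)
```

The model's completion follows the `### Response:` marker, so generated text is typically split on that marker before being shown to the user.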
Performance & Limitations
Evaluations on the Open LLM Leaderboard report an average score of 58.39, including 55.55 on the AI2 Reasoning Challenge (ARC), 78.03 on HellaSwag, and 40.18 on GSM8K. The model is currently at version 0.1 and has undergone limited testing; CodeMate recommends additional safety testing before real-world deployment.