ajibawa-2023/Code-290k-13B: Code Generation with Explanations
This 13-billion-parameter model, fine-tuned from Llama-2, specializes in generating code across multiple programming languages together with detailed explanations. Developed by ajibawa-2023, it aims to reduce the common problem of LLMs producing incorrect code by pairing every generated snippet with an explanation that supports understanding and review.
Key Capabilities
- Multi-language Code Generation: Supports Python, Java, JavaScript, Go, C++, Rust, Ruby, SQL, MySQL, R, Julia, Haskell, and more.
- Detailed Explanations: Provides comprehensive explanations accompanying generated code, enhancing developer understanding.
- Extensive Training Data: Trained on approximately 290,000 code examples, each comprising two conversations in Vicuna/ShareGPT format.
- Base Model: Built upon the Llama-2 architecture by Meta.
Training Details
The model was trained for 165 hours (3 epochs) on four A100 80 GB GPUs using the DeepSpeed codebase. The training dataset, Code-290k-ShareGPT, combines and expands upon earlier datasets such as Python-Code-23k-ShareGPT and Code-74k-ShareGPT.
Performance
On the Open LLM Leaderboard, the model achieves an average score of 52.96, with notable scores including 81.55 on HellaSwag (10-shot) and 72.69 on Winogrande (5-shot).
Usage
Users can interact with the model using a prompt format similar to Vicuna/ShareGPT v1.1, designed for conversational code generation with explanations. Quantized versions (GPTQ, GGUF, AWQ, Exllama v2) are also available for optimized deployment.
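As a rough illustration of the Vicuna v1.1-style prompt layout mentioned above, the helper below assembles a single-turn prompt. The exact system message and formatting here are assumptions, not taken from the model card, so verify them against the model's documentation before use:

```python
# Illustrative sketch (assumed, not from the model card): formatting a
# single-turn prompt in the Vicuna/ShareGPT v1.1 style for this model.

def build_prompt(question, system=None):
    """Return a Vicuna v1.1-style single-turn prompt string."""
    if system is None:
        # Commonly used Vicuna system message; an assumption for this model.
        system = (
            "A chat between a curious user and an artificial intelligence "
            "assistant. The assistant gives helpful, detailed, and polite "
            "answers to the user's questions."
        )
    return f"{system} USER: {question} ASSISTANT:"

prompt = build_prompt(
    "Write a Python function that reverses a string and explain how it works."
)
# The resulting string can be passed to whatever backend serves the model,
# e.g. a transformers text-generation pipeline or a GGUF runtime.
```

The same helper can be reused with the quantized builds, since the prompt format does not change across GPTQ, GGUF, AWQ, or Exllama v2 variants.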