ajibawa-2023/Code-290k-13B

Text Generation · Model Size: 13B · Quantization: FP8 · Context Length: 4k · Published: Jan 16, 2024 · License: cc-by-nc-nd-4.0 · Architecture: Transformer · Open Weights

ajibawa-2023/Code-290k-13B is a 13-billion-parameter language model, fine-tuned from the Llama-2 base model and designed for multi-language code generation with detailed explanations. Trained on a dataset of 290,000 code examples spanning Python, Java, JavaScript, Go, C++, Rust, Ruby, SQL, and more, it produces both working code and comprehensive accompanying explanations, making it suited to developers who want not just code but a clear understanding of its logic and implementation.


ajibawa-2023/Code-290k-13B: Code Generation with Explanations

This 13 billion parameter model, fine-tuned from Llama-2, specializes in generating code across multiple programming languages alongside detailed explanations. Developed by ajibawa-2023, it addresses the common challenge of LLMs making mistakes in code generation by emphasizing clarity and understanding.

Key Capabilities

  • Multi-language Code Generation: Supports Python, Java, JavaScript, Go, C++, Rust, Ruby, SQL, MySQL, R, Julia, Haskell, and more.
  • Detailed Explanations: Provides comprehensive explanations accompanying generated code, enhancing developer understanding.
  • Extensive Training Data: Trained on approximately 290,000 code examples, each featuring two conversations in Vicuna/ShareGPT format, ensuring robust performance.
  • Base Model: Built upon the Llama-2 architecture by Meta.
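To make the training-data format above concrete, here is a minimal sketch of what one record in Vicuna/ShareGPT style typically looks like. The field names (`conversations`, `from`, `value`) follow the common ShareGPT convention, and the identifier and contents are hypothetical; the dataset's exact schema may differ.

```python
# Hypothetical sketch of a single Code-290k-ShareGPT training record,
# using the conventional ShareGPT field names (an assumption, not the
# dataset's confirmed schema).
record = {
    "id": "example-0001",  # hypothetical identifier
    "conversations": [
        {
            "from": "human",
            "value": "Write a Python function that reverses a string.",
        },
        {
            "from": "gpt",
            "value": (
                "def reverse_string(s):\n"
                "    return s[::-1]\n\n"
                "Explanation: slicing with a step of -1 walks the string "
                "backwards, producing a reversed copy."
            ),
        },
    ],
}

# Fine-tuning consumes the instruction/response pairing:
prompt = record["conversations"][0]["value"]
answer = record["conversations"][1]["value"]
```

The paired code-plus-explanation answers are what push the fine-tuned model toward emitting an explanation alongside every code block.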

Training Details

The model was trained for 165 hours over 3 epochs on 4 x A100 80GB GPUs, utilizing the DeepSpeed codebase. The training dataset, Code-290k-ShareGPT, combines and expands upon previous datasets like Python-Code-23k-ShareGPT and Code-74k-ShareGPT.

Performance

On the Open LLM Leaderboard, the model achieves an average score of 52.96, with notable scores including 81.55 on HellaSwag (10-Shot) and 72.69 on Winogrande (5-shot).

Usage

Users can interact with the model using a prompt format similar to Vicuna/ShareGPT v1.1, designed for conversational code generation with explanations. Quantized versions (GPTQ, GGUF, AWQ, Exllama v2) are also available for optimized deployment.
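As a rough sketch of that interaction, the snippet below assembles a Vicuna v1.1-style prompt and wraps a standard Hugging Face `transformers` generation call. The system preamble is the generic Vicuna v1.1 text, an assumption to verify against the model card, which may specify slightly different wording.

```python
def build_vicuna_prompt(user_message: str) -> str:
    """Assemble a Vicuna v1.1-style prompt.

    The system preamble is the generic Vicuna v1.1 text; the model card
    may use different wording, so treat it as an assumption to verify.
    """
    system = (
        "A chat between a curious user and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite "
        "answers to the user's questions."
    )
    return f"{system} USER: {user_message} ASSISTANT:"


def generate(user_message: str, max_new_tokens: int = 512) -> str:
    """Generate a reply with Hugging Face transformers (imports deferred
    so the prompt helper above stays dependency-free)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ajibawa-2023/Code-290k-13B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(
        build_vicuna_prompt(user_message), return_tensors="pt"
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the echoed prompt, returning only the assistant's reply.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For the quantized builds, the same prompt string applies; only the loading step changes (e.g. a GGUF file served through llama.cpp instead of `from_pretrained`).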