Phind/Phind-CodeLlama-34B-Python-v1

Text generation
  • Model size: 34B
  • Quantization: FP8
  • Context length: 32k
  • Concurrency cost: 2
  • Published: Aug 25, 2023
  • License: llama2
  • Architecture: Transformer
  • Open weights

Phind/Phind-CodeLlama-34B-Python-v1 is a 34 billion parameter CodeLlama-based model fine-tuned by Phind, specifically optimized for Python programming tasks. It achieves a 69.5% pass@1 on the HumanEval benchmark, surpassing GPT-4's 67% on the same metric. This model excels at generating high-quality Python code and solving programming problems, making it suitable for developers seeking robust code generation capabilities.


Phind-CodeLlama-34B-Python-v1 Overview

Phind-CodeLlama-34B-Python-v1 is a 34 billion parameter language model developed by Phind, fine-tuned from CodeLlama-34B-Python. It is specifically designed for code generation and problem-solving in Python.

Key Capabilities & Performance

  • Exceptional Code Generation: Achieves a notable 69.5% pass@1 on the HumanEval benchmark, outperforming GPT-4's 67% in this specific evaluation.
  • Instruction-Tuned: Fine-tuned on a proprietary dataset of approximately 80,000 high-quality programming problems and solutions, structured as instruction-answer pairs.
  • Optimized Training: Trained for two epochs on the custom dataset using DeepSpeed ZeRO 3 and Flash Attention 2 across 32 A100-80GB GPUs; a hedged configuration sketch follows this list.
  • Training Sequence Length: Fine-tuned with a sequence length of 4096 tokens, giving the model room for sizable code contexts during training.
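
Phind has not released its training code, so the following is only a minimal sketch of how a comparable setup (Hugging Face Transformers with DeepSpeed ZeRO 3 and Flash Attention 2) is commonly wired together. The base CodeLlama checkpoint name is real, but the dataset file, the prompt layout inside tokenize, and every hyperparameter other than the epoch count are assumptions made for illustration.

```python
# Hypothetical sketch of a DeepSpeed ZeRO-3 + Flash Attention 2 fine-tuning run.
# Phind's actual scripts, dataset, and hyperparameters are not public; the dataset
# path, batch sizes, and learning rate below are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "codellama/CodeLlama-34b-Python-hf"  # base model that Phind fine-tuned

# Minimal DeepSpeed ZeRO stage-3 config: shards parameters, gradients,
# and optimizer states across all GPUs.
ds_config = {
    "zero_optimization": {"stage": 3, "overlap_comm": True},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # CodeLlama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)

# Placeholder instruction-answer data; Phind's ~80k-problem dataset is proprietary.
dataset = load_dataset("json", data_files="instruction_pairs.jsonl")["train"]

def tokenize(example):
    # Assumed record layout: {"instruction": ..., "answer": ...}
    text = example["instruction"] + "\n: " + example["answer"]
    return tokenizer(text, truncation=True, max_length=4096)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="phind-style-finetune",
        num_train_epochs=2,              # matches the two epochs reported by Phind
        per_device_train_batch_size=1,   # assumed
        learning_rate=2e-5,              # assumed
        bf16=True,
        deepspeed=ds_config,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```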

Usage Notes

  • Instruction-Tuned, Not Chat-Tuned: This model is instruction-tuned rather than chat-tuned; it responds best to a plainly stated task terminated with "\n: " rather than conversational chat prompts (see the inference sketch after this list).
  • Decontamination Methodology: Phind applied OpenAI's decontamination methodology to its training dataset to ensure the validity and reliability of benchmark results.
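
To make the prompting convention concrete, here is a minimal inference sketch using Hugging Face Transformers. The example instruction and the sampling parameters are illustrative choices, not values published by Phind.

```python
# Minimal inference sketch; sampling settings are illustrative, adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phind/Phind-CodeLlama-34B-Python-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Instruction-tuned, not chat-tuned: state the task plainly and end with "\n: ".
prompt = "Write a Python function that checks whether a string is a palindrome.\n: "

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
)
# Print only the newly generated tokens, not the echoed prompt.
completion = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```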

Ideal Use Cases

  • Python Code Generation: Highly effective for generating Python code snippets, functions, and solutions to programming challenges.
  • Developer Assistance: Useful for developers needing to quickly generate or complete Python code based on specific instructions.
  • Benchmarking: Provides a strong reference point for evaluating code generation against established metrics such as HumanEval pass@1 (see the estimator sketch below).
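
Since the headline claim is a HumanEval pass@1 score, it may help to recall how that number is computed. The sketch below implements the standard unbiased pass@k estimator from the Codex/HumanEval paper; it is generic evaluation code, not anything specific to this model.

```python
# Unbiased pass@k estimator from the Codex/HumanEval paper:
# pass@k = E[1 - C(n - c, k) / C(n, k)] averaged over problems,
# where n samples are drawn per problem and c of them pass the unit tests.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (out of n, with c correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 10 samples per problem, 7 of which pass -> pass@1 estimate of 0.7
print(pass_at_k(n=10, c=7, k=1))
```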