Phind/Phind-CodeLlama-34B-Python-v1
Text generation · Concurrency cost: 2 · Model size: 34B · Quant: FP8 · Context length: 32k · Published: Aug 25, 2023 · License: llama2 · Architecture: Transformer · Open weights
Phind/Phind-CodeLlama-34B-Python-v1 is a 34 billion parameter CodeLlama-based model fine-tuned by Phind, specifically optimized for Python programming tasks. It achieves a 69.5% pass@1 on the HumanEval benchmark, surpassing GPT-4's 67% on the same metric. This model excels at generating high-quality Python code and solving programming problems, making it suitable for developers seeking robust code generation capabilities.
Phind-CodeLlama-34B-Python-v1 Overview
Phind-CodeLlama-34B-Python-v1 is a 34 billion parameter language model developed by Phind, fine-tuned from CodeLlama-34B-Python. It is specifically designed for code generation and problem-solving in Python.
Key Capabilities & Performance
- Exceptional Code Generation: Achieves a notable 69.5% pass@1 on the HumanEval benchmark, outperforming GPT-4's 67% in this specific evaluation.
- Instruction-Tuned: Fine-tuned on a proprietary dataset of approximately 80,000 high-quality programming problems and solutions, structured as instruction-answer pairs.
- Efficient Training: Trained for two epochs on the custom dataset using DeepSpeed ZeRO 3 and Flash Attention 2 on 32 A100-80GB GPUs.
- Training Sequence Length: Fine-tuned with a sequence length of 4096 tokens, accommodating substantial code contexts.
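For context on the 69.5% pass@1 figure above: pass@k on HumanEval is the probability that at least one of k generated samples for a problem passes all of its unit tests. A minimal sketch of the standard unbiased estimator (1 − C(n−c, k)/C(n, k), where n samples were drawn per problem and c of them passed); the function name is illustrative:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for one problem.

    n: total samples generated, c: samples that passed all tests,
    k: budget of samples considered. Returns 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset
        # must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With n = 10 samples of which c = 5 pass, pass@1 is simply the
# pass fraction: 1 - C(5, 1)/C(10, 1) = 0.5.
print(pass_at_k(10, 5, 1))
```

The benchmark score is then the mean of this estimator over all 164 HumanEval problems; with a single greedy sample per problem (n = k = 1), pass@1 reduces to the fraction of problems solved outright.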
Usage Notes
- Instruction-Tuned, Not Chat-Tuned: This model is instruction-tuned rather than chat-tuned; it responds best to a direct task instruction followed by `\n: ` rather than conversational chat prompts.
- Decontamination Methodology: Phind applied OpenAI's decontamination methodology to its training dataset to help ensure the validity and reliability of its benchmark results.
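The prompt convention described in the usage notes can be sketched as a small helper; `build_prompt` is a hypothetical name, assuming the model expects a bare instruction terminated with `\n: ` as its completion cue:

```python
def build_prompt(instruction: str) -> str:
    """Format a single-turn prompt for an instruction-tuned code model.

    Hypothetical helper: appends the "\\n: " cue after the instruction,
    after which the model is expected to write its answer directly.
    """
    return instruction.rstrip() + "\n: "


prompt = build_prompt("Write a Python function that reverses a string.")
print(prompt)
```

The resulting string would then be passed to whatever inference stack serves the model (e.g. a `transformers` text-generation pipeline); note there is no system/user chat template here, only the instruction plus cue.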
Ideal Use Cases
- Python Code Generation: Highly effective for generating Python code snippets, functions, and solutions to programming challenges.
- Developer Assistance: Useful for developers needing to quickly generate or complete Python code based on specific instructions.
- Benchmarking: Provides a strong baseline for evaluating code generation performance against established metrics like HumanEval.