IDEA-CCNL/Ziya-Coding-34B-v1.0

Text Generation · Concurrency Cost: 2 · Model Size: 34B · Quant: FP8 · Context Length: 32k · Published: Sep 27, 2023 · License: GPL-3.0 · Architecture: Transformer · Open Weights

Ziya-Coding-34B-v1.0 is a 34 billion parameter code generation model developed by IDEA-CCNL, specifically designed to generate high-quality code from natural language instructions. It achieves a HumanEval Pass@1 score of 75.5%, surpassing GPT-4's 67.0% and setting a new high for known open-source models in this benchmark. The model was fine-tuned using a two-stage process involving 450,000 instruction data points and an evol-instruct method with compiler feedback and LLM-generated unit tests. It is primarily optimized for code generation tasks, offering strong performance in converting natural language prompts into functional code.


Ziya-Coding-34B-v1.0: High-Performance Code Generation

Ziya-Coding-34B-v1.0, developed by IDEA-CCNL, is a 34 billion parameter model specialized in generating high-quality code from natural language. It has demonstrated exceptional performance in code generation benchmarks.

Key Capabilities & Performance

  • Superior Code Generation: Achieved a HumanEval Pass@1 score of 75.5%, outperforming GPT-4 (67.0%) and other leading open-source models such as CodeFuse-CodeLlama-34B (74.4%) and Phind-CodeLlama-34B-v2 (73.8%).
  • Advanced Fine-tuning: The model underwent a two-stage fine-tuning process. The first stage utilized approximately 450,000 instruction data points (100k Chinese, 350k English), generated by using LLMs to create instructions from high-quality non-instructional code. The second stage employed an evol-instruct method to generate complex code instructions, using a code compiler for feedback and LLM-generated unit tests to ensure correctness.
  • Context Length: Supports a context length of 32768 tokens, enabling the processing of substantial codebases or complex prompts.
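As a sketch of how the model can be queried with the Hugging Face `transformers` library: the `<human>:`/`<bot>:` turn markers below follow the convention used by other Ziya-series models and are an assumption here, so check the model card for the exact prompt template before relying on them.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a natural-language instruction in the assumed Ziya chat format."""
    return f"<human>: \n{instruction}\n<bot>: \n"


def generate(instruction: str, max_new_tokens: int = 512) -> str:
    # Heavyweight: downloads the 34B-parameter weights and needs a large GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "IDEA-CCNL/Ziya-Coding-34B-v1.0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(build_prompt("Write a Python function that reverses a string."))
```

Greedy decoding (`do_sample=False`) is shown for reproducibility; sampling with a moderate temperature is a common alternative for more varied completions.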

Good For

  • Code Generation: Ideal for developers and researchers requiring high-accuracy code generation from natural language descriptions.
  • Benchmarking: Suitable for evaluating and comparing code generation capabilities against state-of-the-art models.
  • Customization: Provides a strong base for further fine-tuning on specific programming languages or domain-specific code generation tasks.
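For benchmarking against scores like the 75.5% above, HumanEval-style Pass@k is usually computed with the standard unbiased estimator: sample n completions per problem, count the c that pass the unit tests, and estimate the probability that at least one of k draws is correct. A minimal sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: n samples per problem, c of them correct.

    Returns the estimated probability that at least one of k sampled
    completions passes the problem's unit tests.
    """
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples, 3 pass the tests -> Pass@1 = 3/10 = 0.3
score = pass_at_k(n=10, c=3, k=1)
```

The per-problem scores are then averaged over the whole benchmark (164 problems for HumanEval) to get the reported percentage.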