Magicoder-S-CL-7B: Code Generation with OSS-Instruct
Magicoder-S-CL-7B is a 7-billion-parameter language model for coding tasks, developed by Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang. It is fine-tuned from CodeLlama-7b-Python-hf using the OSS-Instruct approach, which prompts an LLM with open-source code snippets to create diverse, realistic, and high-quality instruction data, mitigating the bias often found in purely LLM-synthesized data.
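The core of OSS-Instruct is seeding a data-generation prompt with a real open-source code snippet and asking a teacher model to invent a coding problem and solution inspired by it. The sketch below illustrates that flow; the prompt wording and the `build_oss_instruct_prompt` helper are illustrative assumptions, not the authors' exact template.

```python
# Minimal sketch of the OSS-Instruct idea: wrap a real open-source code
# snippet in a prompt that asks a teacher LLM to synthesize a new
# instruction-response pair grounded in that snippet.
# NOTE: the prompt text below is an assumption for illustration; consult the
# Magicoder paper for the exact template used to build the 75K dataset.

def build_oss_instruct_prompt(seed_snippet: str) -> str:
    """Embed an open-source seed snippet in a data-generation prompt."""
    return (
        "Gain inspiration from the following random code snippet taken from "
        "an open-source project and create a high-quality programming "
        "problem.\n\n"
        f"Code snippet for inspiration:\n```\n{seed_snippet}\n```\n\n"
        "Provide a [Problem Description] and a complete [Solution]."
    )

# Example seed snippet, as might be sampled from a GitHub repository.
seed = "def flatten(xs):\n    return [x for sub in xs for x in sub]"
prompt = build_oss_instruct_prompt(seed)
# `prompt` would then be sent to the teacher model (gpt-3.5-turbo-1106 for
# Magicoder-OSS-Instruct-75K) to produce one instruction-response pair.
```

Because each generated problem is anchored to a different real-world snippet, the resulting dataset inherits the diversity of open-source code rather than the teacher model's own biases.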
Key Capabilities
- Code Generation: Excels at generating code based on user instructions.
- Low-Bias Data Generation: Employs OSS-Instruct to produce instruction data with reduced bias and increased quality.
- Reliable Responses: Aims to deliver accurate and dependable outputs for programming queries.
Training Details
The model was trained using two primary datasets:
- Magicoder-OSS-Instruct-75K: Generated via OSS-Instruct using gpt-3.5-turbo-1106.
- Magicoder-Evol-Instruct-110K: A decontaminated version of evol-codealpaca-v1, used for further fine-tuning.
Good For
- Coding Tasks: Best suited for various programming-related instructions and code generation.
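For coding tasks, the model can be run with the Hugging Face transformers library. The sketch below assumes the model is published under an identifier like `ise-uiuc/Magicoder-S-CL-7B` and wraps instructions in the Magicoder-style `@@ Instruction` / `@@ Response` template; verify both against the official model card before relying on them.

```python
# Sketch of prompting Magicoder-S-CL-7B for code generation.
# The "@@ Instruction" / "@@ Response" template follows the Magicoder
# convention; the model ID and decoding settings are assumptions.

MAGICODER_PROMPT = """You are an exceptionally intelligent coding assistant \
that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
{instruction}

@@ Response
"""

instruction = "Write a Python function that checks whether a string is a palindrome."
prompt = MAGICODER_PROMPT.format(instruction=instruction)

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Run the model (downloads ~13 GB of weights on first use)."""
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="ise-uiuc/Magicoder-S-CL-7B",  # assumed model identifier
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    result = generator(prompt, max_new_tokens=max_new_tokens,
                       num_return_sequences=1)
    return result[0]["generated_text"]

# Calling generate(prompt) would return the prompt followed by the
# model's completion under the "@@ Response" marker.
```

Keeping the exact instruction template the model was fine-tuned on is important: deviating from it typically degrades output quality.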
Limitations
- May not perform well on non-coding tasks.
- Can produce errors or misleading content; generated code should be reviewed before use, and users should stay aware of potential risks and biases.