Magicoder-S-CL-7B: Code Generation with OSS-Instruct
Magicoder-S-CL-7B is a 7-billion-parameter language model for coding tasks, developed by Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang. It is fine-tuned from CodeLlama-7b-Python-hf using the OSS-Instruct approach, which prompts an LLM with open-source code snippets to create diverse, realistic, and high-quality instruction data, mitigating the bias often found in purely LLM-synthesized data.
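The core of OSS-Instruct is seeding a data-generation prompt with a real open-source code snippet and asking a teacher model to invent a coding problem and solution inspired by it. The sketch below illustrates that flow; the prompt wording and the `build_oss_instruct_prompt` helper are illustrative assumptions, not the authors' exact template.

```python
# Minimal sketch of the OSS-Instruct idea: wrap a real open-source code
# snippet in a prompt that asks a teacher LLM to synthesize a new
# instruction-response pair grounded in that snippet.
# NOTE: the prompt text below is an assumption for illustration; consult the
# Magicoder paper for the exact template used to build the 75K dataset.

def build_oss_instruct_prompt(seed_snippet: str) -> str:
    """Embed an open-source seed snippet in a data-generation prompt."""
    return (
        "Gain inspiration from the following random code snippet taken from "
        "an open-source project and create a high-quality programming "
        "problem.\n\n"
        f"Code snippet for inspiration:\n```\n{seed_snippet}\n```\n\n"
        "Provide a [Problem Description] and a complete [Solution]."
    )

# Example seed snippet, as might be sampled from a GitHub repository.
seed = "def flatten(xs):\n    return [x for sub in xs for x in sub]"
prompt = build_oss_instruct_prompt(seed)
# `prompt` would then be sent to the teacher model (gpt-3.5-turbo-1106 for
# Magicoder-OSS-Instruct-75K) to produce one instruction-response pair.
```

Because each generated problem is anchored to a different real-world snippet, the resulting dataset inherits the diversity of open-source code rather than the teacher model's own biases.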
Key Capabilities
- Code Generation: Excels at generating code based on user instructions.
- Low-Bias Data Generation: Employs OSS-Instruct to produce instruction data with reduced bias and increased quality.
- Reliable Responses: Aims to deliver accurate and dependable outputs for programming queries.
Training Details
The model was trained using two primary datasets:
- Magicoder-OSS-Instruct-75K: Generated via OSS-Instruct using gpt-3.5-turbo-1106.
- Magicoder-Evol-Instruct-110K: A decontaminated version of evol-codealpaca-v1, used for further fine-tuning.
Good For
- Coding Tasks: Best suited for various programming-related instructions and code generation.
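For coding tasks, the model can be run with the Hugging Face transformers library. The sketch below assumes the model is published under an identifier like `ise-uiuc/Magicoder-S-CL-7B` and wraps instructions in the Magicoder-style `@@ Instruction` / `@@ Response` template; verify both against the official model card before relying on them.

```python
# Sketch of prompting Magicoder-S-CL-7B for code generation.
# The "@@ Instruction" / "@@ Response" template follows the Magicoder
# convention; the model ID and decoding settings are assumptions.

MAGICODER_PROMPT = """You are an exceptionally intelligent coding assistant \
that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
{instruction}

@@ Response
"""

instruction = "Write a Python function that checks whether a string is a palindrome."
prompt = MAGICODER_PROMPT.format(instruction=instruction)

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Run the model (downloads ~13 GB of weights on first use)."""
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="ise-uiuc/Magicoder-S-CL-7B",  # assumed model identifier
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    result = generator(prompt, max_new_tokens=max_new_tokens,
                       num_return_sequences=1)
    return result[0]["generated_text"]

# Calling generate(prompt) would return the prompt followed by the
# model's completion under the "@@ Response" marker.
```

Keeping the exact instruction template the model was fine-tuned on is important: deviating from it typically degrades output quality.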
Limitations
- May not perform well on non-coding tasks.
- Can produce errors or misleading content; generated code should be reviewed before use, and users should stay aware of potential risks and biases.