bigcode/gpt_bigcode-santacoder
bigcode/gpt_bigcode-santacoder is a 1.1 billion parameter GPT-2-style model developed by BigCode for code generation. It uses multi-query attention and was trained with a Fill-in-the-Middle objective on 236 billion tokens of GitHub code. It targets Python, Java, and JavaScript and completes code snippets from the context it is given.
Model Overview
bigcode/gpt_bigcode-santacoder is a 1.1 billion parameter model from BigCode, built on a GPT-2 architecture with multi-query attention and trained with a Fill-in-the-Middle objective. It is designed for code generation in Python, Java, and JavaScript. This version of the checkpoint uses the GPTBigCode architecture and is compatible with transformers 4.28.1 and newer.
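The version requirement above can be checked before loading the checkpoint. The following is a minimal sketch, not an official loader: the helper names (`version_tuple`, `supports_gpt_bigcode`, `complete`) and the defaults for `max_new_tokens` and `device` are assumptions made for illustration, and the loading calls are the generic `AutoTokenizer` / `AutoModelForCausalLM` entry points from transformers.

```python
import re

CHECKPOINT = "bigcode/gpt_bigcode-santacoder"
MIN_TRANSFORMERS = (4, 28, 1)  # minimum transformers version stated in the overview


def version_tuple(version: str) -> tuple:
    """Turn '4.30.2' or '4.29.0.dev0' into a comparable (major, minor, patch) tuple."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def supports_gpt_bigcode(version: str) -> bool:
    """True if the given transformers version meets the stated minimum."""
    return version_tuple(version) >= MIN_TRANSFORMERS


def complete(prompt: str, max_new_tokens: int = 32, device: str = "cpu") -> str:
    """Hypothetical helper: load the checkpoint and continue `prompt`."""
    # Imported lazily so the version helpers work without transformers installed.
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if not supports_gpt_bigcode(transformers.__version__):
        raise RuntimeError(
            f"transformers {transformers.__version__} is older than 4.28.1"
        )

    # The first call downloads the weights from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForCausalLM.from_pretrained(CHECKPOINT).to(device)

    inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # The decoded string contains the prompt followed by the generated tokens.
    return tokenizer.decode(outputs[0])
```

Calling `complete("def print_hello_world():")` would then return the prompt plus the model's continuation. Loading the model inside the function keeps the sketch short; a real application would load it once and reuse it.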
Key Capabilities
- Code Completion: Generates code snippets from comments or function signatures.
- Multi-language Support: Primarily trained on Python, Java, and JavaScript.
- Attribution Tool: A search index helps identify generations that may be copied verbatim from existing code, so they can be attributed properly.
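Besides left-to-right completion, the Fill-in-the-Middle objective mentioned in the overview lets the model fill a gap between a known prefix and suffix. The sketch below only builds and parses the prompt strings. The sentinel tokens `<fim-prefix>`, `<fim-suffix>`, `<fim-middle>` and the `<|endoftext|>` marker are assumptions taken from the upstream SantaCoder model card rather than from this page, so verify them against the tokenizer's special tokens before relying on them. The helper names are hypothetical.

```python
# Assumed sentinel tokens (hyphenated form); confirm against the tokenizer.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"
END_OF_TEXT = "<|endoftext|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the gap in prefix-suffix-middle order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded: str) -> str:
    """Return the text generated after the middle sentinel.

    Returns an empty string if the sentinel is missing from `decoded`.
    """
    _, _, middle = decoded.partition(FIM_MIDDLE)
    # Drop the end-of-text marker and anything after it, if present.
    return middle.split(END_OF_TEXT, 1)[0]


prompt = build_fim_prompt(
    prefix="def print_hello_world():\n    ",
    suffix="\n    print('Hello world!')",
)
```

The resulting `prompt` is passed to the model like any other input, and `extract_middle` is applied to the decoded output to recover only the filled-in span.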
Training Details
The model was pretrained for 600K steps on 236 billion tokens of GitHub code, using 96 Tesla V100 GPUs. The training dataset was filtered to include only permissively licensed code.
Intended Use
This model is not instruction-tuned. Phrase prompts as they would appear in source code, such as comments or function signatures, rather than as natural language commands. The generated code is not guaranteed to be bug-free or efficient.
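To illustrate the difference in prompt style, the following sketch turns a plain-language request into a Python comment followed by a function signature, which is the form the model expects. The helper `source_style_prompt` is hypothetical and only handles Python-style `#` comments; Java or JavaScript prompts would need `//` instead.

```python
def source_style_prompt(description: str, signature: str) -> str:
    """Phrase a request as a comment block followed by a function signature."""
    # Each line of the description becomes its own '#' comment line.
    comment = "\n".join(f"# {line}" for line in description.strip().splitlines())
    return f"{comment}\n{signature.rstrip()}"


# An instruction such as "Write a function that returns the n-th Fibonacci
# number" is rewritten as text that could appear in a source file:
prompt = source_style_prompt(
    "Return the n-th Fibonacci number.",
    "def fibonacci(n: int) -> int:",
)
print(prompt)
```

The model then continues from the signature, so the comment should describe what the function does rather than ask for it.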