bigcode/gpt_bigcode-santacoder

Text Generation · Concurrency Cost: 1 · Model Size: 1.1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 6, 2023 · License: openrail · Architecture: Transformer · Open Weights

bigcode/gpt_bigcode-santacoder is a 1.1-billion-parameter code generation model developed by BigCode and based on the GPT-2 architecture. It uses multi-query attention and was trained with a Fill-in-the-Middle objective on 236 billion tokens of GitHub code. It targets Python, Java, and JavaScript, and completes code snippets from the context it is given.


Model Overview

bigcode/gpt_bigcode-santacoder is a 1.1-billion-parameter model from BigCode, built on a GPT-2 architecture with multi-query attention and trained with a Fill-in-the-Middle objective. It is designed for code generation in Python, Java, and JavaScript. This version of the checkpoint uses the GPTBigCode architecture and is compatible with transformers 4.28.1 and newer.
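The version requirement above can be checked before loading the checkpoint. The following is a minimal sketch, not an official loader: the helper names (parse_version, supports_gpt_bigcode, load_santacoder) are hypothetical, and the version parser is a deliberately simple stand-in that only reads the leading major, minor, and patch numbers.

```python
MODEL_ID = "bigcode/gpt_bigcode-santacoder"
MIN_TRANSFORMERS = (4, 28, 1)  # minimum version stated in the model overview


def parse_version(version: str) -> tuple:
    """Turn a string such as '4.30.0.dev0' into (4, 30, 0).

    Hypothetical helper: only the leading digits of the first three
    dot-separated pieces are read; suffixes like 'rc1' are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_gpt_bigcode(version: str) -> bool:
    """Return True if this transformers version meets the stated minimum."""
    return parse_version(version) >= MIN_TRANSFORMERS


def load_santacoder(model_id: str = MODEL_ID):
    """Load tokenizer and model; downloads the weights on first use."""
    # Imported lazily so the helpers above work without transformers installed.
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if not supports_gpt_bigcode(transformers.__version__):
        raise RuntimeError(
            f"transformers {transformers.__version__} is older than 4.28.1"
        )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling load_santacoder() returns a (tokenizer, model) pair; the version helpers can be used on their own in an environment check.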

Key Capabilities

  • Code Completion: Generates code snippets from comments or function signatures.
  • Multi-language Support: Primarily trained on Python, Java, and JavaScript.
  • Attribution Tool: A search index is provided to help identify verbatim code generations so they can be attributed properly.
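Because the model was trained with a Fill-in-the-Middle objective, a prompt can supply the code before and after a gap and ask for the missing middle. The sketch below shows one way to assemble such a prompt. The sentinel spellings (<fim-prefix>, <fim-suffix>, <fim-middle>) are an assumption based on the original SantaCoder release and should be verified against the tokenizer's special tokens; build_fim_prompt and extract_middle are hypothetical helper names.

```python
# Assumed sentinel tokens; confirm them against the tokenizer before relying on them.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Place the code before and after the gap around the sentinel tokens."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded: str) -> str:
    """Return the text that follows the middle sentinel in decoded output.

    Returns an empty string if the sentinel is not present.
    """
    if FIM_MIDDLE not in decoded:
        return ""
    return decoded.split(FIM_MIDDLE, 1)[1]


prompt = build_fim_prompt(
    prefix="def print_hello_world():\n    ",
    suffix="\n    print('Hello world!')",
)
```

The prompt string is then tokenized and passed to the model like any other input; extract_middle can be applied to the decoded output to recover only the filled-in span.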

Training Details

The model was pretrained over 600K steps on 236 billion tokens of GitHub code, using 96 Tesla V100 GPUs. The training dataset was filtered to include only permissively licensed code.

Intended Use

This model is not instruction-tuned. Phrase prompts as they would appear in source code, for example as comments or function signatures, rather than as natural-language commands. Generated code is not guaranteed to be bug-free or efficient.
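To illustrate the prompting style described above, the sketch below turns a plain description into a comment followed by a function signature, which is how the request would look inside a source file. The helper names (code_prompt, complete) and the max_new_tokens default are assumptions for illustration; complete expects a tokenizer and model already loaded through transformers.

```python
def code_prompt(description: str, signature: str) -> str:
    """Render a description as '#' comments followed by a signature line."""
    comment = "\n".join(f"# {line}" for line in description.splitlines())
    return f"{comment}\n{signature}\n"


def complete(prompt: str, tokenizer, model, max_new_tokens: int = 48) -> str:
    """Continue the prompt with greedy decoding and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0])


# Written as code, not as a command such as "Write a function that ...".
prompt = code_prompt(
    "Return the n-th Fibonacci number.",
    "def fibonacci(n):",
)
```

The returned text contains the prompt followed by the model's continuation, and it should be reviewed and tested before use.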