CodeGemma-2B: Fast Code Completion Model
CodeGemma-2B is a 2.6-billion-parameter, decoder-only model from Google, part of the CodeGemma family built on the Gemma architecture. This variant is pretrained (not instruction-tuned) and optimized for fast code completion, which distinguishes it from its larger counterparts that focus on general code generation or instruction following.
Key Capabilities
- Code Completion: Designed for fill-in-the-middle (FIM) scenarios, where it can complete code given a prefix and suffix context. This is ideal for IDE integrations.
- Fast Inference: As a 2.6 billion parameter model, it offers quicker response times compared to larger models, making it suitable for real-time coding assistance.
- Multi-file Context: Supports multi-file contexts using a <|file_separator|> token, allowing for more relevant completions in complex projects.
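A FIM prompt can be sketched as plain string assembly. The sentinel tokens <|fim_prefix|>, <|fim_suffix|>, and <|fim_middle|> follow the published CodeGemma prompt format. The helper name build_fim_prompt and the way extra files are prepended with <|file_separator|> are assumptions made for illustration, so check the model card before relying on the exact layout.

```python
# Sentinel tokens used by CodeGemma for fill-in-the-middle prompts.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"
FILE_SEPARATOR = "<|file_separator|>"


def build_fim_prompt(prefix, suffix, context_files=None):
    """Assemble a FIM prompt (hypothetical helper, not an official API).

    prefix: code before the cursor.
    suffix: code after the cursor.
    context_files: optional list of other file contents; placing them
        before the FIM block, each followed by the file separator, is
        an assumption of this sketch.
    """
    context = "".join(text + FILE_SEPARATOR for text in (context_files or []))
    return f"{context}{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
    print(prompt)
```

The model is expected to generate the missing middle section after the final <|fim_middle|> token, so the prompt always ends with it.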
Training and Performance
Starting from the Gemma base model, CodeGemma-2B was trained on an additional 500 billion tokens of primarily English code from public repositories, open-source mathematics datasets, and synthetically generated code. Its training examples were assembled with dependency graph-based packing and unit test-based lexical packing, which group related files together so that training contexts better resemble real projects. On infilling benchmarks, it scores 78.41 on HumanEval Single Line and 51.44 on HumanEval Multi Line.
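The general idea behind dependency graph-based packing can be illustrated with a topological sort: files that others import are placed first, and the ordered files are concatenated into one training example. This is only a sketch of the concept, not Google's actual pipeline. The function names, the shape of the imports mapping, and the use of <|file_separator|> as the join token are all assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_by_dependency(imports):
    """Return file names with dependencies before their dependents.

    imports maps each file name to the set of files it imports
    (a hypothetical input format for this sketch).
    """
    return list(TopologicalSorter(imports).static_order())


def pack_files(sources, imports, separator="<|file_separator|>"):
    """Concatenate file contents in dependency order (illustrative only)."""
    ordered = [name for name in order_by_dependency(imports) if name in sources]
    return separator.join(sources[name] for name in ordered)


if __name__ == "__main__":
    imports = {
        "app.py": {"utils.py"},
        "utils.py": {"config.py"},
        "config.py": set(),
    }
    sources = {
        "app.py": "from utils import helper\n",
        "utils.py": "from config import DEBUG\n",
        "config.py": "DEBUG = True\n",
    }
    print(order_by_dependency(imports))
    print(pack_files(sources, imports))
```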
Good For
- IDE Code Completion: Integrating into development environments for real-time code suggestions and infilling.
- Rapid Prototyping: Assisting developers with quick code snippets and completions to accelerate development.
- Educational Tools: Powering interactive coding platforms that require fast and accurate code completion.
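For IDE-style integrations, raw model output usually needs trimming before it is inserted at the cursor, because a pretrained FIM model may keep generating past the intended completion. A minimal sketch, assuming the generated text is available as a decoded string that may still contain sentinel tokens; trim_completion and STOP_TOKENS are hypothetical names, not part of any library:

```python
# Sentinel tokens after which generated text should be discarded.
STOP_TOKENS = (
    "<|fim_prefix|>",
    "<|fim_suffix|>",
    "<|fim_middle|>",
    "<|file_separator|>",
)


def trim_completion(generated, stop_tokens=STOP_TOKENS):
    """Cut the generated text at the earliest sentinel token, if any."""
    cut = len(generated)
    for token in stop_tokens:
        idx = generated.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut]


if __name__ == "__main__":
    raw = "a + b<|file_separator|>def unrelated():\n    pass"
    print(trim_completion(raw))
```

In a real integration the same tokens can typically be passed to the generation call as stop or end-of-sequence tokens, which avoids spending inference time on text that will be discarded.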