Name: JetBrains/CodeLlama-7B-KStack API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: JetBrains

Overview

JetBrains/CodeLlama-7B-KStack is a 7-billion-parameter language model developed by JetBrains, fine-tuned from the CodeLlama-7B base model. What sets it apart is its training on the KStack dataset, the largest collection of permissively licensed Kotlin code. This targeted fine-tuning aims to improve the model's ability to generate and understand Kotlin code.

Key Capabilities

Kotlin Code Generation: Optimized for generating Kotlin code, leveraging its extensive training on the KStack dataset.
Fill-in-the-Middle (FIM): Completes code between an existing prefix and suffix, using a specific token format (<PRE> prefix <SUF> suffix <MID>).
Improved Kotlin Performance: Achieves a pass rate of 29.19% on the Kotlin HumanEval benchmark, compared with 26.09% for the base CodeLlama-7B model.
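
The FIM token format above can be illustrated with a small prompt builder. This is a minimal sketch in Python: the function name build_fim_prompt is hypothetical, and the single spaces around each token simply mirror the format as written here; the exact whitespace the tokenizer expects is an assumption and should be checked against the model card.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt in the <PRE> prefix <SUF> suffix <MID> layout.

    Hypothetical helper: spacing around the tokens follows the format as
    written in the description and is an assumption, not a verified detail.
    """
    return f"<PRE> {prefix} <SUF> {suffix} <MID>"


# Kotlin code before and after the gap the model is asked to fill.
kotlin_prefix = "fun add(a: Int, b: Int): Int {\n    return "
kotlin_suffix = "\n}"

prompt = build_fim_prompt(kotlin_prefix, kotlin_suffix)
print(prompt)
```

The model is expected to generate the missing middle after the <MID> token; how the generated text is terminated and trimmed depends on the serving setup.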

Training and Data Filtering

The model was fine-tuned on a single A100 GPU. The KStack dataset was filtered with rule-based heuristics to improve quality: low-popularity repositories, repositories with few Kotlin files, and files with fewer than 20 lines of code were removed. Content cleaning then removed non-ASCII entries, package lines, and half of the import lines to reduce noise and potential hallucinations.
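
The file-level rules described above might look roughly like the following Python sketch. It is not JetBrains' actual pipeline. The function name clean_kotlin_file is hypothetical, and several details are assumptions: a "non-ASCII entry" is treated as any file containing a non-ASCII character, the 20-line threshold is applied before cleaning, and "half of the import lines" is implemented by dropping every second import. Repository-level filters (popularity, number of Kotlin files) are omitted because their thresholds are not stated.

```python
from typing import Optional

MIN_LINES = 20  # from the description: files with fewer than 20 lines are removed


def clean_kotlin_file(source: str) -> Optional[str]:
    """Return the cleaned file contents, or None if the file should be dropped."""
    # Assumption: drop the whole file if it contains any non-ASCII character.
    if not source.isascii():
        return None

    lines = source.splitlines()
    # Assumption: the line-count rule is checked on the raw file, before cleaning.
    if len(lines) < MIN_LINES:
        return None

    kept = []
    imports_seen = 0
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("package "):
            continue  # remove package declarations
        if stripped.startswith("import "):
            imports_seen += 1
            # Assumption: "half of the import lines" = drop every second import.
            if imports_seen % 2 == 0:
                continue
        kept.append(line)
    return "\n".join(kept)
```

For a 25-line file with one package line and four imports, this sketch keeps 22 lines: the package line and two of the imports are removed.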

Good For

Kotlin Developers: Ideal for developers working with Kotlin who need assistance with code generation, completion, or understanding.
Code Assistants: Suitable for integration into IDEs or other tools requiring strong Kotlin code intelligence.
Research on Code LLMs: Provides a specialized model for studying the impact of domain-specific fine-tuning on code generation performance, particularly for Kotlin.
