JetBrains/Mellum-4b-base

License: apache-2.0
Overview

Mellum-4b-base: JetBrains' Code-Optimized LLM

Mellum-4b-base is JetBrains' first open-source large language model, a LLaMA-style architecture with 4 billion parameters. It is purpose-built for code-related tasks, making it well suited to applications such as intelligent code suggestions and AI-powered coding assistants.

Key Capabilities & Features

  • Code-Centric Training: Trained on over 4 trillion tokens, including extensive code data from sources like The Stack and StarCoder Training Dataset.
  • Context Window: Supports an 8192-token context window, crucial for understanding larger code segments.
  • Efficient Deployment: Its compact parameter count makes it practical for both cloud inference (e.g., vLLM) and local deployment (e.g., llama.cpp, Ollama).
  • Base Model Flexibility: Shipped as a base model, it is well suited to supervised fine-tuning (SFT) and reinforcement learning (RL) for adaptation to specific applications.
  • Performance: Achieves competitive results on code benchmarks such as RepoBench 1.1 (Python Avg ≤ 8k: 27.97%, Java Avg ≤ 8k: 31.08%), Syntax-Aware Fill-in-the-Middle (SAFIM) (Average: 38.11%), and HumanEval Infilling (Single-Line: 66.21%).
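Because it is a base model rather than a chat model, Mellum-4b-base completes code from a plain prefix. The sketch below shows minimal greedy completion with Hugging Face transformers; the `truncate_prefix` helper, the `max_new_tokens` default, and the greedy decoding choice are illustrative assumptions, not JetBrains' recommended settings:

```python
def truncate_prefix(token_ids, max_new_tokens, limit=8192):
    """Drop the oldest tokens so prompt + generation fits the 8192-token window."""
    budget = max(limit - max_new_tokens, 0)
    return token_ids[-budget:]

def complete(prefix: str, max_new_tokens: int = 64) -> str:
    """Greedy code completion with Mellum-4b-base.

    Heavy imports are deferred so truncate_prefix stays usable
    without the model weights downloaded.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "JetBrains/Mellum-4b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    ids = tokenizer(prefix, return_tensors="pt").input_ids[0].tolist()
    ids = truncate_prefix(ids, max_new_tokens)
    output = model.generate(
        torch.tensor([ids]), max_new_tokens=max_new_tokens, do_sample=False
    )
    # Return only the newly generated continuation, not the echoed prefix.
    return tokenizer.decode(output[0][len(ids):], skip_special_tokens=True)

# Usage (downloads the model weights on first run):
# print(complete("def fibonacci(n):\n"))
```

For production serving, the same prompt-budgeting idea applies when calling the model through vLLM or a local llama.cpp/Ollama endpoint instead of transformers.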

Good For

  • IDE Integration: Ideal for integration into professional developer tooling for features like code completion and intelligent suggestions.
  • Coding Assistants: Powering AI-driven coding assistants that require strong code understanding and generation capabilities.
  • Research & Education: Suitable for research in code understanding and generation, as well as educational applications.
  • Fine-tuning: Serves as a strong base model for supervised fine-tuning (SFT) to adapt it to specific programming languages or tasks.
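The fine-tuning use case above can be sketched with the Hugging Face Trainer. Everything here is an illustrative assumption rather than a JetBrains recipe: the JSONL file with a "text" field, the hyperparameters, and the `chunk_tokens` helper are all placeholders you would tune for a real run:

```python
def chunk_tokens(token_ids, window=8192):
    """Split a long tokenized file into context-window-sized training examples."""
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]

def fine_tune(train_file: str, output_dir: str = "mellum-4b-sft"):
    """Minimal causal-LM SFT pass; hyperparameters are placeholders, not tuned values."""
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_id = "JetBrains/Mellum-4b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Expects a JSONL file with one {"text": "<source code>"} object per line.
    # For files longer than the context window, pre-split with chunk_tokens
    # instead of truncating as done here.
    dataset = load_dataset("json", data_files=train_file)["train"]
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=8192),
        remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=output_dir,
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            learning_rate=1e-5,
            num_train_epochs=1,
            bf16=True,
        ),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model(output_dir)
```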