Tekimax/granite-ml-coder

TEXT GENERATIONConcurrency Cost:1Model Size:1BQuant:BF16Ctx Length:32kPublished:Jun 5, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

Tekimax/granite-ml-coder is a 1 billion parameter Python/machine-learning coding assistant, fine-tuned from ibm-granite/granite-3.1-1b-a400m-instruct with a 32768 token context length. Optimized for generating runnable scikit-learn, pandas, and NumPy code, it also explains ML pipeline steps and concepts like overfitting and gradient descent. Its compact size allows for fully local operation on a laptop CPU, making it ideal for private, offline data science work.

Loading preview...

Tekimax/granite-ml-coder: A Compact ML Coding Assistant

Tekimax/granite-ml-coder is a specialized 1 billion parameter language model designed as a Python and machine-learning coding assistant. Fine-tuned from ibm-granite/granite-3.1-1b-a400m-instruct, this model excels at generating practical, runnable code for popular ML libraries such as scikit-learn, pandas, and NumPy.

Key Capabilities

  • Code Generation: Drafts Python ML code for notebooks and IDEs, particularly strong with classic ML frameworks.
  • Conceptual Explanations: Provides clear explanations of machine learning pipeline steps and core concepts like overfitting, cross-validation, and gradient descent.
  • Local Operation: Its small footprint enables efficient execution fully locally on a laptop CPU, via Ollama, or as a quantized GGUF, ensuring data privacy.

Training Details

The model was fine-tuned using the iamtarun/python_code_instructions_18k_alpaca dataset, specifically filtered for ML/DS-related examples (approximately 2,341 entries). The training involved a full fine-tune over 2 epochs, with loss computed only on the assistant's answer.

Intended Use Cases

  • Private Copilot: Serves as an offline, private coding assistant for data science tasks where data security is paramount.
  • Rapid Prototyping: Quickly generates first drafts of ML code, accelerating development workflows.
  • Educational Tool: Helps in understanding and explaining complex ML concepts and pipeline components.

Limitations

As a 1 billion parameter model, it is not a frontier model and may produce incomplete or subtly incorrect code; outputs should be verified. It is primarily focused on English and Python, with its strongest performance on classic ML tasks (sklearn/pandas) rather than large, novel, or multi-file projects.