md-nishat-008/TigerCoder-1B

Hosted on Hugging Face · Text Generation

Model Size: 1B · Quantization: BF16 · Context Length: 32k · Published: Feb 27, 2026 · License: CC-BY-4.0 · Architecture: Transformer · Open Weights

TigerCoder-1B by md-nishat-008 is a 1-billion-parameter instruction-tuned causal language model designed specifically for code generation in Bangla, with a 32,768-token context length. It belongs to the first dedicated family of Code LLMs for Bangla and was fine-tuned on 300K Bangla instruction-code pairs. The model achieves significant Pass@1 gains (11-18%) over prior baselines and outperforms models up to 27x larger on Bangla code generation benchmarks, covering Python, C++, Java, and JavaScript.


TigerCoder-1B: Bangla Code Generation LLM

TigerCoder-1B, developed by Nishat Raihan, Antonios Anastasopoulos, and Marcos Zampieri from George Mason University, is a 1 billion parameter instruction-tuned causal language model. It is part of the first dedicated family of Code LLMs for Bangla, a language severely underrepresented in code generation despite its large native speaker base. This model addresses the performance drop observed in general LLMs when processing Bangla coding prompts.
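Since the weights are published on the Hugging Face Hub, a typical way to query the model is through the transformers library. The sketch below is illustrative, not from the model card: the instruct-style prompt template and the greedy decoding settings are assumptions, so check the repository for the exact chat format before relying on it.

```python
# Sketch: querying md-nishat-008/TigerCoder-1B with transformers.
# The prompt template below is a hypothetical instruct format, not
# the documented one; the model download itself is ~2 GB of BF16 weights.
MODEL_ID = "md-nishat-008/TigerCoder-1B"


def build_prompt(bangla_instruction: str) -> str:
    """Wrap a Bangla instruction in a simple instruct-style template
    (assumed format -- the model's real template may differ)."""
    return f"### Instruction:\n{bangla_instruction}\n\n### Response:\n"


def generate_code(bangla_instruction: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate code for a Bangla prompt (requires
    the transformers package and downloading the checkpoint)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(build_prompt(bangla_instruction), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example usage (downloads the model, so not run here):
# Bangla prompt: "Write a Python function that returns the sum of two numbers."
# print(generate_code("দুটি সংখ্যার যোগফল ফেরত দেয় এমন একটি পাইথন ফাংশন লেখো।"))
```

Greedy decoding (`do_sample=False`) is a reasonable default for pass@1-style evaluation; for more varied completions, sampling with a moderate temperature is the usual alternative.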

Key Capabilities & Features

  • Bangla Code Generation: Specialized for generating code from Bangla instructions across multiple programming languages.
  • Superior Performance: Despite its compact 1B parameter size, TigerCoder-1B surpasses models up to 27x larger (including Gemma-3 27B) by 4-8 percentage points on Bangla code generation benchmarks.
  • Multilingual Code Support: Achieves strong Pass@1 scores on Bangla prompts for Python (0.69), C++ (0.64), Java (0.58), and JavaScript (0.53) on mHumanEval.
  • Dedicated Training Data: Fine-tuned on 300K Bangla instruction-code pairs from the custom-created Bangla-Code-Instruct dataset, comprising Self-Instruct, Synthetic, and Translated+Filtered subsets.
  • MBPP-Bangla Benchmark: Released alongside the model, a 974-problem benchmark of expert-validated Bangla programming tasks spanning 5 languages.
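For context on the Pass@1 figures above: HumanEval-style benchmarks report the standard unbiased pass@k estimator, computed from n generated samples per problem of which c pass the tests. A minimal sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total with c correct, passes."""
    if n - c < k:
        return 1.0  # fewer incorrect samples than k => a correct one is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)


# With one sample per problem, pass@1 reduces to the fraction correct:
# solving 69 of 100 problems gives pass@1 = 0.69.
```

With k = 1 the formula collapses to c/n, which is why single-sample Pass@1 scores such as 0.69 for Python can be read directly as "fraction of problems solved".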

Why TigerCoder-1B is Different

TigerCoder-1B demonstrates that high-quality, targeted, domain-specific data can outweigh model scale for low-resource code generation. It highlights that direct machine translation of coding prompts from Bangla to English does not improve performance due to mistranslation of code-specific keywords. This model is optimized primarily for Bangla code generation tasks, and its performance on general NLU or non-code tasks may not match general-purpose models.