adityawakharkar/AstraGPTCoder-7B

Text Generation · Model Size: 7.6B · Quantization: FP8 · Context Length: 32k · Published: Apr 20, 2026 · License: apache-2.0 · Architecture: Transformer · Concurrency Cost: 1

AstraGPT-7B is a 7-billion-parameter decoder-only language model developed by Aditya Wakharkar (Tantra AI Labs). It features a custom transformer architecture and BPE tokenizer, fine-tuned on a reasoning dataset using dual RTX 4090 GPUs. The model is specifically designed for coding and chain-of-thought reasoning, natively supporting `<think>...</think>` style output.


Overview

AstraGPT-7B is a 7-billion-parameter decoder-only language model developed by Aditya Wakharkar of Tantra AI Labs. Unlike most open-source models, AstraGPT-7B was built entirely from scratch in PyTorch, including its custom transformer architecture, Byte Pair Encoding (BPE) tokenizer, and supervised fine-tuning (SFT) pipeline. It was fine-tuned on a reasoning dataset using dual NVIDIA RTX 4090 GPUs.

Key Differentiators

  • Custom Architecture: Features a unique transformer design with Grouped Query Attention (GQA), Rotary Position Embeddings (RoPE) with a high base of θ = 1,000,000, SwiGLU FFN, and RMSNorm, all implemented from first principles (see the RoPE sketch after this list).
  • Custom BPE Tokenizer: A bespoke tokenizer with a 64,000-token vocabulary, byte-level base, GPT-4 style pre-tokenization regex, and built-in special tokens like <think> and <|im_start|>. This allows for precise control over tokenization.
  • From-Scratch Training Pipeline: The SFT training loop was also custom-built, incorporating gradient accumulation, BF16 mixed precision, cosine LR scheduling, and gradient clipping, and was optimized for dual RTX 4090 hardware (see the training-step sketch below).
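To make the architecture bullet concrete, here is a minimal sketch of rotary position embeddings with the base θ = 1,000,000 described above. It illustrates the standard RoPE formulation, not the model's actual implementation; the function names and tensor shapes are assumptions.

```python
import torch

def rope_frequencies(head_dim: int, max_pos: int, theta: float = 1_000_000.0) -> torch.Tensor:
    """Precompute complex rotation factors for RoPE.

    A larger base theta slows the rotation of each frequency band, the
    standard lever for making long contexts (here 32k) usable.
    """
    # One inverse frequency per pair of head dimensions.
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float()
    angles = torch.outer(positions, inv_freq)            # (max_pos, head_dim // 2)
    return torch.polar(torch.ones_like(angles), angles)  # unit-magnitude complex rotations

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """Rotate query/key tensors of shape (batch, seq, heads, head_dim)."""
    # Pair up adjacent dims and treat each pair as one complex number.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    rotated = x_complex * freqs[: x.shape[1]].unsqueeze(0).unsqueeze(2)
    return torch.view_as_real(rotated).flatten(-2).type_as(x)
```

With θ = 1,000,000, the lowest-frequency bands complete far less than one full rotation over 32k positions, which is what preserves positional resolution at long range.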
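Similarly, here is a minimal sketch of the kind of training step the pipeline bullet describes, combining gradient accumulation, BF16 autocast, a cosine LR schedule, and gradient clipping. The function signature, hyperparameters, and the HF-style `.loss` output are placeholders, not details of the actual pipeline.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def sft_train(model, loader, total_steps: int, accum_steps: int = 8):
    """SFT loop sketch: gradient accumulation + BF16 + cosine LR + clipping."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)
    model.train()
    optimizer.zero_grad(set_to_none=True)
    for step, batch in enumerate(loader):
        # BF16 autocast is numerically stable enough that no GradScaler is
        # needed, unlike FP16 mixed precision.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            # Assumes the model returns an object with a .loss attribute.
            loss = model(**batch).loss / accum_steps  # scale for accumulation
        loss.backward()
        if (step + 1) % accum_steps == 0:
            # Clip the fully accumulated gradient, then step and anneal the LR.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad(set_to_none=True)
```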

Primary Capabilities

  • Coding: Designed for code generation and understanding.
  • Chain-of-Thought Reasoning: Natively supports <think>...</think> style reasoning output, triggered by specific prompt formatting, enabling more structured problem-solving.
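As a hedged illustration of that prompt formatting, the sketch below assumes a ChatML-style template built from the special tokens listed earlier; `<|im_end|>` is inferred from the ChatML convention rather than stated on this card, so verify against the tokenizer's actual chat template.

```python
# Hypothetical ChatML-style prompt. <|im_start|> and <think> are documented on
# this card; <|im_end|> is assumed by convention, so check the shipped template.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful coding assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Reverse a singly linked list in Python.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n"  # pre-filling <think> is what makes the reasoning trace reliable
)
```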

Use Cases

  • Code Generation: Ideal for developers needing a model optimized for programming tasks.
  • Reasoning Tasks: Suitable for applications requiring explicit, step-by-step reasoning, particularly when leveraging the <think> tag functionality.

Limitations

  • May produce hallucinations; verification of outputs is recommended.
  • Complex multi-step math can be challenging for this 7B model.
  • Primarily optimized for English language performance.
  • The <think> tag reasoning is most reliable when explicitly prompted with <think>\n.
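For reference, a minimal inference sketch using Hugging Face `transformers`. Because the architecture is custom, `trust_remote_code=True` is assumed to be required, and the generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "adityawakharkar/AstraGPTCoder-7B"
# trust_remote_code is assumed necessary for the from-scratch architecture.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Prepend "<think>\n" to the assistant turn, per the limitation noted above.
prompt = (
    "<|im_start|>user\nExplain Python decorators.<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```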