adityawakharkar/AstraGPTCoder-7B
AstraGPT-7B is a 7-billion-parameter decoder-only language model developed by Aditya Wakharkar (Tantra AI Labs). It features a custom transformer architecture and BPE tokenizer, fine-tuned on a reasoning dataset using dual RTX 4090 GPUs. This model is specifically designed for coding and chain-of-thought reasoning, natively supporting `<think>...</think>` style output.
Overview
AstraGPT-7B is a 7-billion-parameter decoder-only language model developed by Aditya Wakharkar of Tantra AI Labs. Unlike most open-source models, AstraGPT-7B was built entirely from scratch in PyTorch, encompassing its custom transformer architecture, Byte Pair Encoding (BPE) tokenizer, and supervised fine-tuning (SFT) pipeline. It was fine-tuned on a reasoning dataset using dual NVIDIA RTX 4090 GPUs.
Key Differentiators
- Custom Architecture: Features a unique transformer design with Grouped Query Attention (GQA), Rotary Position Embeddings (RoPE) with a high base frequency (θ = 1,000,000), a SwiGLU FFN, and RMSNorm, all implemented from first principles (see the sketch after this list).
- Custom BPE Tokenizer: A bespoke tokenizer with a 64,000-token vocabulary, a byte-level base, a GPT-4 style pre-tokenization regex, and built-in special tokens like `<think>` and `<|im_start|>`. This allows for precise control over tokenization.
- From-Scratch Training Pipeline: The SFT training loop was also custom-built, incorporating features like gradient accumulation, BF16 mixed precision, cosine LR scheduling, and gradient clipping, optimized for dual RTX 4090 hardware.
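The snippet below is a minimal PyTorch sketch of what two of these from-scratch components (RMSNorm and RoPE with θ = 1,000,000) typically look like. It is illustrative only: the dimensions, function names, and interleaved RoPE layout are assumptions, not code from this repository.

```python
# Minimal sketch of RMSNorm and RoPE with theta = 1,000,000.
# Shapes and names are illustrative, not taken from the released checkpoint.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales activations by their RMS."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


def rope_tables(head_dim: int, max_seq_len: int, theta: float = 1_000_000.0):
    """Precompute RoPE cos/sin tables. A high theta stretches the rotation
    periods, which helps the position encoding extend to longer contexts."""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    angles = torch.outer(positions, inv_freq)        # (max_seq_len, head_dim // 2)
    return angles.cos(), angles.sin()


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate interleaved query/key pairs; x is (batch, seq, heads, head_dim)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos = cos[: x.shape[1]][None, :, None, :]        # broadcast over batch/heads
    sin = sin[: x.shape[1]][None, :, None, :]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```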
Primary Capabilities
- Coding: Designed for code generation and understanding.
- Chain-of-Thought Reasoning: Natively supports `<think>...</think>` style reasoning output, triggered by specific prompt formatting (see the example below), enabling more structured problem-solving.
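A hedged usage sketch follows. Everything about loading and formatting here is an assumption: the snippet presumes the repository ships `transformers`-compatible weights, and the ChatML-style layout, including the `<|im_end|>` terminator, is inferred from the `<|im_start|>` and `<think>` special tokens listed above. Check the repository's actual chat template before relying on it.

```python
# Hypothetical usage sketch: assumes standard transformers loading and a
# ChatML-style template built from the special tokens listed above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "adityawakharkar/AstraGPTCoder-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Ending the prompt with "<think>\n" nudges the model to open its
# chain-of-thought block before answering (see Limitations below).
prompt = (
    "<|im_start|>user\n"
    "Write a Python function that reverses a linked list.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```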
Use Cases
- Code Generation: Ideal for developers needing a model optimized for programming tasks.
- Reasoning Tasks: Suitable for applications requiring explicit, step-by-step reasoning, particularly when leveraging the `<think>` tag functionality.
Limitations
- May produce hallucinations; verification of outputs is recommended.
- Complex multi-step math can be challenging for this 7B model.
- Primarily optimized for English language performance.
- The `<think>` tag reasoning is most reliable when explicitly prompted with `<think>\n`, as in the example above.