adityawakharkar/AstraGPTCoder-7B
AstraGPT-7B is a 7-billion-parameter decoder-only language model developed by Aditya Wakharkar (Tantra AI Labs). It features a custom transformer architecture and BPE tokenizer, fine-tuned on a reasoning dataset using dual RTX 4090 GPUs. This model is specifically designed for coding and chain-of-thought reasoning, natively supporting `<think>...</think>` style output.
Overview
AstraGPT-7B is a 7-billion-parameter decoder-only language model developed by Aditya Wakharkar of Tantra AI Labs. Unlike most open-source models, AstraGPT-7B was built entirely from scratch in PyTorch, encompassing its custom transformer architecture, Byte Pair Encoding (BPE) tokenizer, and supervised fine-tuning (SFT) pipeline. It was fine-tuned on a reasoning dataset using dual NVIDIA RTX 4090 GPUs.
Key Differentiators
- Custom Architecture: Features a unique transformer design with Grouped Query Attention (GQA), Rotary Position Embeddings (RoPE) with a high base frequency (θ = 1,000,000), a SwiGLU FFN, and RMSNorm, all implemented from first principles (see the sketch after this list).
- Custom BPE Tokenizer: A bespoke tokenizer with a 64,000-token vocabulary, a byte-level base, a GPT-4 style pre-tokenization regex, and built-in special tokens like `<think>` and `<|im_start|>`. This allows for precise control over tokenization.
- From-Scratch Training Pipeline: The SFT training loop was also custom-built, incorporating features like gradient accumulation, BF16 mixed precision, cosine LR scheduling, and gradient clipping, optimized for dual RTX 4090 hardware.
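The snippet below is a minimal PyTorch sketch of what two of these from-scratch components (RMSNorm and RoPE with θ = 1,000,000) typically look like. It is illustrative only: the dimensions, function names, and interleaved RoPE layout are assumptions, not code from this repository.

```python
# Minimal sketch of RMSNorm and RoPE with theta = 1,000,000.
# Shapes and names are illustrative, not taken from the released checkpoint.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales activations by their RMS."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


def rope_tables(head_dim: int, max_seq_len: int, theta: float = 1_000_000.0):
    """Precompute RoPE cos/sin tables. A high theta stretches the rotation
    periods, which helps the position encoding extend to longer contexts."""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    angles = torch.outer(positions, inv_freq)        # (max_seq_len, head_dim // 2)
    return angles.cos(), angles.sin()


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate interleaved query/key pairs; x is (batch, seq, heads, head_dim)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos = cos[: x.shape[1]][None, :, None, :]        # broadcast over batch/heads
    sin = sin[: x.shape[1]][None, :, None, :]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```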
Primary Capabilities
- Coding: Designed for code generation and understanding.
- Chain-of-Thought Reasoning: Natively supports `<think>...</think>` style reasoning output, triggered by specific prompt formatting (see the example below), enabling more structured problem-solving.
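A hedged usage sketch follows. Everything about loading and formatting here is an assumption: the snippet presumes the repository ships `transformers`-compatible weights, and the ChatML-style layout, including the `<|im_end|>` terminator, is inferred from the `<|im_start|>` and `<think>` special tokens listed above. Check the repository's actual chat template before relying on it.

```python
# Hypothetical usage sketch: assumes standard transformers loading and a
# ChatML-style template built from the special tokens listed above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "adityawakharkar/AstraGPTCoder-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Ending the prompt with "<think>\n" nudges the model to open its
# chain-of-thought block before answering (see Limitations below).
prompt = (
    "<|im_start|>user\n"
    "Write a Python function that reverses a linked list.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```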
Use Cases
- Code Generation: Ideal for developers needing a model optimized for programming tasks.
- Reasoning Tasks: Suitable for applications requiring explicit, step-by-step reasoning, particularly when leveraging the `<think>` tag functionality.
Limitations
- May produce hallucinations; verification of outputs is recommended.
- Complex multi-step math can be challenging for this 7B model.
- Primarily optimized for English language performance.
- The `<think>` tag reasoning is most reliable when explicitly prompted with `<think>\n`, as in the example above.