Name: Andy-ML-And-AI/HyperThinkCode-Qwen3-8B-v1.5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Andy-ML-And-AI

Model Overview

HyperThinkCode-Qwen3-8B-v1.5 is an 8 billion parameter model developed by Andy-ML-And-AI, built upon the Qwen3-8B base architecture. It is a LoRA fine-tune, specifically configured with 4-bit QLoRA using a rank of 16 and alpha of 16, applied across all linear layers including attention (q, k, v, o) and MLP (gate, up, down).

Key Capabilities & Training

This model was trained on a 30k subset of the Sashvat/HyperThink-X-Nvidia-Opencode-Reasoning-200K dataset. A core objective of its training was to encourage a 'thinking over direct response' behavior, utilizing a chat template where the assistant's response is placed within a 'thinking' field. The training process involved approximately 1 hour and 17 minutes over 50 steps, with a sequence length limited to 4096 tokens to manage code complexity and VRAM constraints. Initial training logs show a decreasing loss, indicating effective learning.

Use Cases & Evaluation

While evaluation is ongoing using the lm-eval library for benchmarks like HumanEval (coding) and GSM8K (math), the model's specialized training suggests its suitability for tasks requiring structured reasoning and problem-solving, particularly in programming and logical deduction. Its design to prioritize a 'thinking' process makes it potentially valuable for applications where step-by-step reasoning is more critical than immediate answers.

Overview

Model Overview

Key Capabilities & Training

Use Cases & Evaluation

Full Model Card (README)