jukofyork/Kimi-K2-Instruct-DRAFT-0.6B-v3.0

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Aug 10, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

jukofyork/Kimi-K2-Instruct-DRAFT-0.6B-v3.0 is a 0.6-billion-parameter draft model for speculative decoding, derived from Qwen2.5-0.5B-Instruct. It is designed specifically to be paired with the Kimi-K2-Instruct model. It supports a default context length of 32,768 tokens, extensible to 128k tokens via YaRN scaling, making it suitable for applications that process very long contexts.


Model Overview

jukofyork/Kimi-K2-Instruct-DRAFT-0.6B-v3.0 is a 0.6 billion parameter draft model, built upon the Qwen2.5-0.5B-Instruct architecture. Its primary purpose is to serve as a speculative decoding model for the Kimi-K2-Instruct series. The model was created by transplanting the vocabulary from Qwen2.5-0.5B-Instruct to align with Kimi-K2-Instruct's tokenizer, followed by fine-tuning.

Key Capabilities

  • Speculative Decoding: Designed as a draft model to accelerate inference when paired with a larger Kimi-K2-Instruct model.
  • Extended Context Length: Supports a default context window of 32,768 tokens, which can be extended to 65,536 or 131,072 tokens by modifying the config.json with YaRN scaling parameters. This makes it suitable for processing very long documents or conversations.
  • Training Data: Fine-tuned on approximately 2.3 billion tokens from diverse datasets including agentlans/common-crawl-sample, bigcode/the-stack-smol-xl, and rombodawg/Everything_Instruct.
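The YaRN extension mentioned above can be sketched as a small edit to the model's `config.json`. The snippet below follows the `rope_scaling` key layout documented for Qwen2.5-family models (`type`, `factor`, `original_max_position_embeddings`); verify the exact keys against this model's own README before applying, as they are an assumption here, not taken from this card.

```python
import json

BASE_CTX = 32768  # the model's default max_position_embeddings


def apply_yarn(config: dict, target_ctx: int) -> dict:
    """Return a copy of config extended to target_ctx with YaRN rope scaling.

    The scaling factor is target_ctx / BASE_CTX, e.g. 4.0 for 131,072 tokens.
    Key names follow the Qwen2.5 convention and may differ for other models.
    """
    cfg = dict(config)
    cfg["max_position_embeddings"] = target_ctx
    cfg["rope_scaling"] = {
        "type": "yarn",
        "factor": target_ctx / BASE_CTX,
        "original_max_position_embeddings": BASE_CTX,
    }
    return cfg


# Example: extend a minimal config to the 131,072-token setting.
config = {"max_position_embeddings": 32768}
extended = apply_yarn(config, 131072)
print(json.dumps(extended["rope_scaling"], indent=2))
```

For the 65,536-token setting, pass `65536` instead, which yields a factor of 2.0. In practice you would read the real `config.json` from the model directory, apply the change, and write it back.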

How it was created

  1. Vocabulary Transplant: The initial model was created from Qwen2.5-0.5B-Instruct using transplant-vocab to align its tokenizer with Kimi-K2-Instruct, handling non-standard token overrides.
  2. Fine-tuning: Trained for one epoch using qlora-pipe-lite with a batch size of 60 and a sequence length of 32,768 tokens, utilizing six RTX A6000 GPUs.
  3. GGUF Conversion: The release modifies llama.cpp's convert_hf_to_gguf.py to work around the TikToken / SentencePiece tokenizer mismatch, enabling GGUF quantization and llama.cpp compatibility.
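Once both models are in GGUF form, the draft model is attached at serving time. The commands below are an illustrative sketch using llama.cpp's standard conversion script and the `--model-draft` (`-md`) server flag; the file names are placeholders, and the patched conversion script from this release would be used in place of the stock one.

```shell
# Convert the draft model to GGUF (using the release's patched script).
python convert_hf_to_gguf.py ./Kimi-K2-Instruct-DRAFT-0.6B-v3.0 \
    --outfile kimi-k2-draft-0.6b-f16.gguf --outtype f16

# Serve Kimi-K2-Instruct with the draft model for speculative decoding.
llama-server \
    -m  kimi-k2-instruct-q4_k_m.gguf \
    -md kimi-k2-draft-0.6b-f16.gguf \
    -c  32768
```

With a matching tokenizer (which is what the vocabulary transplant provides), the draft model proposes several tokens per step that the large model verifies in one pass, trading a small amount of extra compute for lower latency.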