HYZ-01-0.6B-Base: A Turkish-Focused Base Language Model

HYZ-01-0.6B-Base is a 0.6-billion-parameter base model from NeuroTürk, designed for Turkish language processing. It was produced by multi-stage continual pre-training (CPT) of a multilingual foundation model, with a strong emphasis on Turkish data. The model is released in its raw pre-trained state, without instruction tuning or alignment, so it is intended as a starting point for custom fine-tuning.

Key Features & Technical Specifications

Turkish Optimization: Built on a multilingual foundation, the model received extensive CPT on various Turkish corpora, including web data, curated domain data, and Wikipedia.
Extended Tokenizer: The tokenizer has been extended with 20 new tokens to better represent Turkish morphological features and to support structural use cases such as chain-of-thought, code blocks, and dialogue management.
Model Architecture: 595.8 million total parameters, 28 layers, a hidden size of 1,024, and Grouped-Query Attention (GQA) with 16 query heads and 8 key-value (KV) heads. The model uses RoPE positional encoding with a theoretical maximum context of 32,768 tokens.
Training Details: Trained in bfloat16 precision with the AdamW optimizer and FlashAttention-2, at a training context length of 4,096 tokens.
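
The attention figures above allow a rough estimate of key-value cache memory at inference time. The sketch below is illustrative arithmetic, not a measured result. The layer count and KV-head count come from the specifications above; head_dim=128 is an assumption (the card does not state the per-head dimension), and bytes_per_value=2 corresponds to bfloat16.

```python
def queries_per_kv_head(num_q_heads: int = 16, num_kv_heads: int = 8) -> int:
    """Number of query heads that share one key-value head under GQA."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    return num_q_heads // num_kv_heads


def kv_cache_bytes(
    seq_len: int,
    num_layers: int = 28,      # from the model card
    num_kv_heads: int = 8,     # from the model card
    head_dim: int = 128,       # ASSUMPTION: not stated in the model card
    bytes_per_value: int = 2,  # bfloat16
) -> int:
    """Approximate KV-cache size in bytes for one sequence."""
    # The leading factor of 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * seq_len


if __name__ == "__main__":
    for tokens in (4096, 32768):
        mib = kv_cache_bytes(tokens) / 2**20
        print(f"{tokens} tokens: {mib:.0f} MiB")
```

Under these assumptions, one 4,096-token sequence needs about 448 MiB of cache and the 32,768-token maximum about 3.5 GiB. Because two query heads share each KV head, the cache is half the size it would be with one KV head per query head.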

Intended Use Cases

Fine-tuning: This base model is primarily intended for researchers and developers to fine-tune for specific Turkish NLP tasks, such as text generation, classification, or question answering.
Research & Development: Provides a strong foundation for exploring Turkish language models and developing specialized applications.
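
When preparing a corpus for fine-tuning a base model like this one, a common approach is to concatenate tokenized documents and cut them into fixed-length blocks. The sketch below shows that packing step in plain Python. It is not the authors' pipeline, which the card does not describe: the function name pack_into_blocks and the eos_id value are hypothetical, and the default block size simply mirrors the 4,096-token training context length.

```python
from typing import Iterable, Iterator, List


def pack_into_blocks(
    token_streams: Iterable[List[int]],
    block_size: int = 4096,  # mirrors the stated training context length
    eos_id: int = 0,         # HYPOTHETICAL: use the real tokenizer's EOS id
) -> Iterator[List[int]]:
    """Concatenate tokenized documents and yield fixed-length blocks."""
    buffer: List[int] = []
    for ids in token_streams:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Any trailing partial block is dropped so every block has equal length.


if __name__ == "__main__":
    docs = [[1, 2, 3], [4, 5], [6]]
    for block in pack_into_blocks(docs, block_size=4):
        print(block)
```

Dropping the final partial block keeps every training example the same length; padding it instead is an equally reasonable choice when data is scarce.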

Limitations

As a base model, HYZ-01-0.6B-Base is not instruction-tuned and will not reliably follow instructions. Its performance in languages other than Turkish is significantly reduced, and its 0.6B parameter count may limit complex multi-step reasoning. Human verification of outputs is recommended for critical applications.
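
Because a base model continues text rather than following instructions, tasks are usually framed as a pattern for it to complete. The sketch below builds such a few-shot prompt for a sentiment task. The helper name build_few_shot_prompt, the field labels "Metin" (text) and "Duygu" (sentiment), and the example sentences are all hypothetical illustrations, not a format the model is known to have been trained on.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    input_label: str = "Metin",   # hypothetical field label ("text")
    output_label: str = "Duygu",  # hypothetical field label ("sentiment")
) -> str:
    """Format solved examples followed by an unsolved one for completion."""
    parts = [
        f"{input_label}: {text}\n{output_label}: {label}"
        for text, label in examples
    ]
    # Leave the final output field empty so the model fills it in.
    parts.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        [("Harika bir film.", "olumlu"), ("Hiç beğenmedim.", "olumsuz")],
        "Çok sıkıcıydı.",
    )
    print(prompt)
```

The resulting string would be passed to the model as plain text, with generation stopped at the first newline so that only the label is returned.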
