neuroturk/HYZ-01-0.6B-Base
HYZ-01-0.6B-Base is a 0.6 billion parameter base language model developed by NeuroTürk, specifically pre-trained for Turkish. It features multi-stage continual pre-training on a multilingual foundation with an extended tokenizer optimized for Turkish morphological structure. This model is designed for researchers and developers to fine-tune for various Turkish natural language processing tasks.
HYZ-01-0.6B-Base: A Turkish-Focused Base Language Model
HYZ-01-0.6B-Base is a 0.6 billion parameter base model from NeuroTürk, designed for Turkish language processing. It has undergone multi-stage continual pre-training (CPT) on a multilingual foundation, with a strong emphasis on Turkish data. This model is provided in its raw, pre-trained state, without any instruction tuning or alignment, making it ideal for custom fine-tuning.
Key Features & Technical Specifications
- Turkish Optimization: Built on a multilingual foundation, the model received extensive CPT on various Turkish corpora, including web data, curated domain data, and Wikipedia.
- Extended Tokenizer: The tokenizer is extended with 20 new tokens to better represent Turkish morphological features and to support structural use cases such as chain-of-thought, code blocks, and dialogue management.
- Model Architecture: 595.8 million total parameters, 28 layers, a hidden dimension of 1,024, and Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads. It uses RoPE positional encoding with a theoretical maximum context of 32,768 tokens.
- Training Details: Trained in bfloat16 precision with the AdamW optimizer and FlashAttention-2, using a training context length of 4,096 tokens.
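To make the attention figures above concrete, the following is a rough sketch of how the GQA head counts translate into key-value cache size at the training context (4,096 tokens) and the theoretical maximum (32,768 tokens). The layer and head counts come from this card; the per-head dimension `head_dim` is not stated here, so the value of 128 is an assumption and should be replaced with the value from the model's configuration file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    num_layers: int = 28       # from the model card
    num_query_heads: int = 16  # from the model card
    num_kv_heads: int = 8      # from the model card
    head_dim: int = 128        # ASSUMPTION: not stated in the model card


def queries_per_kv_head(shape: AttentionShape) -> int:
    """Number of query heads that share one key-value head under GQA."""
    if shape.num_query_heads % shape.num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of key-value heads")
    return shape.num_query_heads // shape.num_kv_heads


def kv_cache_bytes(shape: AttentionShape, seq_len: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    Each layer stores two tensors (keys and values), each holding
    num_kv_heads * seq_len * head_dim values. bytes_per_value=2
    corresponds to bfloat16.
    """
    values_per_layer = 2 * shape.num_kv_heads * seq_len * shape.head_dim
    return shape.num_layers * values_per_layer * bytes_per_value


shape = AttentionShape()
for seq_len in (4096, 32768):
    mib = kv_cache_bytes(shape, seq_len) / 2**20
    print(f"{seq_len} tokens: about {mib:.0f} MiB of KV cache per sequence")
```

Under these assumptions each key-value head serves two query heads, so the cache is half the size it would be with 16 key-value heads.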
Intended Use Cases
- Fine-tuning: This base model is primarily intended for researchers and developers to fine-tune for specific Turkish NLP tasks, such as text generation, classification, or question answering.
- Research & Development: Provides a strong foundation for exploring Turkish language models and developing specialized applications.
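As a starting point for the use cases above, here is a minimal sketch of loading the checkpoint and generating a plain text continuation. It assumes the repository is loadable through the standard Hugging Face `transformers` auto classes, which this card does not explicitly confirm, and the sampling values are illustrative rather than recommended settings.

```python
MODEL_ID = "neuroturk/HYZ-01-0.6B-Base"  # repository id from the model card


def generation_settings(max_new_tokens: int = 64) -> dict:
    """Illustrative sampling settings (assumed values, not taken from the card)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the base model and return the decoded text."""
    # Imported lazily so the module can be inspected without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 matches the precision the card reports for training.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, **generation_settings(max_new_tokens))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("Türkiye'nin başkenti")` would download the weights on first use and return the prompt followed by the model's continuation. For fine-tuning, the same `from_pretrained` calls provide the tokenizer and model objects to pass to a training loop.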
Limitations
As a base model, HYZ-01-0.6B-Base is not instruction-tuned and will not reliably follow instructions. Its performance in languages other than Turkish is significantly reduced, and its 0.6B parameter count may limit complex multi-step reasoning. Human verification of outputs is recommended for critical applications.
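Because the model continues text rather than following instructions, one common workaround before fine-tuning is to phrase a task as a few-shot pattern that the model can complete. The helper below is a hypothetical sketch: the function name, the `Metin`/`Duygu` labels, and the example sentences are illustrative choices, and this card does not prescribe any prompt format.

```python
def build_few_shot_prompt(examples, query, input_label="Metin", output_label="Duygu"):
    """Format labelled examples plus one open query as a continuation prompt.

    `examples` is a list of (text, label) pairs. The prompt ends right after
    the final output label so the model's next tokens form the answer.
    """
    blocks = [
        f"{input_label}: {text}\n{output_label}: {label}"
        for text, label in examples
    ]
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


examples = [
    ("Bu film harikaydı.", "olumlu"),
    ("Hizmet çok kötüydü.", "olumsuz"),
]
prompt = build_few_shot_prompt(examples, "Yemekler lezzetliydi.")
print(prompt)
```

The resulting string can be passed to the model as an ordinary prompt; outputs should still be checked by a human, as noted above.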