cxrbon16/turkish-llama-MSFT-0.7-ngram-banned
The cxrbon16/turkish-llama-MSFT-0.7-ngram-banned model is an 8-billion-parameter language model fine-tuned from ytu-ce-cosmos/Turkish-Llama-8b-v0.1. It has a context length of 8192 tokens and was trained with a learning rate of 2e-05 over 2 epochs. The model is adapted specifically for Turkish-language tasks, building on an existing Turkish Llama base.
Model Overview
This model, cxrbon16/turkish-llama-MSFT-0.7-ngram-banned, is an 8 billion parameter language model derived from the ytu-ce-cosmos/Turkish-Llama-8b-v0.1 base. It was fine-tuned with a learning rate of 2e-05 over 2 epochs, utilizing a batch size of 2 and gradient accumulation steps of 16, resulting in an effective total batch size of 32. The training process achieved a final validation loss of 0.5518.
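For orientation, here is a minimal loading sketch using the Hugging Face transformers library. The dtype and device placement below are assumptions chosen to fit an 8B model on a single GPU, not settings documented for this model.

```python
# Minimal loading sketch; torch_dtype and device_map are assumptions,
# not values documented in the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cxrbon16/turkish-llama-MSFT-0.7-ngram-banned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to reduce memory footprint
    device_map="auto",           # assumption: let accelerate place the weights
)
```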
Key Characteristics
- Base Model: Fine-tuned from ytu-ce-cosmos/Turkish-Llama-8b-v0.1.
- Parameter Count: 8 billion parameters.
- Context Length: Supports an 8192-token context window.
- Training Hyperparameters: Employed the AdamW optimizer, a linear learning rate scheduler with a warmup ratio of 0.03, and 2 training epochs (reproduced in the sketch below).
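Since the original training script is not published, the documented hyperparameters can be collected into a hypothetical transformers TrainingArguments configuration. The output path and any field not stated in the card are assumptions.

```python
# Sketch of a TrainingArguments object matching the documented hyperparameters.
# Fields marked "documented" come from the model card; everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="turkish-llama-finetune",  # hypothetical output path
    learning_rate=2e-5,                   # documented
    num_train_epochs=2,                   # documented
    per_device_train_batch_size=2,        # documented
    gradient_accumulation_steps=16,       # documented; effective batch size 32
    lr_scheduler_type="linear",           # documented scheduler
    warmup_ratio=0.03,                    # documented warmup ratio
    optim="adamw_torch",                  # documented AdamW optimizer
)
```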
Intended Use
This model is suitable for applications requiring Turkish-language understanding and generation, leveraging its fine-tuned capabilities from a Turkish Llama base. Specific use cases are not detailed in the original documentation, suggesting general-purpose use within the Turkish linguistic domain.
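A short generation sketch follows, reusing the model and tokenizer from the loading example above. The Turkish prompt and decoding parameters are illustrative assumptions, not values from the model card.

```python
# Illustrative Turkish generation; prompt and sampling parameters are assumptions.
prompt = "Türkiye'nin başkenti neresidir?"  # "What is the capital of Türkiye?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # assumption: modest generation budget
    do_sample=True,
    temperature=0.7,     # assumption: moderate sampling temperature
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```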