Kendamarron/Tokara-0.5B-v0.1
Kendamarron/Tokara-0.5B-v0.1 is a 0.6-billion-parameter causal language model built on Qwen1.5-0.5B and continually pre-trained by Kendamarron on 5 billion Japanese and English tokens. It is optimized for stable Japanese text generation, which makes it suitable for applications that need reliable Japanese output. It scores lower than its base model on two of the three reported Japanese benchmarks (JSQuAD and JNLI) and slightly higher on the third (JCommonsenseQA), but it produces more consistent Japanese text.
Overview
Kendamarron/Tokara-0.5B-v0.1 is a 0.6-billion-parameter causal language model developed by Kendamarron. It was produced by continuing the pre-training of Qwen/Qwen1.5-0.5B on 5 billion Japanese and English tokens. The goal is more stable Japanese output than the base model provides, addressing a common need for reliable Japanese text generation in smaller language models.
Key Characteristics
- Base Model: Qwen/Qwen1.5-0.5B.
- Training Data: Continually pre-trained on 5 billion Japanese and English tokens.
- Context Length: Supports a context length of 32768 tokens.
- Japanese Stability: Engineered for more consistent and stable Japanese language generation.
Benchmarks
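The model was evaluated with Stability-AI/lm-evaluation-harness on three Japanese tasks, each in a 1-shot setting. Compared with the base Qwen1.5-0.5B, it scores about 4.9 points lower on JSQuAD, marginally lower on JNLI, and slightly higher on JCommonsenseQA; the continued pre-training targeted output stability rather than higher benchmark scores. JSQuAD is reported on a 0-100 scale, while JCommonsenseQA and JNLI are reported on a 0-1 scale.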
| Model | JSQuAD (1-shot) | JCommonsenseQA (1-shot) | JNLI (1-shot) |
|---|---|---|---|
| Kendamarron/Tokara-0.5B-v0.1 | 26.4295 | 0.2663 | 0.5509 |
| Qwen/Qwen1.5-0.5B | 31.3597 | 0.2556 | 0.5534 |
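To make the size of each difference explicit, the following sketch computes the absolute and relative change of Tokara against its base model directly from the table values. The `SCORES` dictionary and the `deltas` helper are illustrative only and are not part of lm-evaluation-harness.

```python
# Scores copied from the benchmark table (1-shot).
# JSQuAD is on a 0-100 scale; JCommonsenseQA and JNLI are on a 0-1 scale,
# so compare relative changes rather than raw differences across tasks.
SCORES = {
    "Kendamarron/Tokara-0.5B-v0.1": {"jsquad": 26.4295, "jcommonsenseqa": 0.2663, "jnli": 0.5509},
    "Qwen/Qwen1.5-0.5B": {"jsquad": 31.3597, "jcommonsenseqa": 0.2556, "jnli": 0.5534},
}


def deltas(model: str, base: str) -> dict:
    """Hypothetical helper: per task, return (absolute difference, relative change in %) of model vs. base."""
    result = {}
    for task, base_score in SCORES[base].items():
        diff = SCORES[model][task] - base_score
        result[task] = (diff, 100.0 * diff / base_score)
    return result


for task, (diff, rel) in deltas("Kendamarron/Tokara-0.5B-v0.1", "Qwen/Qwen1.5-0.5B").items():
    print(f"{task:15s} {diff:+.4f} ({rel:+.1f}%)")
```

This prints a JSQuAD change of -4.9302 points (-15.7%), a JCommonsenseQA change of +0.0107 (+4.2%), and a JNLI change of -0.0025 (-0.5%), showing that the JSQuAD drop is the only large shift.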
Use Cases
This model is well suited to applications where consistent, reliable Japanese text generation matters most, especially in resource-constrained environments that benefit from a small parameter count. Its improved stability in Japanese output makes it a practical choice for Japanese generation tasks; for extractive question answering, note its lower JSQuAD score relative to the base model.
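A minimal sketch of loading the model and generating a Japanese continuation with Hugging Face `transformers`. It assumes PyTorch and a `transformers` release with Qwen2 support (4.37 or later, as advised for Qwen1.5 checkpoints) are installed. Because the card describes continued pre-training only, the sketch treats the model as a base model and uses plain text continuation without a chat template. The helper names `load_model` and `generate_japanese` and all sampling values are hypothetical, not recommendations from the model author.

```python
MODEL_ID = "Kendamarron/Tokara-0.5B-v0.1"
MAX_CONTEXT_TOKENS = 32768  # context length listed under Key Characteristics


def load_model(model_id: str = MODEL_ID):
    """Hypothetical helper: download (or read from the local cache) the tokenizer and weights."""
    # Imported here so generate_japanese can be used with any compatible objects.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate_japanese(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: continue `prompt` and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = len(inputs["input_ids"][0])

    # Refuse requests that would not fit in the model's context window.
    if prompt_len + max_new_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("prompt plus generation budget exceeds the 32768-token context")

    # Plain continuation for a base model; sampling values are assumptions.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,  # assumed value
        top_p=0.9,  # assumed value
    )
    # Drop the prompt tokens so only the continuation is decoded.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

For example, `tokenizer, model = load_model()` followed by `print(generate_japanese(tokenizer, model, "日本の首都は"))` prints a sampled continuation. Loading and generation are kept separate so the weights are read once and reused across prompts.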