Kendamarron/Tokara-0.5B-v0.1

Hugging Face
Text Generation

  • Concurrency Cost: 1
  • Model Size: 0.6B
  • Quant: BF16
  • Ctx Length: 32k
  • Published: May 6, 2024
  • License: tongyi-qianwen-research
  • Architecture: Transformer

Kendamarron/Tokara-0.5B-v0.1 is a 0.6-billion-parameter causal language model that Kendamarron produced by continuing the pre-training of Qwen1.5-0.5B on 5 billion Japanese and English tokens. The model is tuned for stable Japanese generation, which suits applications that need reliable Japanese text output. Its 1-shot scores are lower than the base model's on jsquad and jnli and slightly higher on jcommonsenseqa; the stated trade-off is more consistent Japanese text rather than higher benchmark numbers.


Overview

Kendamarron/Tokara-0.5B-v0.1 is a 0.6-billion-parameter causal language model developed by Kendamarron. It starts from Qwen/Qwen1.5-0.5B and continues pre-training on 5 billion Japanese and English tokens. The model is designed to give more stable Japanese output than its base model, which matters because small language models often struggle to generate Japanese reliably.

Key Characteristics

  • Base Model: Qwen/Qwen1.5-0.5B.
  • Training Data: Continued pre-training on 5 billion Japanese and English tokens.
  • Context Length: 32,768 tokens (32k).
  • Japanese Stability: Designed for more consistent Japanese generation than the base model.
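
The characteristics above suggest the model can be loaded like any other Qwen1.5-style causal LM. The following is a minimal sketch, not the author's documented usage: it assumes the Hugging Face `transformers` and `torch` packages are installed, that the checkpoint is a plain completion model (the text describes only continued pre-training), and that the sampling settings shown, which are illustrative values chosen here, are acceptable starting points.

```python
MODEL_ID = "Kendamarron/Tokara-0.5B-v0.1"
MAX_CONTEXT_TOKENS = 32768  # context length listed in the key characteristics

# Illustrative sampling settings (assumptions, not values from the model card).
GENERATION_KWARGS = {
    "max_new_tokens": 128,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
}


def generate(prompt: str) -> str:
    """Return a sampled continuation of `prompt` (downloads the model on first use)."""
    # Imported lazily so the constants above can be inspected without heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the precision listed for this checkpoint.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # Keep prompt plus generated tokens inside the stated context window.
    if prompt_len + GENERATION_KWARGS["max_new_tokens"] > MAX_CONTEXT_TOKENS:
        raise ValueError("prompt is too long for the 32,768-token context window")

    with torch.no_grad():
        output_ids = model.generate(**inputs, **GENERATION_KWARGS)

    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate("日本で一番高い山は")` would return a Japanese continuation of the prompt; because sampling is enabled, the text differs from run to run.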

Benchmarks

The model was evaluated with Stability-AI/lm-evaluation-harness on three Japanese tasks, each in a 1-shot setting. Compared with the base Qwen1.5-0.5B model, it scores lower on jsquad (26.43 vs. 31.36) and marginally lower on jnli (0.5509 vs. 0.5534), and slightly higher on jcommonsenseqa (0.2663 vs. 0.2556). The focus was on improving output stability rather than raw benchmark scores.

Model                          jsquad (1-shot)  jcommonsenseqa (1-shot)  jnli (1-shot)
Kendamarron/Tokara-0.5B-v0.1   26.4295          0.2663                   0.5509
Qwen/Qwen1.5-0.5B              31.3597          0.2556                   0.5534
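
To make the comparison concrete, the per-task differences can be computed directly from the reported scores. This is a small illustrative sketch: the numbers are copied from the table, the helper name `score_deltas` is hypothetical, and the source does not name the underlying metrics, so the jsquad column (which appears to be on a 0-100 scale) should not be compared numerically with the other two columns.

```python
# Scores copied from the benchmark table (all 1-shot).
SCORES = {
    "Kendamarron/Tokara-0.5B-v0.1": {
        "jsquad": 26.4295,
        "jcommonsenseqa": 0.2663,
        "jnli": 0.5509,
    },
    "Qwen/Qwen1.5-0.5B": {
        "jsquad": 31.3597,
        "jcommonsenseqa": 0.2556,
        "jnli": 0.5534,
    },
}


def score_deltas(model: str, base: str) -> dict:
    """Return model score minus base score for every task, rounded to 4 decimals."""
    return {
        task: round(SCORES[model][task] - SCORES[base][task], 4)
        for task in SCORES[model]
    }


deltas = score_deltas("Kendamarron/Tokara-0.5B-v0.1", "Qwen/Qwen1.5-0.5B")
for task, delta in deltas.items():
    # A negative delta means Tokara scores below the base model on that task.
    print(f"{task}: {delta:+.4f}")
```

The signs show the pattern described above: negative for jsquad and jnli, positive for jcommonsenseqa.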

Use Cases

This model fits applications that need consistent, reliable Japanese text generation, especially in resource-constrained environments where a small parameter count keeps memory and compute requirements low. Its more stable Japanese output makes it a practical choice for Japanese NLP tasks at this scale.
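
As a rough illustration of what the small parameter count means in practice, the sketch below estimates the memory needed just to hold the weights. It assumes the rounded 0.6B parameter figure and 2 bytes per parameter for BF16, both taken from the listing; the function name `weight_memory_gib` is hypothetical, and the estimate excludes activations, the KV cache, and framework overhead.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate weight-only memory in GiB (BF16 stores 2 bytes per parameter)."""
    return num_params * bytes_per_param / 2**30


# 0.6e9 is the rounded parameter count from the listing, not an exact figure.
estimate = weight_memory_gib(0.6e9)
print(f"Approximate weight memory: {estimate:.2f} GiB")
```

Under these assumptions the weights take a little over 1 GiB, so real memory use during generation will be higher than this lower bound.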