lightblue/ao-karasu-72B

Public · 72.3B parameters · FP8 · 32,768-token context · Mar 11, 2024 · Hugging Face

lightblue/ao-karasu-72B Overview

lightblue/ao-karasu-72B is a 72.3-billion-parameter causal language model with a 32,768-token context window, developed by Lightblue. It is designed and trained specifically for Japanese-language tasks, building on the foundation of Lightblue's earlier Qarasu models.

Key Training Details

The model was trained on approximately 20 million characters sampled from a pool of more than 1.1 billion characters. The composition of that pool is a key differentiator:

  • ~450 million characters from Wikipedia-based QA (similar to Qarasu).
  • ~200 million characters from technical blogs (new addition).
  • ~200 million characters from Japanese QA site answers (new addition).
  • ~100 million characters from LLM-generated prompts and responses (similar to Qarasu).
  • ~70 million characters from news articles (new addition).

This diverse, specialized Japanese corpus is intended to strengthen the model's understanding and generation across a wide range of Japanese text types. Training ran for approximately one day on A100 (80GB) GPUs.
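To put the pool composition in perspective, the short snippet below tallies the approximate character counts listed above and prints each source's share. The figures are the rounded values from this page, not exact counts, and the listed sources cover most (not all) of the stated >1.1-billion-character pool.

```python
# Approximate character counts (in millions) from the breakdown above.
pool = {
    "Wikipedia-based QA": 450,
    "Technical blogs": 200,
    "Japanese QA site answers": 200,
    "LLM-generated prompts/responses": 100,
    "News articles": 70,
}

listed_total = sum(pool.values())  # ~1,020 million characters across the listed sources
for source, chars in pool.items():
    print(f"{source}: ~{chars}M chars ({chars / listed_total:.0%} of listed sources)")
print(f"Listed total: ~{listed_total / 1000:.2f} billion characters")
```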

Recommended Usage

For best performance, ao-karasu-72B should be run on an environment with at least four A100 GPUs. Standard inference is supported through the Hugging Face transformers and vLLM libraries, with code examples provided for both (illustrative sketches follow below). The model is particularly well suited to applications that require robust Japanese-language processing, such as AI assistants, content generation, and Japanese question-answering systems.
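The two sketches below illustrate the transformers and vLLM paths mentioned above. They are minimal examples, not the model card's official snippets: they assume the tokenizer ships a chat template (the exact prompt format is not restated on this page), and the Japanese prompts are placeholders.

```python
# Minimal sketch: inference with Hugging Face transformers.
# Assumes BF16 weights can be sharded across the available GPUs via device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lightblue/ao-karasu-72B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the 72B model across all visible GPUs
)

messages = [{"role": "user", "content": "日本の首都はどこですか？"}]  # "What is the capital of Japan?"
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For higher-throughput serving, vLLM can load the same checkpoint with tensor parallelism. In this sketch tensor_parallel_size=4 mirrors the four-GPU recommendation; at 16-bit precision the 72.3B weights alone occupy roughly 145 GB, so sharding across several 80GB GPUs (or using the FP8 variant listed above) is effectively required.

```python
# Minimal sketch: offline batch inference with vLLM.
# The chat template is applied via the tokenizer so the prompt matches the model's expected format.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "lightblue/ao-karasu-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

llm = LLM(model=model_id, tensor_parallel_size=4)
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

messages = [{"role": "user", "content": "日本の四季について簡単に説明してください。"}]  # "Briefly describe Japan's four seasons."
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```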