sbintuitions/sarashina2-70b

Text generation · Model size: 70B · Quantization: FP8 · Context length: 8k · Published: Aug 6, 2024 · License: MIT · Architecture: Transformer · Concurrency cost: 4 · Open weights

Sarashina2-70B is a 70-billion-parameter causal language model developed by SB Intuitions, built on the Llama2 architecture with rotary position embeddings (RoPE). Trained on 2.1 trillion tokens, including a large portion of Japanese Common Crawl data and English text from SlimPajama, it uses a 102400-token vocabulary. The model targets general language tasks, with particular emphasis on Japanese language processing, and supports an 8192-token context length.


Sarashina2-70B: A Llama2-based Japanese-centric LLM

Sarashina2-70B is a large language model developed by SB Intuitions, leveraging the Llama2 architecture. This 70-billion-parameter model uses rotary position embeddings (RoPE) and a substantial 102400-token vocabulary.

Key Characteristics

  • Architecture: Based on the robust Llama2 framework, providing a strong foundation for language understanding and generation.
  • Training Data: Trained on an extensive 2.1 trillion tokens, comprising a significant 1 trillion tokens of Japanese Common Crawl data, processed with CCNet and HojiChar for cleaning, alongside English documents from SlimPajama (excluding books3).
  • Tokenization: Utilizes a sentencepiece tokenizer with a unigram language model and byte-fallback, designed to process raw sentences directly without pre-tokenization for Japanese (see the usage sketch after this list).
  • Scalability: Part of a family of models, with 7B, 13B, and 70B parameter versions, all sharing the same training token count and vocabulary size.
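
For illustration, the following is a minimal generation sketch using Hugging Face transformers. It assumes the checkpoint id sbintuitions/sarashina2-70b is available on the Hub or locally and that the machine has enough GPU memory for the 70B weights; the dtype, device mapping, and sampling settings are placeholder assumptions, not values prescribed by SB Intuitions.

```python
# Minimal generation sketch for sbintuitions/sarashina2-70b.
# Assumptions: bf16 weights fit on the available GPUs; sampling
# parameters are illustrative, not recommended settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sbintuitions/sarashina2-70b"

# The tokenizer is a sentencepiece unigram model with byte-fallback,
# so raw Japanese text can be passed in without pre-tokenization.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits your hardware
    device_map="auto",           # shard across available GPUs
)

prompt = "日本の首都は"  # "The capital of Japan is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The base model is not instruction-tuned, so treat the output as a
# raw text continuation rather than a chat-style response.
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the tokenizer uses byte-fallback, arbitrary raw text, including Japanese without whitespace segmentation, can be passed directly to the tokenizer.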

Considerations for Use

It is important to note that Sarashina2-70B has not been instruction-tuned. Consequently, it may produce irrelevant, inaccurate, or biased outputs. Developers are advised to fine-tune the model for specific applications, incorporating human preferences and safety considerations, before deployment.
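
As a rough starting point for such fine-tuning, the sketch below runs a standard causal language modeling pass over a hypothetical instruction dataset with the Hugging Face Trainer. The dataset name, column layout, and hyperparameters are placeholders; in practice a 70B model typically requires multi-GPU training or parameter-efficient methods (e.g. LoRA), which are omitted here for brevity.

```python
# Supervised fine-tuning sketch for sbintuitions/sarashina2-70b.
# "my_org/my_instruction_dataset" is a hypothetical dataset with a
# "text" column of formatted examples; all hyperparameters are
# placeholders, not recommendations from SB Intuitions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "sbintuitions/sarashina2-70b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical instruction data; replace with your own curated set.
dataset = load_dataset("my_org/my_instruction_dataset", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# mlm=False gives standard causal-LM labels (inputs shifted by one).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sarashina2-70b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```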