tokyotech-llm/Llama-3.3-Swallow-70B-v0.4
Task: Text generation
- Concurrency cost: 4
- Model size: 70B
- Quantization: FP8
- Context length: 32k
- Published: Feb 17, 2025
- License: llama3.3
- Architecture: Transformer
Llama-3.3-Swallow-70B-v0.4 is a 70-billion-parameter large language model developed by tokyotech-llm, built by continual pre-training from Meta Llama 3.3. It significantly enhances Japanese language capabilities while maintaining strong English performance, trained on approximately 315 billion tokens drawn from Japanese web corpora, Wikipedia, and mathematical and coding content. The model particularly excels at Japanese language tasks.
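A minimal sketch of querying this model through an OpenAI-compatible chat-completions endpoint, a common way hosted models like this are served. This only constructs the request payload; the endpoint URL, system prompt, and sampling parameters are illustrative assumptions, not values specified by this card.

```python
import json

# Model ID as listed on this card
MODEL_ID = "tokyotech-llm/Llama-3.3-Swallow-70B-v0.4"

def build_chat_request(user_message: str, max_tokens: int = 512) -> dict:
    """Build a payload for an OpenAI-compatible /v1/chat/completions endpoint.

    The system prompt and temperature below are hypothetical defaults.
    """
    return {
        "model": MODEL_ID,
        "messages": [
            # Bilingual system prompt: the model targets Japanese and English
            {"role": "system", "content": "You are a helpful bilingual assistant."},
            {"role": "user", "content": user_message},
        ],
        # Prompt plus completion must fit within the 32k-token context window
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

# Example: a Japanese-language request ("What is the capital of Japan?")
payload = build_chat_request("日本の首都はどこですか？")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The payload can then be POSTed with any HTTP client to the provider's chat-completions URL, with an API key in the `Authorization` header.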