Name: beomi/Solar-Ko-Recovery-11B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: beomi

Solar-Ko-Recovery-11B: Enhanced Korean Language Model

Solar-Ko-Recovery-11B is an 11-billion-parameter auto-regressive language model developed by Junbum Lee (Beomi). It uses an optimized transformer architecture derived from Llama-2, and its primary goal is to significantly improve on the Korean language capabilities of the original Solar model.

Key Enhancements & Capabilities

Korean Language Recovery: The model was specifically trained to "recover" Solar's performance on Korean by re-arranging embeddings and the LM head.
Expanded Vocabulary: It features an expanded vocabulary (64,000 tokens, up from 32,000 in original Solar) which includes additional Korean and Japanese vocabulary.
Efficient Korean Tokenization: Korean text tokenizes into far fewer tokens. For example, a common Korean phrase takes 7 tokens with Solar-Ko-Recovery compared to 26 tokens with SOLAR-10.7B, which leaves more of the context window for actual content and can make inference on Korean text faster.
Dual-Language Training: The model was trained on a curated mix of Korean and English corpora so that both languages are well represented.
Benchmark Performance: Achieves strong results on Korean-specific benchmarks, including haerae (0.7874 acc_norm), kmmlu_direct (0.4205 exact_match), and various KoBEST tasks (e.g., kobest_boolq 0.9202 acc).
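
To make the tokenization figures above concrete, the following minimal sketch works through the arithmetic using only the two token counts quoted in the listing (26 and 7). The helper names and the 4,096-token window are assumptions chosen for illustration; the listing does not state the model's context length.

```python
# Token counts quoted in the listing for the same Korean phrase.
SOLAR_TOKENS = 26      # SOLAR-10.7B
RECOVERY_TOKENS = 7    # Solar-Ko-Recovery-11B

# Hypothetical context window, used only to illustrate the effect.
# The listing does not state the real context length.
HYPOTHETICAL_WINDOW = 4096


def compression_ratio(baseline_tokens: int, new_tokens: int) -> float:
    """How many baseline tokens are replaced by one new token, on average."""
    if baseline_tokens <= 0 or new_tokens <= 0:
        raise ValueError("token counts must be positive")
    return baseline_tokens / new_tokens


def token_savings(baseline_tokens: int, new_tokens: int) -> float:
    """Fraction of tokens saved relative to the baseline tokenizer."""
    if baseline_tokens <= 0 or new_tokens <= 0:
        raise ValueError("token counts must be positive")
    return 1 - new_tokens / baseline_tokens


def phrases_per_window(tokens_per_phrase: int, window: int) -> int:
    """How many whole copies of the phrase fit into a context window."""
    if tokens_per_phrase <= 0 or window < 0:
        raise ValueError("invalid sizes")
    return window // tokens_per_phrase


if __name__ == "__main__":
    ratio = compression_ratio(SOLAR_TOKENS, RECOVERY_TOKENS)
    saved = token_savings(SOLAR_TOKENS, RECOVERY_TOKENS)
    print(f"compression ratio: {ratio:.2f}x")
    print(f"tokens saved: {saved:.1%}")
    print("phrases per window (SOLAR):",
          phrases_per_window(SOLAR_TOKENS, HYPOTHETICAL_WINDOW))
    print("phrases per window (Recovery):",
          phrases_per_window(RECOVERY_TOKENS, HYPOTHETICAL_WINDOW))
```

These numbers describe a single example phrase, so they indicate the direction of the improvement rather than an average over real Korean text.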

Good for

Applications requiring robust Korean language understanding and generation.
Use cases where efficient tokenization of Korean text is critical.
Developers looking for a performant 11B model with a strong focus on Korean language capabilities.
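
For readers who want to try the model through the hosted API, here is a hedged sketch of how a request might be assembled. Only the model identifier comes from the listing. The base URL, the OpenAI-style completions payload shape, and the bearer-token header are assumptions that should be checked against the provider's documentation. The code builds the request without sending it.

```python
import json

# Model identifier as given in the listing.
MODEL_ID = "beomi/Solar-Ko-Recovery-11B"

# ASSUMPTION: an OpenAI-compatible endpoint at this base URL.
# Verify against the provider's documentation before use.
ASSUMED_BASE_URL = "https://api.featherless.ai/v1"


def build_completion_request(prompt: str,
                             max_tokens: int = 128,
                             temperature: float = 0.7) -> dict:
    """Assemble a completions-style payload (assumed schema)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def build_headers(api_key: str) -> dict:
    """Bearer-token headers (assumed authentication scheme)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    payload = build_completion_request("대한민국의 수도는")
    # ensure_ascii=False keeps the Korean text readable in the JSON body.
    print(ASSUMED_BASE_URL + "/completions")
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The payload uses a plain `prompt` field rather than chat messages on the assumption that the model is used for plain text completion; adjust this if the provider exposes it differently.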
