LeoLM/leo-mistral-hessianai-7b-chat Overview
LeoLM/leo-mistral-hessianai-7b-chat is a 7 billion parameter German chat model developed by LAION and HessianAI. It is built on the Mistral architecture and represents an open, commercially available German Foundation Language Model. The model extends Mistral's capabilities into German through continued pretraining on a large corpus of German-language and locality-specific text, utilizing a compute grant at HessianAI's supercomputer 42.
Key Capabilities & Performance
This model is fine-tuned on a selection of German instruction datasets, demonstrating strong performance in German-language tasks. According to MT-Bench-DE scores, it particularly excels in:
- Writing: 6.8
- Roleplay: 6.35
- Humanities: 8.25
However, it shows clear limitations in mathematical and advanced reasoning tasks, scoring 2.75 in math and 3.3 in reasoning. The model supports both English and German and is released under the Apache 2.0 license.
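To query a chat model like this one, the conversation has to be serialized into the prompt format the model was fine-tuned on. The sketch below assumes a ChatML-style template (`<|im_start|>`/`<|im_end|>` markers), which is common for LeoLM chat models; the authoritative template is the one shipped with the model's tokenizer, so verify against that before relying on this.

```python
# Minimal sketch of a ChatML-style prompt builder.
# Assumption: the model expects ChatML turn markers; confirm against the
# tokenizer's chat template shipped with the model.

def build_chatml_prompt(system: str, user: str) -> str:
    """Format a system message and one user turn, leaving the prompt
    open at the assistant turn so the model generates the reply."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "Du bist ein hilfreicher Assistent.",
    "Schreibe ein kurzes Gedicht über den Herbst.",
)
print(prompt)
```

In practice, Hugging Face tokenizers expose `apply_chat_template`, which renders the correct template automatically from a list of role/content messages; the manual builder above only illustrates the wire format.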
Training Details
The model was fine-tuned over 4 epochs with a global batch size of 256 and a learning rate of 1e-5. The training datasets include subsets of OpenAssistant/OASST-DE, FreedomIntelligence/evol-instruct-deutsch, FreedomIntelligence/alpaca-gpt4-deutsch, LeoLM/OpenSchnabeltier, LeoLM/German_Poems, and LeoLM/German_Songs, totaling over 132,000 samples and 67 million tokens.
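As a rough sanity check, the hyperparameters above imply an optimizer-step count and an average sample length that can be derived directly. Treating the quoted totals (over 132,000 samples, 67 million tokens) as approximate lower bounds:

```python
# Back-of-envelope figures for the fine-tuning run described above.
# The sample and token counts are the approximate totals quoted in the
# text, so the derived numbers are estimates, not exact values.
samples = 132_000
epochs = 4
global_batch_size = 256
tokens = 67_000_000

steps_per_epoch = samples // global_batch_size   # ~515 optimizer steps per epoch
total_steps = steps_per_epoch * epochs           # ~2060 steps over 4 epochs
avg_tokens_per_sample = tokens / samples         # ~508 tokens per sample

print(steps_per_epoch, total_steps, round(avg_tokens_per_sample))
```

The short average sample length (around 500 tokens) is consistent with instruction-style data: single prompt/response pairs rather than long documents.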