VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct

Hugging Face: https://huggingface.co/VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct
Text Generation
  • Model Size: 70B
  • Quantization: FP8
  • Context Length: 32k
  • Concurrency Cost: 4
  • Published: Jul 29, 2024
  • License: llama3.1
  • Architecture: Transformer

VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct is a 70 billion parameter instruction-tuned model based on Meta-Llama-3.1-70B-Instruct, developed by VAGO solutions. It was fine-tuned using Spectrum Fine-Tuning on a German-English dataset, targeting 15% of the layers. This approach significantly enhances multilingual capabilities, demonstrating improved performance across German, English, Arabic, Italian, French, Spanish, Dutch, and Portuguese through cross-lingual knowledge transfer.

VAGO solutions Llama-3.1-SauerkrautLM-70b-Instruct Overview

VAGO solutions' Llama-3.1-SauerkrautLM-70b-Instruct is a 70 billion parameter instruction-tuned model, building upon Meta's Llama-3.1-70B-Instruct. Its core innovation lies in its fine-tuning methodology, utilizing Spectrum Fine-Tuning on only 15% of the model's layers. This resource-efficient approach aims to demonstrate significant capability enhancements with reduced computational overhead compared to traditional fine-tuning.
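Since this is a standard Llama 3.1 fine-tune, it loads with the usual Hugging Face transformers chat workflow. The snippet below is a minimal sketch, not an official example: the dtype, generation settings, and German prompt are illustrative choices, and running a 70B model locally requires multiple GPUs or CPU offloading.

```python
# Minimal usage sketch with Hugging Face transformers. Only the model ID
# comes from this page; everything else follows the standard chat API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # FP8 above refers to hosted serving; bf16 is a common local choice
    device_map="auto",           # spreads the 70B weights across available devices
)

# A German prompt, to exercise the model's multilingual tuning.
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Erkläre kurz, was Spectrum Fine-Tuning ist."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```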

Key Capabilities & Differentiators

  • Multilingual Enhancement: The model was fine-tuned using a unique "German-English Sauerkraut Mix v2" dataset, which facilitated efficient cross-lingual transfer learning. This has led to improved performance not only in German and English but also in Arabic, Italian, French, Spanish, Dutch, and Portuguese.
  • Resource-Efficient Fine-Tuning: By targeting only 15% of the layers with Spectrum Fine-Tuning, VAGO solutions showcases a method for substantially improving a large language model's capabilities while preserving much of its original knowledge and using fewer resources (a minimal sketch of this layer-freezing pattern follows this list).
  • Cross-Lingual Transfer: The Sauerkraut Mix v2 dataset, composed of meticulously selected high-quality German and English data, serves as a foundation for transferring linguistic knowledge to other languages, enabling multilingual improvements from a bilingual base.
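Spectrum selects which layers to train via a signal-to-noise analysis of the weights and unfreezes only the top-ranked fraction. The sketch below shows just the freeze-then-train pattern that results; the `selected_modules` list is a hypothetical stand-in for that ranking, not VAGO solutions' actual selection, and the base model ID is the one named above.

```python
# Illustrative sketch of training only a chosen subset of layers, as
# Spectrum Fine-Tuning does. The real method picks modules by weight
# signal-to-noise ratio; the list below is a hypothetical placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-70B-Instruct")

# Hypothetical: name substrings for the ~15% of modules chosen for training.
selected_modules = ["layers.10.self_attn", "layers.42.mlp"]

for name, param in model.named_parameters():
    # Freeze everything, then re-enable gradients only for selected modules.
    param.requires_grad = any(s in name for s in selected_modules)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable / total:.1%} of {total:,}")
```

Because the frozen layers keep their pretrained weights exactly, the original model's knowledge is preserved by construction, which is what makes this cheaper and less destructive than full fine-tuning.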

Use Cases & Strengths

This model is particularly well-suited for applications requiring strong multilingual understanding and generation, especially in the languages it has been optimized for. Its development highlights a cost-effective strategy for creating powerful multilingual LLMs without extensive language-specific training data for every target language.

Popular Sampler Settings

The most popular sampler configurations used by Featherless users for this model combine the following parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
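These parameters map onto an OpenAI-compatible chat completion request. The sketch below is an assumption-laden illustration: the Featherless base URL is assumed from its OpenAI-compatible API, and the values shown are placeholders, not the actual user configurations from this page. Non-standard samplers (top_k, min_p, repetition_penalty) are passed through extra_body, since the OpenAI client does not define them as named arguments.

```python
# Hypothetical example of applying these sampler settings via the openai
# Python client against an OpenAI-compatible endpoint. Endpoint URL and
# all values are illustrative assumptions, not this page's top-3 configs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="VAGOsolutions/Llama-3.1-SauerkrautLM-70b-Instruct",
    messages=[{"role": "user", "content": "Fasse Spectrum Fine-Tuning kurz zusammen."}],
    temperature=0.7,           # placeholder values throughout
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Samplers outside the OpenAI spec are forwarded in the request body.
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```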