Name: KaeriJenti/Kaori-34b-v2 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: KaeriJenti

Kaori-34b-v2: A Contamination-Filtered 34B Language Model

Kaori-34b-v2 is a 34-billion-parameter language model developed by Kaeri and Jenti. Its distinguishing feature is that the fine-tuning data was filtered to remove examples resembling common benchmark tasks. The model was fine-tuned with the LoRA method for 3 epochs on A100 GPUs.
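For readers unfamiliar with LoRA, the idea can be sketched in a few lines: the pretrained weight matrix W stays frozen, and only two small matrices A and B are trained, so the layer effectively applies W + (alpha / r) * B @ A. The rank, alpha, and layer sizes below are illustrative assumptions; the model card does not state the values actually used for Kaori-34b-v2.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the weight a LoRA-adapted layer applies."""
    r = len(a)  # rank = number of rows in A
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Adapter parameter count: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

# Toy sizes (assumed for illustration only, not taken from the model card).
rng = random.Random(0)
d_out, d_in, r, alpha = 4, 6, 2, 4
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # small random init
b = [[0.0] * r for _ in range(d_out)]                                # zero init

# With B initialised to zero, the adapted layer starts out identical to the base layer.
w_eff = lora_effective_weight(w, a, b, alpha)
```

The saving comes from the parameter count: for a hypothetical 4096 x 4096 layer with rank 8, `trainable_params(4096, 4096, 8)` gives 65,536 trainable values against 16,777,216 in the full matrix.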

Key Characteristics

Fine-tuning Datasets: Trained with Supervised Fine-Tuning (SFT) on a blend of the Open-Platypus (100%), Dolphin (5%), and OpenOrca (10%) datasets.
Contamination Filtering: The training data was similarity-filtered against common benchmark tasks such as GSM8k, ARC, Winogrande, and HellaSwag, so that scores on those benchmarks are less likely to be inflated by training examples that resemble test items.
Training Framework: Fine-tuned using the LLaMA-Factory framework.
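The dataset blend and the contamination filter described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the model card does not say which similarity measure or threshold was used, so word-trigram Jaccard similarity with a 0.5 cutoff stands in here, and the percentages are read as the fraction of each dataset that was sampled. All function names and example strings are hypothetical.

```python
import random

def ngrams(text, n=3):
    """Set of lowercase word n-grams in a string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def filter_contaminated(train_examples, benchmark_items, threshold=0.5, n=3):
    """Keep only training examples whose similarity to every benchmark item is below threshold."""
    bench_sets = [ngrams(item, n) for item in benchmark_items]
    kept = []
    for example in train_examples:
        example_set = ngrams(example, n)
        if all(jaccard(example_set, bench) < threshold for bench in bench_sets):
            kept.append(example)
    return kept

def sample_fraction(examples, fraction, seed=0):
    """Randomly sample the given fraction of a dataset (assumed reading of the percentages)."""
    k = round(len(examples) * fraction)
    return random.Random(seed).sample(examples, k)

# Made-up benchmark item and training examples, for illustration only.
benchmark = ["A farmer has 12 apples and gives away 5 of them to a neighbor"]
train = [
    "A farmer has 12 apples and gives away 5 of them to a neighbor",  # exact copy
    "a farmer has 12 apples and gives away 5 of them to a friend",    # near duplicate
    "Explain how photosynthesis converts light into chemical energy",  # unrelated
]
clean = filter_contaminated(train, benchmark)

# Blending three toy datasets at 100%, 5%, and 10%.
platypus, dolphin, orca = list(range(20)), list(range(20)), list(range(20))
blend = sample_fraction(platypus, 1.0) + sample_fraction(dolphin, 0.05) + sample_fraction(orca, 0.10)
```

In this toy run only the unrelated photosynthesis example survives the filter. A real pipeline would more likely use embedding similarity or a scalable near-duplicate index, since pairwise comparison against every benchmark item grows quickly with dataset size.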

Good For

Applications requiring a large language model with a strong emphasis on clean training data, free from common benchmark contamination.
General language generation and understanding tasks where the integrity of evaluation against standard benchmarks is crucial.
