Locutusque/Mistral-7B-SFT

Text Generation · Open Weights · Cold

  • Concurrency Cost: 1
  • Model Size: 7B
  • Quant: FP8
  • Ctx Length: 8k
  • License: cc-by-nc-4.0
  • Architecture: Transformer

Locutusque/Mistral-7B-SFT is a 7 billion parameter Mistral-based language model developed by Locutusque. It is a general-purpose assistant, fine-tuned to evaluate how effective various datasets are for language model training, with the goal of identifying optimal dataset combinations for fine-tuning. It supports an 8192-token context length.

Model Overview

Locutusque/Mistral-7B-SFT is a 7 billion parameter language model built on the Mistral architecture. Developed by Locutusque, its primary purpose is to serve as a general-purpose assistant while also acting as an experimental platform to determine the most effective datasets for fine-tuning language models.
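
The model can be tried locally with the standard Hugging Face transformers API; the sketch below assumes the weights are hosted under the Locutusque/Mistral-7B-SFT repository id and that transformers and torch are installed.

```python
# Minimal sketch: loading Locutusque/Mistral-7B-SFT with Hugging Face
# transformers and generating a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/Mistral-7B-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision fits a 7B model on a ~16 GB GPU
    device_map="auto",
)

prompt = "Explain what supervised fine-tuning (SFT) is in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The Mistral architecture supports the 8192-token context noted above.
output = model.generate(**inputs, max_new_tokens=200, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```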

Training Details

The model was fully fine-tuned on 8 TPU v3 devices; the specific training datasets are listed on the model's page. Notably, the run suffered exploding gradients early in training, which may affect overall performance.
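
The model card does not say what countermeasures, if any, were applied to the exploding gradients, so the snippet below is only an illustrative sketch of the standard mitigation, gradient-norm clipping in PyTorch, and not the author's actual training code.

```python
# Illustrative sketch of gradient-norm clipping, the usual countermeasure
# for exploding gradients. Hyperparameters here are placeholders.
import torch

def training_step(model, batch, optimizer, max_grad_norm=1.0):
    optimizer.zero_grad()
    loss = model(**batch).loss  # causal-LM loss from a transformers model
    loss.backward()
    # Rescale gradients so their global L2 norm never exceeds max_grad_norm,
    # preventing a single bad batch from blowing up the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```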

Key Characteristics

  • Architecture: Mistral-7B
  • Parameter Count: 7 Billion
  • Context Length: 8192 tokens
  • Training Goal: Dataset efficacy evaluation for fine-tuning

Potential Considerations

Because of the exploding gradients reported early in training, the model's performance may not be fully optimized, and results should be validated before relying on it for a given task.
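
Given that caveat, a quick perplexity check on text from your own domain is a cheap sanity check before committing to the model. The sketch below reuses the `model` and `tokenizer` loaded in the earlier snippet; the evaluation text is a placeholder.

```python
# Hypothetical sanity check: compute perplexity on a sample of your own
# data to gauge whether the training instability visibly hurt the model.
import math
import torch

def perplexity(model, tokenizer, text: str) -> float:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # With labels == input_ids, transformers returns the mean
        # next-token cross-entropy; exp(loss) is the perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

print(perplexity(model, tokenizer, "Replace this with text from your domain."))
```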

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
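
These settings can be passed through an OpenAI-compatible chat endpoint; the sketch below assumes Featherless's base URL and that the server accepts the non-standard samplers (top_k, repetition_penalty, min_p) via the OpenAI SDK's extra_body pass-through. The values shown are placeholders, not the actual top-3 configs.

```python
# Hedged sketch: sending sampler settings through an assumed
# OpenAI-compatible endpoint. Parameter values are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="Locutusque/Mistral-7B-SFT",
    messages=[{"role": "user", "content": "Write a haiku about fine-tuning."}],
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    extra_body={  # non-standard samplers, if the server supports them
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```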