Locutusque/Mistral-7B-SFT
Locutusque/Mistral-7B-SFT is a 7 billion parameter Mistral-based language model developed by Locutusque. It is a general-purpose assistant, fine-tuned in order to evaluate how effective various datasets are for language model training. With an 8192-token context length, it aims to help identify optimal dataset combinations for fine-tuning.
Model Overview
Locutusque/Mistral-7B-SFT is a 7 billion parameter language model built on the Mistral architecture. Developed by Locutusque, its primary purpose is to serve as a general-purpose assistant while also acting as an experimental platform to determine the most effective datasets for fine-tuning language models.
Training Details
The model underwent a full fine-tune on 8 TPU v3 devices. The specific datasets used for training are listed on the model's page. Note that the model experienced exploding gradients early in training, which may affect its overall performance.
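The standard mitigation for exploding gradients is gradient-norm clipping: rescale the gradients whenever their global L2 norm exceeds a threshold. The model card does not say whether or how clipping was applied, so the sketch below is purely illustrative (plain Python, with a hypothetical `clip_grad_norm` helper) rather than the author's actual training code.

```python
import math

def clip_grad_norm(grads, max_norm):
    """Hypothetical helper: scale gradients so their global L2 norm
    does not exceed max_norm. Returns the (possibly rescaled) gradients
    and the pre-clipping norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total

# A gradient vector [3.0, 4.0] has L2 norm 5.0; clipping to 1.0
# rescales it to [0.6, 0.8] while preserving its direction.
clipped, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)
print(norm, clipped)
```

Frameworks provide this out of the box (e.g. PyTorch's `torch.nn.utils.clip_grad_norm_`); the point of the sketch is only to show why clipping bounds the update size without changing the gradient's direction.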
Key Characteristics
- Architecture: Mistral-7B
- Parameter Count: 7 Billion
- Context Length: 8192 tokens
- Training Goal: Dataset efficacy evaluation for fine-tuning
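The 8192-token context length above is a hard budget shared between the prompt and any generated tokens. A minimal sketch of how a caller might enforce that budget (the `fit_prompt` helper and the token-id list are hypothetical, not part of the model's API):

```python
MAX_CONTEXT = 8192  # context length stated on the model card

def fit_prompt(token_ids, max_new_tokens, max_context=MAX_CONTEXT):
    """Keep only the most recent prompt tokens so that the prompt plus
    the requested generation budget fits in the context window."""
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    return token_ids[-budget:]

# Example: a 10,000-token prompt trimmed to leave room for 512 new tokens.
ids = list(range(10_000))
trimmed = fit_prompt(ids, max_new_tokens=512)
print(len(trimmed))  # 8192 - 512 = 7680
```

Truncating from the left (keeping the most recent tokens) is the usual choice for assistant-style prompts, since the latest turns matter most; other strategies (summarization, middle truncation) are equally valid.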
Potential Considerations
Because of the exploding gradients reported during early training, users should be aware that the model's performance may be degraded and is not guaranteed across all tasks.