nbeerbower/mistral-nemo-gutenberg-12B
nbeerbower/mistral-nemo-gutenberg-12B is a 12-billion-parameter language model fine-tuned from mistralai/Mistral-Nemo-Instruct-2407 on the jondurbin/gutenberg-dpo-v0.1 dataset, with a focus on instruction following and general language tasks. It features a 32,768-token context length and suits applications that require a balance of performance and efficiency.
Model Overview
nbeerbower/mistral-nemo-gutenberg-12B is a 12-billion-parameter language model derived from the mistralai/Mistral-Nemo-Instruct-2407 base model. It was fine-tuned for one epoch on an A100 GPU in Google Colab using the jondurbin/gutenberg-dpo-v0.1 dataset, a process intended to enhance its instruction-following capabilities and general language understanding.
Key Characteristics
- Base Model: mistralai/Mistral-Nemo-Instruct-2407.
- Parameter Count: 12 billion parameters.
- Context Length: Supports a substantial context window of 32,768 tokens.
- Training Data: Fine-tuned on the jondurbin/gutenberg-dpo-v0.1 dataset, which likely contributes to its text generation and comprehension abilities.
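As a concrete reference, the sketch below shows one way to load and prompt the model with the Hugging Face transformers library. The prompt text and generation settings are illustrative assumptions, not recommendations from the model card.

```python
# Minimal sketch: load the model and run one instruction-style generation.
# Assumes transformers, torch, and accelerate are installed and that the
# checkpoint fits in available GPU memory (~24 GB in bfloat16 for a 12B model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/mistral-nemo-gutenberg-12B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # place weights across available devices
)

# Instruct models ship a chat template, so prompts can be built from
# role/content messages rather than hand-formatted strings.
messages = [{"role": "user", "content": "Summarize the plot of Moby-Dick in three sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the base model provides a chat template, `apply_chat_template` builds the expected instruction format automatically.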
Performance Metrics
Evaluated on the Open LLM Leaderboard, the model achieved an average score of 20.82. Specific benchmark results include:
- IFEval (0-shot): 35.04
- BBH (3-shot): 32.43
- MMLU-PRO (5-shot): 28.47
Use Cases
This model suits developers looking for a moderately sized language model with good instruction-following capabilities, particularly for tasks that benefit from a large context window, as in the sketch below. Its fine-tuning on the Gutenberg-derived DPO dataset suggests potential strengths in literary text generation and comprehension.
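For example, the 32,768-token window makes single-pass summarization of long excerpts feasible. The sketch below continues from the loading example above (it reuses `tokenizer` and `model`); the input file name and token headroom are assumptions for illustration.

```python
# Sketch: long-document summarization within the 32,768-token window.
# `chapter.txt` is a hypothetical input file, not part of the model card.
with open("chapter.txt") as f:
    long_text = f.read()

messages = [{"role": "user", "content": f"Summarize the following text:\n\n{long_text}"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Leave headroom for the reply inside the context window.
max_new = 512
assert input_ids.shape[-1] + max_new <= 32768, "prompt exceeds the context window"

output = model.generate(input_ids.to(model.device), max_new_tokens=max_new)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```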