nbeerbower/mistral-nemo-gutenberg-12B-v4

12B parameters · FP8 · 32,768-token context · License: apache-2.0

Model Overview

nbeerbower/mistral-nemo-gutenberg-12B-v4 is a 12-billion-parameter language model developed by nbeerbower. It is built on TheDrummer/Rocinante-12B-v1 and further fine-tuned on the jondurbin/gutenberg-dpo-v0.1 preference dataset. Fine-tuning ran for 3 epochs on an A100 GPU via Google Colab, following a recipe similar to published ORPO fine-tuning guides.
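To make the training setup concrete, here is a minimal sketch of what an ORPO run over this dataset could look like using trl's ORPOTrainer. This is not the author's actual script; the hyperparameters are illustrative assumptions, and only the base model, dataset, and epoch count come from the card.

```python
# Sketch of ORPO preference fine-tuning with trl; hyperparameters are
# illustrative, not the author's actual settings.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "TheDrummer/Rocinante-12B-v1"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# gutenberg-dpo-v0.1 provides prompt / chosen / rejected columns,
# the format ORPOTrainer expects for preference optimization.
dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

config = ORPOConfig(
    output_dir="mistral-nemo-gutenberg-12B-v4",
    num_train_epochs=3,              # matches the 3 epochs stated in the card
    per_device_train_batch_size=1,   # assumption: small batch for a single A100
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    beta=0.1,                        # weight of ORPO's odds-ratio loss term
    max_length=2048,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer trl versions take processing_class= instead
)
trainer.train()
```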

Key Characteristics

  • Base Model: TheDrummer/Rocinante-12B-v1
  • Fine-tuning Dataset: jondurbin/gutenberg-dpo-v0.1, a preference dataset of prompt/chosen/rejected pairs built from Project Gutenberg texts, indicating an emphasis on long-form literary writing.
  • Context Length: Supports a substantial context window of 32,768 tokens, enabling processing of longer inputs and generation of coherent extended outputs (see the loading sketch below).
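
A minimal sketch of loading the checkpoint and generating with transformers follows. The generation settings are illustrative assumptions, and the chat template is whatever ships with the checkpoint's tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/mistral-nemo-gutenberg-12B-v4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt; sampling settings are illustrative.
messages = [{"role": "user", "content": "Write the opening scene of a gothic short story."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```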

Performance Benchmarks

Evaluations on the Open LLM Leaderboard give the model an average score of 19.56. Selected per-task results:

  • IFEval (0-shot): 23.79
  • BBH (3-shot): 31.97
  • MATH Lvl 5 (4-shot): 10.95
  • MMLU-PRO (5-shot): 28.62

These scores reflect its capabilities in instruction following (IFEval), complex reasoning (BBH), mathematical problem-solving (MATH), and broad knowledge (MMLU-PRO); a sketch of reproducing such an evaluation locally follows.
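
The leaderboard runs these tasks via lm-evaluation-harness. A sketch of a local run is below; the leaderboard_* task names follow recent harness releases and may differ across versions, so check the available task list for your install.

```python
# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nbeerbower/mistral-nemo-gutenberg-12B-v4,dtype=bfloat16",
    tasks=[
        "leaderboard_ifeval",     # instruction following
        "leaderboard_bbh",        # complex reasoning
        "leaderboard_math_hard",  # MATH Lvl 5
        "leaderboard_mmlu_pro",   # broad knowledge
    ],
    batch_size="auto",
)
print(results["results"])
```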

Potential Use Cases

Given its fine-tuning on a Gutenberg-derived preference dataset and its long context window, this model is well-suited for the following tasks (a summarization sketch follows the list):

  • Advanced text generation: Creating detailed narratives, articles, or long-form content.
  • Content summarization: Processing and condensing extensive documents.
  • Question answering: Handling complex queries that require understanding large contexts.
  • Literary analysis: Tasks involving deep comprehension of textual nuances.
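
As one example of the summarization use case, here is a sketch using the chat-aware text-generation pipeline (requires a recent transformers release). The file name "report.txt" is a hypothetical stand-in for any long document that fits within the 32,768-token context.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nbeerbower/mistral-nemo-gutenberg-12B-v4",
    torch_dtype="auto",
    device_map="auto",
)

with open("report.txt") as f:  # hypothetical input document
    document = f.read()

messages = [{
    "role": "user",
    "content": f"Summarize the following document in five bullet points:\n\n{document}",
}]
result = generator(messages, max_new_tokens=300)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```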