rubenroy/Geneva-12B-GCv2-5m

Text Generation | Concurrency Cost: 1 | Model Size: 12B | Quant: FP8 | Ctx Length: 32k | Published: Feb 1, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

Geneva-12B-GCv2-5m is a 12 billion parameter language model developed by Ruben Roy, fine-tuned from Mistral Nemo Instruct 2407. It was fine-tuned on the GammaCorpus v2-5m dataset, a collection of structured and filtered multi-turn conversations, and is designed to offer strong conversational performance within its size class.


Overview

Geneva-12B-GCv2-5m is a 12 billion parameter language model developed by Ruben Roy, fine-tuned from the Mistral Nemo Instruct 2407 base model. It was trained for 60 epochs on a single A100 GPU using the Unsloth framework, specifically leveraging the GammaCorpus v2-5m dataset.

Key Characteristics

  • Base Model: Mistral Nemo Instruct 2407
  • Parameters: 12 billion
  • Architecture: 40 layers, 5,120 dimension, 32 heads, 8 GQA kv-heads
  • Activation Function: SwiGLU
  • Vocabulary Size: Approximately 128k
  • Training Data: Fine-tuned on GammaCorpus v2-5m, a dataset of structured and filtered multi-turn conversations.
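
As a rough sanity check, the architecture above implies the stated ~12B parameter count. The sketch below fills in values not listed in this card, assuming Mistral Nemo's published head dimension (128), feed-forward width (14,336), a 131,072-token vocabulary, and untied input/output embeddings:

```python
# Approximate parameter count from the architecture listed above.
# Assumed (not stated in this card): head_dim=128, ffn_dim=14336,
# vocab=131072, untied input/output embeddings.
d_model, n_layers, n_heads, n_kv_heads = 5120, 40, 32, 8
head_dim, ffn_dim, vocab = 128, 14336, 131072

attn = d_model * n_heads * head_dim * 2        # q_proj + o_proj
attn += d_model * n_kv_heads * head_dim * 2    # k_proj + v_proj (GQA)
mlp = 3 * d_model * ffn_dim                    # SwiGLU: gate, up, down
per_layer = attn + mlp                         # norm weights are negligible
embeddings = 2 * vocab * d_model               # token embeddings + lm_head

total = n_layers * per_layer + embeddings
print(f"{total / 1e9:.1f}B parameters")        # -> 12.2B parameters
```

The per-layer cost is dominated by the SwiGLU feed-forward block; the 8 GQA kv-heads keep the k/v projections at a quarter of the query projection's size.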

Primary Use Case

This model is particularly well-suited for conversational AI applications, benefiting from its fine-tuning on the GammaCorpus dataset. It aims to deliver competitive performance on dialogue tasks, particularly those involving structured multi-turn exchanges. Developers can integrate it using the Hugging Face transformers library.
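
A minimal sketch of such an integration, assuming the tokenizer ships a chat template (the `generate_reply` helper and its defaults are illustrative, not part of the model card):

```python
def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a single-turn reply from Geneva-12B-GCv2-5m via transformers.

    Note: downloads ~12B parameters of weights on first use and needs a GPU
    with enough memory (or CPU offloading via device_map="auto").
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "rubenroy/Geneva-12B-GCv2-5m"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # Format the prompt with the model's chat template (multi-turn capable:
    # append further {"role": ..., "content": ...} dicts for longer dialogues).
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

Example usage: `generate_reply("Explain quantization in one sentence.")`.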

Limitations

As with many language models, Geneva-12B-GCv2-5m may exhibit biases in its generated responses, despite efforts to mitigate them during development.