rubenroy/Geneva-12B-GCv2-5m
Geneva-12B-GCv2-5m is a 12 billion parameter language model developed by Ruben Roy, fine-tuned from Mistral's Mistral Nemo Instruct 2407. It is optimized on the GammaCorpus v2-5m dataset, a collection of structured and filtered multi-turn conversations, and aims to deliver strong conversational performance for its size class.
Overview
Geneva-12B-GCv2-5m is a 12 billion parameter language model developed by Ruben Roy, fine-tuned from the Mistral Nemo Instruct 2407 base model. It was trained for 60 epochs on a single A100 GPU with the Unsloth framework, using the GammaCorpus v2-5m dataset.
Key Characteristics
- Base Model: Mistral Nemo Instruct 2407
- Parameters: 12 billion
- Architecture: 40 layers, 5,120 hidden dimension, 32 attention heads, 8 GQA key-value heads
- Activation Function: SwiGLU
- Vocabulary Size: Approximately 128k
- Training Data: Fine-tuned on GammaCorpus v2-5m, a dataset of structured and filtered multi-turn conversations.
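The listed shape can be sanity-checked against the stated 12 billion parameters with a back-of-the-envelope count. Note the head dimension (128), FFN width (14,336), and an untied ~131k-token embedding are assumptions drawn from the public Mistral Nemo configuration, not figures stated in this card:

```python
# Rough parameter count for the architecture listed above.
# head_dim=128, d_ffn=14336, and untied vocab=131072 are assumptions
# from the public Mistral Nemo config, not stated in this card.
d_model, n_layers = 5120, 40
n_heads, n_kv_heads, head_dim = 32, 8, 128
d_ffn, vocab = 14336, 131072

attn = d_model * (n_heads * head_dim) * 2      # Q and O projections
attn += d_model * (n_kv_heads * head_dim) * 2  # K and V (GQA: fewer KV heads)
mlp = 3 * d_model * d_ffn                      # SwiGLU: gate, up, down matrices
embeddings = 2 * vocab * d_model               # input + output embeddings (untied)

total = n_layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.1f}B parameters")       # lands close to 12B
```

The grouped-query attention shows up in the count: K and V projections are 4x smaller than Q because only 8 of the 32 heads carry their own key-value state.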
Primary Use Case
This model is well-suited to conversational AI applications, benefiting from its fine-tuning on the GammaCorpus dataset. It targets dialogue tasks, particularly the kind of structured multi-turn exchanges represented in its training data, and can be loaded with the Hugging Face transformers library.
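A minimal loading and generation sketch with transformers follows. The dtype, device placement, and sampling settings are illustrative assumptions, not documented defaults for this model, and running it requires a GPU with sufficient memory:

```python
# Sketch: loading Geneva-12B-GCv2-5m with Hugging Face transformers.
# torch_dtype, device_map, and sampling settings are illustrative
# assumptions; a CUDA GPU with enough memory is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rubenroy/Geneva-12B-GCv2-5m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # spread layers across available devices
)

# Multi-turn chat input, matching the conversational data the model was tuned on.
messages = [
    {"role": "user", "content": "Summarize grouped-query attention in one paragraph."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model is fine-tuned from an instruct checkpoint, `apply_chat_template` with role-tagged messages is the natural way to format multi-turn prompts rather than raw string concatenation.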
Limitations
As with many language models, Geneva-12B-GCv2-5m may exhibit biases in its generated responses, despite efforts to mitigate them during development.