rubenroy/Geneva-12B-GCv2-5m
Geneva-12B-GCv2-5m is a 12 billion parameter language model developed by Ruben Roy, fine-tuned from Mistral's Mistral Nemo Instruct 2407. It is optimized on the GammaCorpus v2-5m dataset, a collection of structured and filtered multi-turn conversations, and aims to deliver strong conversational performance for its size class.
Overview
Geneva-12B-GCv2-5m is a 12 billion parameter language model developed by Ruben Roy, fine-tuned from the Mistral Nemo Instruct 2407 base model. It was trained for 60 epochs on a single A100 GPU with the Unsloth framework, using the GammaCorpus v2-5m dataset.
Key Characteristics
- Base Model: Mistral Nemo Instruct 2407
- Parameters: 12 billion
- Architecture: 40 layers, 5,120 hidden dimension, 32 attention heads, 8 GQA key-value heads
- Activation Function: SwiGLU
- Vocabulary Size: Approximately 128k
- Training Data: Fine-tuned on GammaCorpus v2-5m, a dataset of structured and filtered multi-turn conversations.
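The listed shape can be sanity-checked against the stated 12 billion parameters with a back-of-the-envelope count. Note the head dimension (128), FFN width (14,336), and an untied ~131k-token embedding are assumptions drawn from the public Mistral Nemo configuration, not figures stated in this card:

```python
# Rough parameter count for the architecture listed above.
# head_dim=128, d_ffn=14336, and untied vocab=131072 are assumptions
# from the public Mistral Nemo config, not stated in this card.
d_model, n_layers = 5120, 40
n_heads, n_kv_heads, head_dim = 32, 8, 128
d_ffn, vocab = 14336, 131072

attn = d_model * (n_heads * head_dim) * 2      # Q and O projections
attn += d_model * (n_kv_heads * head_dim) * 2  # K and V (GQA: fewer KV heads)
mlp = 3 * d_model * d_ffn                      # SwiGLU: gate, up, down matrices
embeddings = 2 * vocab * d_model               # input + output embeddings (untied)

total = n_layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.1f}B parameters")       # lands close to 12B
```

The grouped-query attention shows up in the count: K and V projections are 4x smaller than Q because only 8 of the 32 heads carry their own key-value state.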
Primary Use Case
This model is well-suited to conversational AI applications, benefiting from its fine-tuning on the GammaCorpus dataset. It targets dialogue tasks, particularly the kind of structured multi-turn exchanges represented in its training data, and can be loaded with the Hugging Face transformers library.
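A minimal loading and generation sketch with transformers follows. The dtype, device placement, and sampling settings are illustrative assumptions, not documented defaults for this model, and running it requires a GPU with sufficient memory:

```python
# Sketch: loading Geneva-12B-GCv2-5m with Hugging Face transformers.
# torch_dtype, device_map, and sampling settings are illustrative
# assumptions; a CUDA GPU with enough memory is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rubenroy/Geneva-12B-GCv2-5m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # spread layers across available devices
)

# Multi-turn chat input, matching the conversational data the model was tuned on.
messages = [
    {"role": "user", "content": "Summarize grouped-query attention in one paragraph."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model is fine-tuned from an instruct checkpoint, `apply_chat_template` with role-tagged messages is the natural way to format multi-turn prompts rather than raw string concatenation.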
Limitations
As with many language models, Geneva-12B-GCv2-5m may exhibit biases in its generated responses, despite efforts to mitigate them during development.