rubenroy/Zurich-14B-GCv2-5m

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 31, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

rubenroy/Zurich-14B-GCv2-5m is a 14.7 billion parameter causal language model fine-tuned by Ruben Roy from Alibaba's Qwen 2.5 14B Instruct base model. It uses a Transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias. The model is fine-tuned on the GammaCorpus v2-5m dataset, a collection of structured and filtered multi-turn conversations, with the goal of outperforming similarly sized models on conversational tasks.


Overview

rubenroy/Zurich-14B-GCv2-5m is a 14.7 billion parameter causal language model developed by Ruben Roy. It is a fine-tuned version of Alibaba's Qwen 2.5 14B Instruct model and retains that model's Transformer architecture: RoPE, SwiGLU, RMSNorm, and attention QKV bias, with 48 layers and grouped-query attention using 40 query heads and 8 key/value heads.
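
Because the model is a standard Qwen 2.5-based causal LM, it can typically be loaded through the Hugging Face transformers API. The snippet below is a minimal sketch; the prompt and sampling settings are illustrative assumptions, not values prescribed by the model author.

```python
# Minimal inference sketch using the transformers API.
# The model ID comes from the card; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rubenroy/Zurich-14B-GCv2-5m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # spread layers across available GPUs
)

messages = [
    {"role": "user", "content": "Explain rotary position embeddings in one paragraph."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```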

Key Differentiator: GammaCorpus v2-5m Fine-tuning

This model's primary distinction is its fine-tuning on the GammaCorpus v2-5m dataset. GammaCorpus is a collection of structured and filtered multi-turn conversations, designed to enhance conversational capabilities. The fine-tuning run covered 60 epochs, took approximately 90 minutes on an A100 GPU, and used the Unsloth framework.
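
The author's actual training script and hyperparameters are not reproduced here; the sketch below only illustrates the general Unsloth + TRL supervised fine-tuning pattern the card describes. The dataset ID, LoRA settings, sequence length, and batch sizes are assumptions for illustration, and the SFTTrainer arguments shown match older TRL releases (newer ones move them into SFTConfig).

```python
# Illustrative Unsloth SFT setup; hyperparameters and dataset ID are assumptions,
# NOT the configuration actually used to train Zurich-14B-GCv2-5m.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-14B-Instruct",  # base model named on the card
    max_seq_length=4096,                     # assumed sequence length
    load_in_4bit=True,
)
# LoRA adapters are an assumption; the card does not state full vs. adapter tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset ID; assumes conversations are pre-flattened to a "text" field.
dataset = load_dataset("rubenroy/GammaCorpus-v2-5m", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=60,      # epoch count reported on the card
        learning_rate=2e-4,
        bf16=True,
        output_dir="zurich-14b-gcv2-5m",
    ),
)
trainer.train()
```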

Intended Use

Zurich-14B-GCv2-5m is designed to excel in conversational AI tasks, particularly those benefiting from structured, multi-turn dialogue. Its fine-tuning on GammaCorpus aims to provide robust performance in generating coherent and contextually relevant responses in interactive scenarios. Users should be aware of potential biases, as efforts to mitigate them are ongoing.
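
Since the fine-tuning targets multi-turn dialogue, a typical interaction feeds the running conversation history back through the chat template on every turn. The loop below is a small illustrative sketch that builds on the loading snippet above; the helper function, prompts, and generation settings are examples, not part of the published card.

```python
# Illustrative multi-turn chat loop; assumes `model` and `tokenizer` are already
# loaded as in the earlier snippet. Prompts and settings are examples only.
history = []

def chat(user_message, max_new_tokens=256):
    history.append({"role": "user", "content": user_message})
    inputs = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(
        inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    reply = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})  # keep context for the next turn
    return reply

print(chat("Summarise what GammaCorpus v2 is."))
print(chat("Now compare it with a generic web-scraped corpus."))
```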

Licensing

The model is released under the Apache 2.0 License.