ubitech-edg/mistral-12b-cpt-sft
ubitech-edg/mistral-12b-cpt-sft is a 12-billion-parameter causal language model developed by ubitech-edg, built upon the mistral-12b-cpt base model. It uses a two-stage LoRA fine-tuning process that combines continual pretraining (CPT) to extend general knowledge with supervised fine-tuning (SFT) on synthetic QA data to strengthen instruction following. This approach is intended to improve coherence, factual recall, and reasoning, making the model suitable for applications requiring robust question-answering and general text generation.
Overview
ubitech-edg/mistral-12b-cpt-sft is a 12 billion parameter causal language model that integrates continual pretraining (CPT) and supervised fine-tuning (SFT). This two-stage LoRA fine-tuning process aims to enhance the model's general knowledge and instruction-following capabilities, particularly for question-answering tasks.
Key Capabilities & Training
- Two-Stage Fine-Tuning: The model first undergoes CPT to expand its general knowledge using diverse domain-specific datasets like arxiv.jsonl, gov.jsonl, news.jsonl, and wiki.jsonl. Subsequently, SFT is applied using axolotl_deduplicated_synthetic_qa.jsonl to improve its ability to follow instructions and generate coherent, factual responses.
- LoRA Efficiency: The fine-tuning utilizes an 8-bit LoRA adapter with specific hyperparameters (r=16, alpha=32, dropout=0.05) targeting q_proj, k_proj, v_proj, and o_proj layers, ensuring efficient adaptation.
- Hardware & Framework: Training was conducted on Leonardo EuroHPC, utilizing 8 × 2 × A100 64 GB GPUs with Axolotl, DeepSpeed, PyTorch 2.5.1, and CUDA 12.1.
- Context Length: The model supports a sequence length of 2048 tokens.
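To give a feel for why a LoRA adapter with r=16 and alpha=32 is cheap to train, the sketch below computes the LoRA scaling factor (alpha / r) and counts the extra trainable parameters added to the four targeted projections. The hyperparameters and module names come from the list above. The layer shapes and the layer count are illustrative placeholders, not values read from this model's configuration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """LoRA hyperparameters as listed in the model card."""
    r: int = 16
    alpha: int = 32
    dropout: float = 0.05
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")

    @property
    def scaling(self) -> float:
        # The low-rank update B @ A is multiplied by alpha / r before it is
        # added to the frozen weight.
        return self.alpha / self.r


def lora_params_per_linear(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


def lora_params_total(settings: LoraSettings, shapes: dict, num_layers: int) -> int:
    """Sum the added parameters over all targeted modules and layers."""
    per_layer = sum(
        lora_params_per_linear(*shapes[name], settings.r)
        for name in settings.target_modules
    )
    return per_layer * num_layers


# ASSUMPTION: placeholder (d_in, d_out) shapes and layer count for illustration
# only; check the model's own config before relying on these numbers.
HYPOTHETICAL_SHAPES = {
    "q_proj": (5120, 4096),
    "k_proj": (5120, 1024),
    "v_proj": (5120, 1024),
    "o_proj": (4096, 5120),
}
HYPOTHETICAL_NUM_LAYERS = 40

if __name__ == "__main__":
    settings = LoraSettings()
    total = lora_params_total(settings, HYPOTHETICAL_SHAPES, HYPOTHETICAL_NUM_LAYERS)
    print(f"scaling = {settings.scaling}, added trainable params = {total:,}")
```

Under these placeholder shapes, the adapter adds on the order of tens of millions of trainable parameters, a small fraction of a 12 billion parameter model.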
Use Cases
This model is well-suited for applications requiring improved coherence, factual recall, and reasoning, especially in question-answering scenarios, due to its specialized two-stage training approach.
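In question-answering use, the prompt plus the generated answer has to fit inside the 2048-token sequence length. The sketch below shows one possible way to budget for that: it trims a list of prompt token ids so that room remains for generation. Keeping the most recent tokens (where the question usually sits) is an assumption of this example, not a documented recommendation, and fit_to_context is a hypothetical helper name.

```python
MAX_SEQ_LEN = 2048  # sequence length stated for this model


def fit_to_context(prompt_ids, max_new_tokens, max_seq_len=MAX_SEQ_LEN):
    """Trim prompt token ids so that prompt + generated tokens fit the window.

    ASSUMPTION: the oldest tokens are dropped and the tail of the prompt is
    kept, on the premise that the question appears at the end of the prompt.
    """
    if not 0 < max_new_tokens < max_seq_len:
        raise ValueError("max_new_tokens must be between 1 and max_seq_len - 1")
    budget = max_seq_len - max_new_tokens
    # Slicing from the end returns the whole prompt when it is already short enough.
    return list(prompt_ids[-budget:])


if __name__ == "__main__":
    long_prompt = list(range(3000))  # stand-in for real token ids
    trimmed = fit_to_context(long_prompt, max_new_tokens=256)
    print(len(trimmed))
```

For retrieval-style prompts, a caller might prefer to drop whole context passages instead of cutting from the front; the same budget arithmetic applies.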