Name: ubitech-edg/mistral-12b-cpt-sft API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ubitech-edg

Overview

ubitech-edg/mistral-12b-cpt-sft is a 12 billion parameter causal language model that integrates continual pretraining (CPT) and supervised fine-tuning (SFT). This two-stage LoRA fine-tuning process aims to enhance the model's general knowledge and instruction-following capabilities, particularly for question-answering tasks.

Key Capabilities & Training

Two-Stage Fine-Tuning: The model first undergoes CPT to expand its general knowledge using diverse domain-specific datasets like arxiv.jsonl, gov.jsonl, news.jsonl, and wiki.jsonl. Subsequently, SFT is applied using axolotl_deduplicated_synthetic_qa.jsonl to improve its ability to follow instructions and generate coherent, factual responses.
LoRA Efficiency: The fine-tuning utilizes an 8-bit LoRA adapter with specific hyperparameters (r=16, alpha=32, dropout=0.05) targeting q_proj, k_proj, v_proj, and o_proj layers, ensuring efficient adaptation.
Hardware & Framework: Training was conducted on Leonardo EuroHPC, utilizing 8 × 2 × A100 64 GB GPUs with Axolotl, DeepSpeed, PyTorch 2.5.1, and CUDA 12.1.
Context Length: The model supports a sequence length of 2048 tokens.

Use Cases

This model is well-suited for applications requiring improved coherence, factual recall, and reasoning, especially in question-answering scenarios, due to its specialized two-stage training approach.

Overview

Overview

Key Capabilities & Training

Use Cases

Full Model Card (README)