tzchen07/Gemma2-2B-SFT-X8c-2ep
The tzchen07/Gemma2-2B-SFT-X8c-2ep model is a 2.6 billion parameter language model, fine-tuned from unsloth/gemma-2-2b-it. This model leverages a context length of 8192 tokens and is specifically adapted through supervised fine-tuning on the v1_6_plus_v1_8_plus_v1_6c dataset. It is designed for general language understanding and generation tasks, building upon the Gemma 2 architecture.
Loading preview...
Model Overview
This model, tzchen07/Gemma2-2B-SFT-X8c-2ep, is a 2.6 billion parameter language model derived from the Gemma 2 architecture. It has been specifically fine-tuned from unsloth/gemma-2-2b-it using a supervised fine-tuning (SFT) approach. The training utilized the v1_6_plus_v1_8_plus_v1_6c dataset, enhancing its capabilities for general language tasks.
Key Training Details
The fine-tuning process involved specific hyperparameters to optimize performance:
- Learning Rate: 5e-06
- Batch Size: A
train_batch_sizeof 4 andeval_batch_sizeof 8, with agradient_accumulation_stepsof 16, resulting in atotal_train_batch_sizeof 64. - Optimizer: ADAMW_TORCH with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1 over 2 epochs.
Intended Use Cases
While specific intended uses and limitations require further information, as a fine-tuned Gemma 2-2B model, it is generally suitable for a range of natural language processing applications where a compact yet capable model is beneficial. Its training on a specific dataset suggests potential strengths in areas covered by that data, making it a candidate for tasks requiring nuanced understanding and generation based on its fine-tuning.