rbelanec/train_mrpc_42_1774791061
The rbelanec/train_mrpc_42_1774791061 model is a 1-billion-parameter language model fine-tuned by rbelanec. It is fine-tuned from meta-llama/Llama-3.2-1B-Instruct and optimized for the MRPC (Microsoft Research Paraphrase Corpus) dataset. The model targets paraphrase detection, reaching a validation loss of 0.1740 on the evaluation set.
Model Overview
rbelanec/train_mrpc_42_1774791061 is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct, developed by rbelanec. It has 1 billion parameters and was trained specifically on the MRPC (Microsoft Research Paraphrase Corpus) dataset.
Key Capabilities
- Paraphrase Detection: The model is optimized for identifying semantic equivalence between sentence pairs, as evidenced by its training on the MRPC dataset (see the inference sketch after this list).
- Performance: Achieved a validation loss of 0.1740 on the evaluation set during training.
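Below is a minimal inference sketch using Hugging Face transformers. The prompt template is an assumption: the model card does not document the instruction format used during fine-tuning, so adjust the prompt to match the actual training setup.

```python
# Hedged inference sketch -- the prompt format is hypothetical, not taken
# from the model card. Verify against the fine-tuning template before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rbelanec/train_mrpc_42_1774791061"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical MRPC-style prompt: ask whether two sentences are paraphrases.
prompt = (
    "Are the following two sentences paraphrases of each other? "
    "Answer yes or no.\n"
    "Sentence 1: The company posted record profits this quarter.\n"
    "Sentence 2: Quarterly earnings for the firm hit an all-time high.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)

# Decode only the newly generated tokens (the model's yes/no answer).
answer = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```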
Training Details
The model was trained for 5 epochs with a learning rate of 5e-05 and a batch size of 8, using the ADAMW_TORCH optimizer and a cosine learning rate scheduler. The training run processed approximately 1.78 million input tokens.
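For reference, the reported hyperparameters map onto transformers TrainingArguments as sketched below. This is a reconstruction of the listed settings, not the author's original training script; the output directory name is a placeholder.

```python
# Sketch of the reported hyperparameters as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mrpc_42_1774791061",  # placeholder path
    num_train_epochs=5,                      # 5 epochs, as reported
    learning_rate=5e-05,                     # reported learning rate
    per_device_train_batch_size=8,           # reported batch size
    optim="adamw_torch",                     # ADAMW_TORCH optimizer
    lr_scheduler_type="cosine",              # cosine LR schedule
)
```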
Intended Use Cases
This model is primarily suited for semantic similarity analysis and paraphrase identification, particularly in domains similar to the MRPC dataset. Its compact size (1B parameters) makes it a practical choice for applications where computational resources are constrained.