rbelanec/train_mnli_42_1776331408

Text generation · Concurrency cost: 1 · Model size: 1B · Quantization: BF16 · Context length: 32k · Published: Apr 16, 2026 · License: llama3.2 · Architecture: Transformer

rbelanec/train_mnli_42_1776331408 is a 1-billion-parameter language model fine-tuned from meta-llama/Llama-3.2-1B-Instruct. It is optimized for Natural Language Inference (NLI), having been trained on the MNLI dataset, and reaches a validation loss of 0.1219 on the evaluation set, making it suitable for classifying text pairs as entailment, contradiction, or neutral.

Model Overview

This model, rbelanec/train_mnli_42_1776331408, is a 1 billion parameter language model derived from the meta-llama/Llama-3.2-1B-Instruct architecture. It has undergone specific fine-tuning on the MNLI (Multi-Genre Natural Language Inference) dataset, which focuses on determining the relationship between a premise and a hypothesis (entailment, contradiction, or neutral).

Key Capabilities

  • Natural Language Inference (NLI): Specialized in classifying the logical relationship between two sentences.
  • Fine-tuned Performance: Achieved a validation loss of 0.1219 on the evaluation set after training, indicating proficiency in its target task.
  • Efficient Size: As a 1B parameter model, it offers a balance between performance on NLI tasks and computational efficiency.
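Because this is a causal language model fine-tuned for NLI rather than a sequence-classification head, inference typically means formatting the premise/hypothesis pair as a prompt and reading a label back out of the generated text. The model card does not document the exact prompt template, so the helpers below are a hypothetical sketch; the template and label strings are assumptions:

```python
# Hypothetical helpers for prompting a causal LM fine-tuned on MNLI.
# The prompt template and label strings are assumptions -- the card does
# not document the exact format used during fine-tuning.

MNLI_LABELS = ("entailment", "neutral", "contradiction")

def build_nli_prompt(premise: str, hypothesis: str) -> str:
    """Format a premise/hypothesis pair as a single NLI prompt."""
    return (
        f"premise: {premise}\n"
        f"hypothesis: {hypothesis}\n"
        "label:"
    )

def parse_nli_label(generated: str) -> str:
    """Map raw generated text onto one of the three MNLI labels."""
    text = generated.strip().lower()
    for label in MNLI_LABELS:
        if text.startswith(label):
            return label
    return "neutral"  # conservative fallback for unrecognized output
```

In practice you would pass the output of `build_nli_prompt(...)` to the model (e.g. via a `transformers` text-generation pipeline) and run `parse_nli_label` on the generated continuation.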

Training Details

The model was trained for 5 epochs with a learning rate of 5e-06, using a cosine learning rate scheduler with a 0.1 warmup ratio. The training involved processing approximately 191 million input tokens. Key hyperparameters included a batch size of 8 for both training and evaluation, and the AdamW optimizer.
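The reported hyperparameters map directly onto a Hugging Face `TrainingArguments` configuration. The fragment below is a sketch assuming the `transformers` Trainer API was used; the output directory and AdamW variant are illustrative assumptions:

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters as a TrainingArguments config.
# output_dir and the exact AdamW implementation are assumptions.
training_args = TrainingArguments(
    output_dir="train_mnli_42",       # illustrative path
    num_train_epochs=5,
    learning_rate=5e-06,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    optim="adamw_torch",              # AdamW optimizer
)
```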

Good For

  • Applications requiring precise classification of textual relationships (entailment, contradiction, neutrality).
  • Research and development in natural language understanding, particularly for NLI benchmarks.
  • Deployment in scenarios where a smaller, specialized model for NLI is preferred over larger, general-purpose LLMs.