rbelanec/train_mnli_42_1773765555

Text generation · Model size: 1B parameters · Quantization: BF16 · Context length: 32k · Published: Mar 17, 2026 · License: llama3.2 · Architecture: Transformer

rbelanec/train_mnli_42_1773765555 is a 1-billion-parameter language model fine-tuned from meta-llama/Llama-3.2-1B-Instruct. It is optimized for natural language inference (NLI), having been trained on the MNLI dataset, and reaches a validation loss of 0.2161, reflecting strong performance at classifying the logical relationship between pairs of sentences.


Model Overview

The rbelanec/train_mnli_42_1773765555 is a 1 billion parameter language model derived from the meta-llama/Llama-3.2-1B-Instruct architecture. It has been specifically fine-tuned on the MNLI (Multi-Genre Natural Language Inference) dataset to excel at natural language inference tasks.
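
The model can be loaded like any causal language model in the Transformers library. The snippet below is a minimal sketch assuming the standard AutoModelForCausalLM interface; the BF16 dtype matches the precision listed in the header.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rbelanec/train_mnli_42_1773765555"

# Load the tokenizer and weights in bfloat16, matching the BF16 precision
# listed above. device_map="auto" places the 1B model on an available GPU
# if one is present (requires the accelerate package).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```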

Key Capabilities

  • Natural Language Inference (NLI): The model is specialized in determining the logical relationship between a premise and a hypothesis, classifying them as entailment, contradiction, or neutral.
  • Performance: Achieved a validation loss of 0.2161 during training, indicating strong performance on the MNLI evaluation set.
  • Training Details (see the configuration sketch after this list):
    • Trained for 5 epochs with a learning rate of 5e-05.
    • Utilized AdamW optimizer with a cosine learning rate scheduler.
    • Processed over 191 million input tokens during its training run.
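
For reference, the reported hyperparameters map onto a standard Hugging Face TrainingArguments configuration. The sketch below is a reconstruction from the values above, not the author's actual training script; the output directory and seed are illustrative assumptions (the "42" in the model name suggests, but does not confirm, the random seed).

```python
from transformers import TrainingArguments

# Reconstruction of the reported setup: 5 epochs, learning rate 5e-5,
# AdamW optimizer, cosine schedule. output_dir and seed are assumptions.
training_args = TrainingArguments(
    output_dir="train_mnli_42_1773765555",  # hypothetical path
    num_train_epochs=5,
    learning_rate=5e-5,
    optim="adamw_torch",          # AdamW optimizer
    lr_scheduler_type="cosine",   # cosine learning-rate scheduler
    seed=42,                      # assumed from the model name
)
```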

When to Use This Model

This model is particularly well-suited for applications requiring robust natural language inference capabilities. Consider using it for:

  • Textual Entailment Classification: Identifying logical relationships between pairs of sentences (see the inference sketch after this list).
  • Fact Verification: Assessing whether a claim is entailed or contradicted by supporting evidence.
  • Question Answering Systems: Verifying that a retrieved passage actually entails a candidate answer.
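
Continuing from the loading snippet above, the sketch below shows one way to query the model for an entailment label. The prompt template is an assumption; the exact format used during fine-tuning is not documented here, so verify it against the MNLI preprocessing before relying on the outputs.

```python
# Hypothetical prompt format; the actual fine-tuning template may differ.
premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

prompt = (
    "Determine the relationship between the premise and the hypothesis.\n"
    f"Premise: {premise}\n"
    f"Hypothesis: {hypothesis}\n"
    "Answer (entailment, neutral, or contradiction):"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens, i.e. the predicted label.
label = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(label.strip())
```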

Limitations

This model is specialized for NLI, and that is where its strength lies. For broader generative tasks or other NLP applications, its performance may not match models fine-tuned for those particular use cases.