UWNSL/Llama3.1-3B-Instruct_Mix-Long

Text generation · Model size: 3.2B · Quantization: BF16 · Context length: 32k · Published: Feb 24, 2025 · License: other · Architecture: Transformer

UWNSL/Llama3.1-3B-Instruct_Mix-Long is a 3.2-billion-parameter instruction-tuned causal language model, fine-tuned from Meta's Llama-3.2-3B-Instruct on the Mix-Long_long_0.2_short_0.8 dataset. It supports a context length of 32768 tokens, making it suitable for tasks that involve long inputs, and is optimized for general instruction following.


UWNSL/Llama3.1-3B-Instruct_Mix-Long Overview

This model is a fine-tuned variant of Meta's Llama-3.2-3B-Instruct, developed by UWNSL. It is a 3.2-billion-parameter instruction-following language model, distinguished by its 32768-token context window, which allows it to process significantly longer inputs than many models of comparable size.

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Llama-3.2-3B-Instruct.
  • Parameter Count: 3.2 billion.
  • Context Length: Supports an extended context of 32768 tokens.
  • Training Data: Fine-tuned on the Mix-Long_long_0.2_short_0.8 dataset.
  • Evaluation Loss: reached 0.2421 on the evaluation set during training.
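
The checkpoint uses the standard Llama architecture, so it loads with the usual transformers auto classes. A minimal loading sketch; the dtype and device settings are illustrative rather than specified by the card:

```python
# Minimal loading sketch for this checkpoint; dtype/device choices are
# illustrative, not mandated by the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UWNSL/Llama3.1-3B-Instruct_Mix-Long"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",           # place weights on available GPU(s)/CPU
)
```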

Intended Use Cases

Given its instruction-tuned nature and extended context window, this model is well-suited for applications requiring:

  • General instruction following: Responding to a wide range of prompts and commands.
  • Long-form text processing: Summarization, analysis, or generation of documents, articles, or code with substantial length.
  • Conversational AI: Maintaining context over extended dialogues.
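
For the long-form use cases above, prompts go through the Llama chat template. The sketch below reuses the tokenizer and model from the loading example; `long_document` is a placeholder for real content, which may approach the 32k window:

```python
# Long-context generation sketch, reusing `tokenizer` and `model` from the
# loading example above. `long_document` is a placeholder for real input.
long_document = open("report.txt", encoding="utf-8").read()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": f"Summarize the following document:\n\n{long_document}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```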

Training Details

The model was trained with a learning rate of 1e-05, a total batch size of 80 (achieved via gradient accumulation), and the AdamW optimizer over 2 epochs, using Transformers 4.46.1 and PyTorch 2.6.0+cu124.
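
The training script itself is not published, so the following TrainingArguments sketch is only a hypothetical reconstruction of the reported hyperparameters; in particular, the split of the total batch size of 80 into a per-device batch and accumulation steps is an assumption:

```python
# Hypothetical reconstruction of the reported hyperparameters using
# transformers.TrainingArguments. The per-device/accumulation split of the
# total batch size of 80 is assumed (e.g. 10 x 8 on a single GPU); the
# actual setup is not documented on the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama3.1-3B-Instruct_Mix-Long",
    learning_rate=1e-5,
    per_device_train_batch_size=10,   # assumed split of the total batch
    gradient_accumulation_steps=8,    # 10 x 8 = 80 total batch size
    num_train_epochs=2,
    optim="adamw_torch",              # AdamW, as reported
    bf16=True,                        # matches the BF16 weights
)
```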