UWNSL/Llama3.1-3B-Instruct_Mix-Long Overview
This model is a fine-tuned variant of meta-llama/Llama-3.2-3B-Instruct, developed by UWNSL. It is a 3.2-billion-parameter instruction-following language model, distinguished by its 32768-token context window, which lets it process significantly longer inputs than many comparably sized models.
Key Characteristics
- Base Model: Fine-tuned from meta-llama/Llama-3.2-3B-Instruct.
- Parameter Count: 3.2 billion parameters.
- Context Length: Supports an extended context of 32768 tokens.
- Training Data: Fine-tuned on the Mix-Long_long_0.2_short_0.8 dataset.
- Training Loss: Achieved a loss of 0.2421 on the evaluation set.
Intended Use Cases
Given its instruction-tuned nature and extended context window, this model is well-suited for applications requiring:
- General instruction following: Responding to a wide range of prompts and commands.
- Long-form text processing: Summarization, analysis, or generation of documents, articles, or code with substantial length.
- Conversational AI: Maintaining context over extended dialogues.
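For long-form processing, inputs still need to fit the 32768-token window while leaving room for the generated output. The sketch below is one illustrative way to split an already-tokenized document into overlapping chunks that respect that budget; the function name, the 512-token generation reserve, and the overlap size are assumptions for illustration, not part of the model card.

```python
def chunk_tokens(token_ids, context_window=32768, max_new_tokens=512, overlap=256):
    """Split a list of token ids into chunks that each fit the model's
    context window, reserving max_new_tokens of room for generation.
    Consecutive chunks overlap by `overlap` tokens so no sentence is
    cut off without context. (Illustrative helper; not from the card.)
    """
    budget = context_window - max_new_tokens  # prompt tokens allowed per chunk
    step = budget - overlap                   # stride between chunk starts
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + budget])
        if start + budget >= len(token_ids):
            break
    return chunks
```

Each chunk can then be summarized independently and the partial summaries combined, a common pattern for documents that exceed even an extended context window.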
Training Details
The model was trained for 2 epochs with the AdamW optimizer, a learning rate of 1e-05, and a total train batch size of 80 (achieved via gradient accumulation). Training used Transformers 4.46.1 and PyTorch 2.6.0+cu124.
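The total batch size of 80 is the product of the per-device batch size, the number of gradient-accumulation steps, and the device count. The card does not state how that product is factored, so the 10 × 8 × 1 split below is only one possible configuration, shown to make the arithmetic concrete.

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Effective (total) train batch size when using gradient accumulation:
    gradients from `accumulation_steps` micro-batches of `per_device_batch`
    examples on each of `num_devices` devices are summed before one
    optimizer step, so the update sees their product in examples."""
    return per_device_batch * accumulation_steps * num_devices

# One illustrative factorization of the card's total batch size of 80
# (the actual per-device batch / accumulation split is not stated):
total = effective_batch_size(per_device_batch=10, accumulation_steps=8, num_devices=1)
```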