Name: boradorish/llama3-3B-sft API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: boradorish

Overview

This model, boradorish/llama3-3B-sft, is a fine-tuned variant of Meta's Llama-3.2-3B-Instruct, specifically optimized for reasoning tasks. With 3.2 billion parameters and a substantial 32768-token context length, it aims to provide enhanced performance in scenarios requiring logical deduction and problem-solving.

Key Characteristics

Base Model: Fine-tuned from meta-llama/Llama-3.2-3B-Instruct.
Parameter Count: 3.2 billion parameters.
Context Length: Supports a 32768-token context window.
Specialization: Fine-tuned on the sunny_reasoning dataset, indicating a focus on improving reasoning capabilities.

Training Details

The model was trained using specific hyperparameters to achieve its specialized performance:

Learning Rate: 4e-05
Batch Size: A total training batch size of 64 (4 per device with 8 gradient accumulation steps on 2 GPUs).
Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08.
Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio over 3 epochs.

Intended Use Cases

Given its fine-tuning on a reasoning dataset, this model is particularly well-suited for applications that demand strong logical inference and problem-solving abilities. Developers looking for a compact yet capable model for reasoning-intensive tasks should consider this offering.

Overview

Overview

Key Characteristics

Training Details

Intended Use Cases

Full Model Card (README)