SimpleBerry/LLaMA-O1-Base-1127
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Dec 2, 2024 · License: other · Architecture: Transformer

SimpleBerry/LLaMA-O1-Base-1127 is an 8 billion parameter base model developed by SimpleBerry, fine-tuned from Meta's Llama-3.1-8B-Instruct. It was trained on the longcot_pt dataset, whose name suggests a focus on long chain-of-thought reasoning. The model is intended as a foundation for further supervised training rather than for direct application.


Model Overview

SimpleBerry/LLaMA-O1-Base-1127 is an 8 billion parameter language model, fine-tuned by SimpleBerry from meta-llama/Llama-3.1-8B-Instruct. The base model was trained on the longcot_pt dataset, whose name suggests an emphasis on long chain-of-thought reasoning. A minimal loading sketch follows the key characteristics below.

Key Characteristics

  • Base Model: Derived from Meta's Llama-3.1-8B-Instruct.
  • Parameter Count: 8 billion parameters.
  • Training Data: Fine-tuned on the longcot_pt dataset.
  • Context Length: Supports a context length of 32768 tokens.
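
Assuming the model is available on the Hugging Face Hub under the ID shown above, a minimal loading and generation sketch with the transformers library might look like the following. The dtype, device placement, and generation settings here are illustrative choices, not recommendations from the model authors, and the base model is not intended for direct end-use (see Intended Use below).

```python
# Minimal loading sketch (assumes the model is hosted on the Hugging Face Hub
# as "SimpleBerry/LLaMA-O1-Base-1127" and any license terms have been accepted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SimpleBerry/LLaMA-O1-Base-1127"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; the card lists an FP8 quant
    device_map="auto",
)

# The context window is 32768 tokens, per the model card.
prompt = "Explain the difference between supervised and unsupervised learning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```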

Intended Use

This model is designed as a foundational component for further development and is not recommended for direct use in applications. Instead, it serves as a base for subsequent supervised fine-tuning. For direct application, users are advised to consider models like LLaMA-O1-Supervised-1129, which builds on this base with additional supervised fine-tuning.

Training Details

The training process used a learning rate of 5e-05 and a per-device batch size of 1 across 24 GPUs, for a total batch size of 24, over 4 epochs. The optimizer was ADAMW_TORCH with a cosine learning rate scheduler.
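
For reference, these reported hyperparameters map onto a transformers TrainingArguments configuration roughly as follows. This is a reconstruction from the card, not the authors' published training script; the output path is a placeholder, and dataset and model wiring are omitted.

```python
# Hypothetical reconstruction of the reported hyperparameters using
# transformers.TrainingArguments; the actual training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-o1-base-1127",  # placeholder path
    learning_rate=5e-05,
    per_device_train_batch_size=1,    # 1 per device x 24 GPUs = 24 total
    num_train_epochs=4,
    optim="adamw_torch",              # reported as ADAMW_TORCH
    lr_scheduler_type="cosine",
)
```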