SimpleBerry/LLaMA-O1-Base-1127
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Dec 2, 2024 · License: other · Architecture: Transformer

SimpleBerry/LLaMA-O1-Base-1127 is an 8 billion parameter base model developed by SimpleBerry, fine-tuned from Meta's Llama-3.1-8B-Instruct. It was trained on the longcot_pt dataset, whose name suggests a focus on long chain-of-thought reasoning. The model is intended as a foundation for further supervised training rather than for direct application.


Model Overview

SimpleBerry/LLaMA-O1-Base-1127 is an 8 billion parameter language model, fine-tuned by SimpleBerry from meta-llama/Llama-3.1-8B-Instruct. The base model was trained on the longcot_pt dataset, whose name suggests an emphasis on long chain-of-thought reasoning. A minimal loading sketch follows the key characteristics below.

Key Characteristics

  • Base Model: Derived from Meta's Llama-3.1-8B-Instruct.
  • Parameter Count: 8 billion parameters.
  • Training Data: Fine-tuned on the longcot_pt dataset.
  • Context Length: Supports a context length of 32768 tokens.
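
Assuming the model is available on the Hugging Face Hub under the ID shown above, a minimal loading and generation sketch with the transformers library might look like the following. The dtype, device placement, and generation settings here are illustrative choices, not recommendations from the model authors, and the base model is not intended for direct end-use (see Intended Use below).

```python
# Minimal loading sketch (assumes the model is hosted on the Hugging Face Hub
# as "SimpleBerry/LLaMA-O1-Base-1127" and any license terms have been accepted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SimpleBerry/LLaMA-O1-Base-1127"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; the card lists an FP8 quant
    device_map="auto",
)

# The context window is 32768 tokens, per the model card.
prompt = "Explain the difference between supervised and unsupervised learning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```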

Intended Use

This model is designed as a foundational component for further development and is not recommended for direct use in applications. Instead, it serves as a base for subsequent supervised fine-tuning. For direct application, users are advised to consider models like LLaMA-O1-Supervised-1129, which builds on this base with additional supervised fine-tuning.

Training Details

The training process used a learning rate of 5e-05 and a per-device batch size of 1 across 24 GPUs, for a total batch size of 24, over 4 epochs. The optimizer was ADAMW_TORCH with a cosine learning rate scheduler.
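
For reference, these reported hyperparameters map onto a transformers TrainingArguments configuration roughly as follows. This is a reconstruction from the card, not the authors' published training script; the output path is a placeholder, and dataset and model wiring are omitted.

```python
# Hypothetical reconstruction of the reported hyperparameters using
# transformers.TrainingArguments; the actual training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-o1-base-1127",  # placeholder path
    learning_rate=5e-05,
    per_device_train_batch_size=1,    # 1 per device x 24 GPUs = 24 total
    num_train_epochs=4,
    optim="adamw_torch",              # reported as ADAMW_TORCH
    lr_scheduler_type="cosine",
)
```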