Model Overview
SimpleBerry/LLaMA-O1-Base-1127 is an 8 billion parameter language model, fine-tuned by SimpleBerry from the meta-llama/Llama-3.1-8B-Instruct architecture. This base model was trained on the longcot_pt dataset, suggesting an emphasis on processing and understanding long contexts.
Key Characteristics
- Base Model: Derived from Meta's Llama-3.1-8B-Instruct.
- Parameter Count: 8 billion parameters.
- Training Data: Fine-tuned on the longcot_pt dataset.
- Context Length: Supports a context length of 32768 tokens.
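As a rough sketch, the model can be loaded with the Hugging Face transformers library. The repository id and context length come from this card; the dtype and device settings below are assumptions, not part of the card:

```python
# Sketch: loading the base model with Hugging Face transformers.
# MODEL_ID and MAX_CONTEXT_TOKENS come from this card; torch_dtype and
# device_map are assumptions chosen for a typical single-node setup.
MODEL_ID = "SimpleBerry/LLaMA-O1-Base-1127"
MAX_CONTEXT_TOKENS = 32768  # context length stated above

def load_model():
    """Download and return (model, tokenizer); an 8B model in bf16 needs roughly 16 GB of memory."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on the target hardware
        device_map="auto",
    )
    return model, tokenizer
```

Since this is a base model intended for further training rather than direct use, the loaded weights would typically feed into a fine-tuning pipeline rather than an inference loop.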
Intended Use
This model is designed as a foundational component for further development. It is not recommended for direct, unsupervised usage. Instead, it serves as a base for subsequent supervised training. For direct application, users are advised to consider models like LLaMA-O1-Supervised-1129, which is built upon this base model with additional supervised fine-tuning.
Training Details
Training used a learning rate of 5e-05 with a batch size of 1 per device across 24 GPUs (an effective batch size of 24), run for 4 epochs. The optimizer was ADAMW_TORCH with a cosine learning rate scheduler.
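The reported hyperparameters can be collected into a config dict; a minimal sketch assuming field names that match transformers.TrainingArguments (the names are an assumption, only the values are taken from this card):

```python
# Sketch of the reported training hyperparameters. Field names follow
# transformers.TrainingArguments conventions (an assumption); the values
# are the ones stated on this card.
NUM_GPUS = 24

training_args = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 1,
    "num_train_epochs": 4,
    "optim": "adamw_torch",
    "lr_scheduler_type": "cosine",
}

# With no gradient accumulation reported, the effective (global) batch size
# is per-device batch size times the number of GPUs.
effective_batch_size = training_args["per_device_train_batch_size"] * NUM_GPUS
print(effective_batch_size)  # 24
```

The per-device batch size of 1 suggests memory pressure from long sequences; the parallelism across 24 GPUs is what recovers a usable global batch size.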