Model Overview
EEG123/subject1-test1 is a 1-billion-parameter language model fine-tuned from the meta-llama/Llama-3.2-3B architecture. The model was trained on the EEG123/DE_subject_1 dataset and reached a reported evaluation loss of 1.5612.
Key Characteristics
- Base Model: Fine-tuned from meta-llama/Llama-3.2-3B.
- Parameter Count: 1 billion parameters.
- Context Length: Supports a context window of 32768 tokens.
- Training Data: Specialized on the EEG123/DE_subject_1 dataset.
- Performance: Achieved a validation loss of 1.5612 after 3 epochs of training.
Training Details
The model was trained using a learning rate of 1e-05, with a total batch size of 16 (2 per device across 4 GPUs with 2 gradient accumulation steps). The optimizer used was Adam with betas=(0.9, 0.999) and epsilon=1e-08, employing a cosine learning rate scheduler with a 0.1 warmup ratio over 3 epochs.
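The hyperparameters above can be sanity-checked with a short sketch: the effective batch size follows from per-device batch size x GPU count x gradient accumulation, and the cosine schedule with a 0.1 warmup ratio can be reproduced in plain Python. The `total_steps` value is hypothetical, since it depends on the dataset size, which the card does not state.

```python
import math

# Hyperparameters stated in the card.
BASE_LR = 1e-05
WARMUP_RATIO = 0.1

# Effective batch size: 2 per device x 4 GPUs x 2 gradient accumulation steps.
effective_batch = 2 * 4 * 2  # 16, matching the card


def lr_at(step, total_steps, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup followed by cosine decay, the shape implied by
    'cosine learning rate scheduler with a 0.1 warmup ratio'.

    total_steps is an assumed placeholder; the real value depends on
    dataset size and epoch count.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the base learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with an assumed 1000 total steps, the learning rate ramps linearly to 1e-05 over the first 100 steps, then decays along a half-cosine to 0 at step 1000.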
Intended Use Cases
This model is primarily intended for applications and research directly related to the characteristics and patterns present in the EEG123/DE_subject_1 dataset. Because it was fine-tuned on this dataset, it is best suited to tasks that depend on domain knowledge derived from it.
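If the checkpoint is published on the Hugging Face Hub under the EEG123/subject1-test1 name, loading it would follow the standard transformers pattern sketched below. This is an assumed usage example, not taken from the card; the repository id and availability are not verified here, and transformers is imported lazily so the sketch itself does not require it.

```python
def load_finetuned(name="EEG123/subject1-test1"):
    """Sketch: load the fine-tuned causal LM and its tokenizer.

    Assumes the checkpoint is hosted on the Hugging Face Hub under
    `name` and that the standard transformers auto-classes apply.
    """
    # Imported lazily so this module can be inspected without
    # transformers installed; the call itself needs transformers
    # and (for a Hub repo) network access or a local cache.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model
```

A caller would then tokenize a prompt and run `model.generate(...)` as with any other causal language model.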