EEG123/subject1-test1

Text Generation · Model Size: 1B · Quantization: BF16 · Context Length: 32k · License: llama3.2 · Architecture: Transformer

EEG123/subject1-test1 is a 1 billion parameter language model fine-tuned from meta-llama/Llama-3.2-3B. It was adapted on the EEG123/DE_subject_1 dataset and reaches a loss of 1.5612 on its evaluation set. The model is intended for tasks grounded in that specialized training data and supports a context length of 32768 tokens.


Model Overview

EEG123/subject1-test1 is a 1 billion parameter language model, fine-tuned from the meta-llama/Llama-3.2-3B architecture. This model has undergone specialized training on the EEG123/DE_subject_1 dataset, resulting in a reported evaluation loss of 1.5612.

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Llama-3.2-3B.
  • Parameter Count: 1 billion parameters.
  • Context Length: Supports a context window of 32768 tokens.
  • Training Data: Specialized on the EEG123/DE_subject_1 dataset.
  • Performance: Achieved an evaluation loss of 1.5612 after 3 epochs of training.
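
The architectural settings listed above can be read straight from the published model configuration. The snippet below is a minimal sketch, assuming the model is hosted on the Hugging Face Hub under the EEG123/subject1-test1 repository ID:

```python
from transformers import AutoConfig

# Fetch only the configuration (no weights) to inspect the reported settings.
config = AutoConfig.from_pretrained("EEG123/subject1-test1")

print("Context length:", config.max_position_embeddings)  # expected: 32768
print("Hidden layers:", config.num_hidden_layers)
print("Vocabulary size:", config.vocab_size)
```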

Training Details

The model was trained using a learning rate of 1e-05, with a total batch size of 16 (2 per device across 4 GPUs with 2 gradient accumulation steps). The optimizer used was Adam with betas=(0.9, 0.999) and epsilon=1e-08, employing a cosine learning rate scheduler with a 0.1 warmup ratio over 3 epochs.
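
These hyperparameters map directly onto the Hugging Face Trainer API, so the reported setup can be expressed roughly as the configuration below. This is a minimal sketch of the stated values, not the authors' actual training script; the output directory is a placeholder, and dataset preparation, tokenization, and the multi-GPU launch command (e.g. accelerate or torchrun across the 4 GPUs) are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="subject1-test1",    # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,  # 2 per device x 4 GPUs x 2 accumulation = 16 effective
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                      # matches the BF16 precision noted above
)
```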

Intended Use Cases

This model is primarily intended for applications and research directly related to the characteristics and patterns present in the EEG123/DE_subject_1 dataset. Because it is fine-tuned on this single dataset, it is best suited to tasks that require the specific domain knowledge that dataset encodes.
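
As a sketch of how such a fine-tune would typically be loaded for inference, the snippet below uses the standard transformers causal-LM API with the repository ID from this card; the prompt and generation settings are illustrative assumptions rather than part of the model's documentation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EEG123/subject1-test1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative prompt; real inputs would come from the domain of the
# EEG123/DE_subject_1 dataset.
prompt = "Describe the recorded signal:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```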