Heejindo/rationale_model_e10
Text Generation · Model Size: 1B · Quant: BF16 · Context Length: 32k · License: llama3.2 · Architecture: Transformer

Heejindo/rationale_model_e10 is a 1 billion parameter language model fine-tuned from meta-llama/Llama-3.2-1B, with a 32,768 token context length. Its specific training dataset and primary differentiators are currently undocumented. It was trained with a learning rate of 1e-05 over 3 epochs, reaching its lowest validation loss of 1.9041 partway through training. The model's intended uses and limitations are not detailed in its current documentation.


Overview

Heejindo/rationale_model_e10 is a 1 billion parameter language model fine-tuned from the meta-llama/Llama-3.2-1B base model, supporting a context length of 32,768 tokens. Its training dataset, intended uses, and limitations are not detailed in the current documentation, so its primary differentiators and best-fit applications remain undefined.
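Since this is a standard Llama-3.2 fine-tune, it should load through the usual transformers API. The sketch below is a minimal inference example, assuming the checkpoint is publicly downloadable from the Hub under the repo id shown on this page; the prompt is purely illustrative.

```python
# Minimal inference sketch. Assumes the repo id from this page resolves
# to a public checkpoint; nothing here is confirmed by the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Heejindo/rationale_model_e10"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
)

# Illustrative prompt; the card does not document a prompt format.
inputs = tokenizer("Explain why the sky appears blue.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```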

Training Details

The model was trained for 3 epochs with the AdamW optimizer at a learning rate of 1e-05, using a train_batch_size and eval_batch_size of 4. Validation loss reached its lowest point, 1.9041, after 1500 steps, then generally increased in later stages of training. The model was developed with Transformers 4.46.3, PyTorch 2.3.0, Datasets 2.14.4, and Tokenizers 0.20.3.
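For reference, the reported hyperparameters map onto a transformers TrainingArguments configuration as sketched below. Only the values named above (learning rate 1e-05, batch sizes of 4, 3 epochs, AdamW) come from the card; the output directory and evaluation cadence are assumptions, and the training dataset is undocumented.

```python
# Hedged sketch of the reported configuration for the Trainer API.
# The dataset and data collator are not documented, so this only
# reproduces the hyperparameters stated in the model card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="rationale_model_e10",  # hypothetical output path
    learning_rate=1e-5,                # reported learning rate
    per_device_train_batch_size=4,     # reported train_batch_size
    per_device_eval_batch_size=4,      # reported eval_batch_size
    num_train_epochs=3,                # reported number of epochs
    optim="adamw_torch",               # AdamW, per the card
    eval_strategy="steps",             # evaluation cadence is an assumption;
    eval_steps=500,                    # the card reports the best loss at step 1500
)
```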

Current Status

The available documentation does not yet specify the model's capabilities, intended uses, or the dataset it was fine-tuned on. Users should note that its performance characteristics beyond the reported validation loss are unspecified.