The adpretko/train-riscv-O2_epoch1and2 model is a 1.5-billion-parameter language model, fine-tuned from the checkpoint saves/train-riscv-O2_epoch1and2/checkpoint-2800. It supports a context length of 131072 tokens and was trained for 2 epochs with a learning rate of 2e-05 and an effective batch size of 512. The checkpoint name suggests a RISC-V-related specialization, but the card does not detail the model's intended use case or what differentiates it from its base.
Overview
The adpretko/train-riscv-O2_epoch1and2 model is a 1.5-billion-parameter language model fine-tuned from an existing checkpoint, saves/train-riscv-O2_epoch1and2/checkpoint-2800. It was trained for 2 epochs with a context length of 131072 tokens, suited to processing very long sequences.
Training Details
The model underwent training with the following key hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 8 per device (train and eval), with an effective total training batch size of 512 via gradient accumulation.
- Optimizer: ADAMW_TORCH_FUSED
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Epochs: 2.0
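The hyperparameters above can be sketched concretely. The snippet below is a minimal illustration, assuming the standard linear-warmup-then-cosine-decay shape the Hugging Face Trainer uses for its cosine scheduler, and assuming a single device (the card does not state the world size, so the 64-step accumulation figure is an inference from 512 / 8):

```python
import math

LEARNING_RATE = 2e-5   # from the card
WARMUP_RATIO = 0.1     # from the card

# Effective batch size 512 = per-device batch 8 x accumulation steps,
# assuming one device (not stated in the card).
GRAD_ACCUM_STEPS = 512 // 8  # 64

def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup for the first 10% of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 optimizer steps the learning rate ramps from 0 to 2e-05 over the first 100 steps, then decays smoothly back to 0 by step 1000.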
Key Capabilities
- Large Context Window: Supports a context length of 131072 tokens, enabling it to process and generate very long texts.
- Fine-tuned Base: Built upon an existing checkpoint, suggesting a specialized application or domain.
Good for
- Use cases requiring processing of extremely long input sequences.
- Further fine-tuning or experimentation on tasks where its base checkpoint and training setup are a good fit.
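For either use, loading the model follows the usual Transformers pattern. This is a hedged sketch, not a recipe from the card: the card does not state the architecture, so `AutoModelForCausalLM` is an assumption (reasonable for a fine-tuned 1.5B language model), and the dtype/device settings are illustrative defaults:

```python
def load_model(model_id: str = "adpretko/train-riscv-O2_epoch1and2"):
    """Load the tokenizer and model from the Hugging Face Hub.

    Assumes the checkpoint is a causal language model; the card does not
    state the architecture, so AutoModelForCausalLM is a guess.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # halves memory vs. fp32 for a 1.5B model
        device_map="auto",           # place layers across available devices
    )
    return tokenizer, model
```

Note that actually exercising the full 131072-token context will require far more memory than the weights alone; long-context inference typically needs attention optimizations such as FlashAttention and a large KV cache budget.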