adpretko/train-riscv-O2_epoch1and2
Text Generation · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Oct 31, 2025 · Architecture: Transformer · Concurrency Cost: 1 · Status: Warm

The adpretko/train-riscv-O2_epoch1and2 model is a 1.5-billion-parameter language model fine-tuned from saves/train-riscv-O2_epoch1and2/checkpoint-2800. It supports a context length of 131072 tokens and was trained for 2 epochs with a learning rate of 2e-05 and an effective batch size of 512. The model is a specialized iteration, though its primary use case and specific differentiators are not detailed in the available information.


Overview

This model was fine-tuned from an existing checkpoint, saves/train-riscv-O2_epoch1and2/checkpoint-2800, for 2 epochs. Its substantial 131072-token context length indicates potential for processing very long sequences.
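
The snippet below is a minimal inference sketch, assuming the checkpoint is published as a standard Hugging Face causal LM loadable with transformers. The model id comes from this card; the prompt and generation settings are illustrative.

```python
# Minimal inference sketch. Assumes a standard causal LM checkpoint;
# prompt and generation settings are illustrative, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "adpretko/train-riscv-O2_epoch1and2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the quantization listed above
    device_map="auto",
)

prompt = "Explain the RISC-V calling convention in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```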

Training Details

The model was trained with the following key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 2e-05
  • Batch Size: 8 per device (train and eval), with an effective training batch size of 512 via gradient accumulation.
  • Optimizer: ADAMW_TORCH_FUSED (PyTorch's fused AdamW implementation)
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: 2.0
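
For reference, these hyperparameters map onto transformers TrainingArguments roughly as sketched below. The output directory is the checkpoint path from this card; gradient_accumulation_steps=64 is an assumption chosen so that 8 × 64 = 512 matches the stated effective batch size on a single device.

```python
# Sketch of the training configuration implied by the card's hyperparameters.
# Values not listed on the card (e.g. gradient_accumulation_steps) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="saves/train-riscv-O2_epoch1and2",  # path taken from the card
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=64,   # assumed: 8 * 64 = 512 effective batch
    num_train_epochs=2.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    bf16=True,                        # matches the BF16 quantization listed above
)
```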

Key Capabilities

  • Large Context Window: Supports a context length of 131072 tokens, enabling it to process and generate very long texts (a token-budget sketch follows this list).
  • Fine-tuned Base: Built upon an existing checkpoint, suggesting a specialized application or domain.
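
A small sketch of budgeting a long input against the 131072-token window described above; the input file name and the 512-token generation reserve are illustrative assumptions.

```python
# Long-context budgeting sketch: check how much of a long document fits
# within the advertised 131072-token window, reserving room for generation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("adpretko/train-riscv-O2_epoch1and2")

with open("long_document.txt") as f:   # hypothetical input file
    text = f.read()

# Truncate to the context window, leaving 512 tokens for the reply.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=131072 - 512)
print(f"Prompt length: {inputs['input_ids'].shape[1]} tokens")
```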

Good for

  • Use cases requiring processing of extremely long input sequences.
  • Further experimentation or fine-tuning for specific tasks where its base model and training parameters are relevant (a minimal sketch follows).
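
For the second bullet, a minimal continued-fine-tuning sketch using the standard transformers Trainer. Here training_args is the configuration sketched under Training Details, and my_dataset stands in for a hypothetical pre-tokenized dataset; this is not the author's training script.

```python
# Continued fine-tuning sketch: standard causal-LM Trainer run.
# my_dataset is a hypothetical pre-tokenized dataset; training_args is
# the TrainingArguments sketched in the Training Details section.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
)

model_id = "adpretko/train-riscv-O2_epoch1and2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=my_dataset,  # hypothetical: tokenized text samples
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```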