adpretko/train-riscv-O2_epoch3_AMD

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Nov 3, 2025 | Architecture: Transformer | Cold

adpretko/train-riscv-O2_epoch3_AMD is a 1.5-billion-parameter language model fine-tuned by adpretko from adpretko/train-riscv-O2_epoch1and2. It specializes in RISC-V related code, having been trained on 50 parts of the AnghaBench-risc-o2-full dataset. Its 32,768-token context length supports analysis of long code inputs.


Overview

adpretko/train-riscv-O2_epoch3_AMD is a 1.5-billion-parameter language model developed by adpretko. It is a further fine-tuned iteration of adpretko/train-riscv-O2_epoch1and2, targeted at RISC-V related tasks. The model was trained with a context length of 32,768 tokens, which makes it suitable for processing long code segments.

Key Capabilities

  • RISC-V Code Specialization: Fine-tuned across 50 parts of the AnghaBench-risc-o2-full dataset, indicating a strong focus on RISC-V assembly and related code.
  • Extended Context Window: Features a 32768 token context length, allowing for the analysis and generation of longer code sequences or detailed technical documentation.
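
As a rough illustration of what the 32,768-token window means in practice, the sketch below checks whether a prompt plus a requested generation length fits within it. The helper names (`fits_in_context`, `remaining_generation_budget`) are hypothetical and not part of any library; token counts are assumed to come from whatever tokenizer is used with the model.

```python
# Context length stated on this model card.
CONTEXT_LENGTH = 32768


def remaining_generation_budget(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_length - prompt_tokens)


def fits_in_context(prompt_tokens, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Check that prompt and generated tokens together stay within the window."""
    return prompt_tokens + max_new_tokens <= context_length


if __name__ == "__main__":
    # Hypothetical example: a 30,000-token assembly listing as the prompt.
    print(fits_in_context(30000, 2000))
    print(remaining_generation_budget(30000))
```

A check like this is useful before submitting large compiled functions, since inputs that exceed the window would need to be truncated or split.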

Training Details

The model was trained with a learning rate of 2e-05, a train_batch_size of 8, and gradient_accumulation_steps of 8, for a total_train_batch_size of 512. Since 8 × 8 gives only 64 samples per device per optimizer step, the reported total implies training across 8 devices, although the device count is not stated here. Training used the AdamW_Torch_Fused optimizer and a cosine learning rate scheduler with a 0.1 warmup ratio over 2 epochs. The training environment included Transformers 4.55.0, PyTorch 2.8.0+rocm6.3, Datasets 3.6.0, and Tokenizers 0.21.1.
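
The batch-size arithmetic and the shape of the learning rate schedule can be sketched in plain Python. This is a minimal illustration, not the training code: the function names are hypothetical, the total step count of 1000 is an assumed value chosen only for the example, and the schedule follows the common pattern of linear warmup followed by cosine decay to zero, which may differ in rounding details from the library's implementation.

```python
import math


def implied_num_devices(total_batch, per_device_batch, grad_accum_steps):
    """Infer the device count from the reported total batch size."""
    per_device_per_step = per_device_batch * grad_accum_steps
    if total_batch % per_device_per_step != 0:
        raise ValueError("total batch is not a multiple of per-device batch")
    return total_batch // per_device_per_step


def cosine_lr_with_warmup(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Values from the card: total 512, per-device batch 8, accumulation 8.
    print(implied_num_devices(512, 8, 8))
    # Hypothetical run length of 1000 optimizer steps.
    for step in (0, 50, 100, 550, 1000):
        print(step, cosine_lr_with_warmup(step, 1000))
```

With these settings, the learning rate rises linearly during the first 10% of steps, peaks at 2e-05, and then decays smoothly for the remainder of training.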