Model Overview
This model, adpretko/riscv_to_armv8mac_qwen25coder_3p0b_full, is a specialized 3.1 billion parameter language model. It is a fine-tuned variant of the Qwen/Qwen2.5-Coder-3B-Instruct base model, specifically adapted for code translation tasks. The model has been trained on a series of riscv_to_armv8mac datasets, indicating its primary focus on converting code between RISC-V and ARMv8-A architectures.
Key Characteristics
- Base Model: Qwen/Qwen2.5-Coder-3B-Instruct, known for its code generation capabilities.
- Parameter Count: 3.1 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a substantial context window of 32768 tokens, enabling the processing of larger code segments.
- Specialization: Fine-tuned on specific datasets (
riscv_to_armv8mac_000 through riscv_to_armv8mac_006) to excel in cross-architecture code translation.
Training Details
The model underwent training with the following key hyperparameters:
- Learning Rate: 2e-05
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: A total effective batch size of 8 (1 per device with 8 gradient accumulation steps).
- Epochs: Trained for 0.5 epochs, suggesting a focused fine-tuning approach on the specialized datasets.
Intended Use Cases
This model is particularly suited for developers and researchers working on:
- RISC-V to ARMv8-A Code Translation: Its primary strength lies in converting code snippets or functions between these two distinct instruction set architectures.
- Cross-Architecture Development: Assisting in porting or understanding code across different hardware platforms.
- Code Analysis: Potentially useful for analyzing architectural differences in code by observing its translation outputs.