AlexCuadron/Qwen-32B-8a4e8f3a
AlexCuadron/Qwen-32B-8a4e8f3a is a 32.8 billion parameter language model fine-tuned from Qwen/Qwen2.5-32B on the `fc_rlm` dataset, suggesting it is specialized for tasks drawn from that dataset's distribution.
Model Overview
This model, AlexCuadron/Qwen-32B-8a4e8f3a, is a fine-tuned variant of the Qwen2.5-32B architecture, developed by AlexCuadron. It comprises 32.8 billion parameters and has a context length of 32768 tokens.
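The model can be loaded with the standard Hugging Face `transformers` API. The snippet below is a minimal sketch: the dtype and device-placement choices are illustrative assumptions, and sharding a 32.8B-parameter model across GPUs requires the `accelerate` package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlexCuadron/Qwen-32B-8a4e8f3a"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard across available GPUs (requires accelerate)
)

prompt = "Briefly explain gradient accumulation."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```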
Key Characteristics
- Base Model: Built upon the robust Qwen2.5-32B foundation.
- Specialized Fine-tuning: The model was fine-tuned on the `fc_rlm` dataset. This targeted training suggests enhanced performance on tasks aligned with the characteristics and content of that dataset.
Training Details
The fine-tuning process used the following key hyperparameters (a configuration sketch follows the list):
- Learning Rate: `1e-05`
- Batch Size: a `train_batch_size` of `2` and an `eval_batch_size` of `8`, giving a `total_train_batch_size` and `total_eval_batch_size` of `64` with `gradient_accumulation_steps` of `4` across `8` GPUs.
- Optimizer: `ADAMW_TORCH` with standard betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a `0.1` warmup ratio.
- Epochs: Trained for `4.0` epochs.
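The training script itself is not published; the configuration below is a hypothetical sketch showing how the reported hyperparameters map onto Hugging Face `TrainingArguments`. The effective batch size of 64 follows from 2 per device × 4 accumulation steps × 8 GPUs, and the `output_dir` name is illustrative.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen-32b-fc_rlm",      # illustrative path, not from the source
    learning_rate=1e-5,
    per_device_train_batch_size=2,     # 2 x 4 accumulation steps x 8 GPUs = 64
    per_device_eval_batch_size=8,      # 8 x 8 GPUs = 64
    gradient_accumulation_steps=4,
    num_train_epochs=4.0,
    optim="adamw_torch",               # ADAMW_TORCH with default betas/epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```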
Intended Use
Specific intended uses and limitations have not been documented. However, the fine-tuning on the `fc_rlm` dataset implies suitability for applications where data similar to `fc_rlm` is prevalent, or where performance on that particular distribution is critical.