AlexCuadron/Qwen-32B-8a4e8f3a

Text Generation · Model Size: 32.8B · Quantization: FP8 · Context Length: 32k · Published: Mar 5, 2025 · License: other · Architecture: Transformer

AlexCuadron/Qwen-32B-8a4e8f3a is a 32.8 billion parameter language model fine-tuned from Qwen/Qwen2.5-32B. It was trained specifically on the fc_rlm dataset, indicating a specialization for tasks drawn from that data distribution.


Model Overview

This model, AlexCuadron/Qwen-32B-8a4e8f3a, is a fine-tuned variant of the Qwen2.5-32B architecture, developed by AlexCuadron. It comprises 32.8 billion parameters and has a context length of 32768 tokens.
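As a minimal loading sketch, assuming the repository follows the standard Hugging Face checkpoint layout for Qwen2.5-based causal LMs (the card does not include a usage snippet):

```python
# Minimal loading sketch; assumes a standard Hugging Face checkpoint layout.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlexCuadron/Qwen-32B-8a4e8f3a"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick up the checkpoint's stored dtype
    device_map="auto",    # shard across available GPUs; a 32.8B model needs several
)
```

Note that `device_map="auto"` requires the `accelerate` package to be installed.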

Key Characteristics

  • Base Model: Built upon the robust Qwen2.5-32B foundation.
  • Specialized Fine-tuning: The model was fine-tuned on the fc_rlm dataset. This targeted training suggests improved performance on tasks aligned with the content of that dataset.

Training Details

The fine-tuning process used the following key hyperparameters (see the configuration sketch after this list):

  • Learning Rate: 1e-05
  • Batch Size: A per-device train_batch_size of 2 with gradient_accumulation_steps of 4 across 8 GPUs gives a total_train_batch_size of 64; a per-device eval_batch_size of 8 across the same 8 GPUs gives a total_eval_batch_size of 64.
  • Optimizer: ADAMW_TORCH with standard betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 4.0 epochs.
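For reference, these values map onto a `transformers` `TrainingArguments` configuration roughly as follows. This is a reconstruction from the reported hyperparameters, not the author's actual training script; the output path is a hypothetical placeholder.

```python
from transformers import TrainingArguments

# Reconstruction of the reported setup; the fc_rlm data pipeline
# and the actual fine-tuning script are not published with the card.
training_args = TrainingArguments(
    output_dir="qwen-32b-fc_rlm",      # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,     # x 4 accumulation x 8 GPUs = 64 effective
    per_device_eval_batch_size=8,      # x 8 GPUs = 64 effective
    gradient_accumulation_steps=4,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4.0,
)
```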

Intended Use

The card does not list specific intended uses or limitations, but fine-tuning on the fc_rlm dataset implies suitability for applications where data resembling fc_rlm is prevalent, or where performance on that particular distribution is critical.
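Assuming the model was loaded as sketched above, inference follows the usual `transformers` generation pattern; the prompt here is an arbitrary placeholder, not an fc_rlm example:

```python
# Illustrative generation call; continues from the loading sketch above.
prompt = "Explain the difference between supervised fine-tuning and RLHF."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```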