nandansarkar/qwen3_0-6B_adversarial_2

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | License: other | Architecture: Transformer

nandansarkar/qwen3_0-6B_adversarial_2 is a 0.8-billion-parameter language model fine-tuned from an earlier adversarial checkpoint. It is based on the Qwen3 architecture and has a context length of 40,960 tokens. It was trained on an adversarial dataset, which suggests it may be useful for robustness testing or for generating challenging inputs.


Model Overview

nandansarkar/qwen3_0-6B_adversarial_2 is a 0.8-billion-parameter language model built on the Qwen3 architecture, with a context length of 40,960 tokens. It was fine-tuned from an earlier adversarial checkpoint on a dataset named adversarial_dataset_2, which indicates that it was developed for tasks involving adversarial scenarios.
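
As a rough illustration of how a checkpoint like this is typically loaded, the sketch below uses the Hugging Face transformers library. It assumes the repository can be loaded with AutoModelForCausalLM, that a transformers release with Qwen3 support and PyTorch are installed, and that network access is available. The function name generate and its defaults are illustrative, not something the model card specifies.

```python
MODEL_ID = "nandansarkar/qwen3_0-6B_adversarial_2"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Download the checkpoint (network access required) and complete `prompt`."""
    # Heavy dependencies are imported lazily so this file can be imported
    # without them being installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the precision listed in the page metadata.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling generate("Write a short question about geography.") would download the weights on first use and return the model's continuation. The prompt here is only a placeholder.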

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 1e-05
  • Batch Size: A train_batch_size of 2 with gradient_accumulation_steps of 8, for a total_train_batch_size of 32 (2 × 8 = 16 per device, so the total of 32 implies two devices, though the device count is not listed).
  • Optimizer: AdamW with betas=(0.9, 0.95) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a 0.05 warmup ratio.
  • Epochs: Trained for 1 epoch.
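
The arithmetic behind these settings can be sketched in a few lines of plain Python. The listed total batch size of 32 is consistent with 2 × 8 × 2 if two devices were used, which is an inference and not something the card states. The schedule below is a generic linear-warmup-plus-cosine-decay curve in the usual form, and total_steps=1000 is a hypothetical value chosen only for illustration, since the card does not report the number of training steps.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def lr_at_step(step: int, total_steps: int, peak_lr: float = 1e-5, warmup_ratio: float = 0.05) -> float:
    """Learning rate under linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# num_devices=2 is an assumption inferred from the reported total of 32.
print(effective_batch_size(2, 8, num_devices=2))

# total_steps=1000 is hypothetical; print the rate at a few points in training.
for step in (0, 25, 50, 500, 1000):
    print(step, lr_at_step(step, total_steps=1000))
```

With a single device the same settings would give 16 samples per optimizer step, which is why the device count matters when reproducing the run.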

Potential Use Cases

Given its fine-tuning on an adversarial dataset, this model is likely intended for:

  • Adversarial Robustness Testing: Evaluating the resilience of other language models against adversarial attacks.
  • Adversarial Example Generation: Creating challenging inputs to probe model weaknesses or biases.
  • Security Research: Investigating vulnerabilities in AI systems.
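
To make the robustness-testing use case concrete, here is a minimal sketch of one way to score a target system against adversarial inputs. Everything in it is hypothetical: flip_rate, toy_target, and the example prompt pairs are illustrative stand-ins, and the card does not describe how the model is meant to be used in practice. In a real setup the adversarial prompts might come from this model's generations and the target would be another model's prediction function.

```python
from typing import Callable, Iterable, Tuple


def flip_rate(target: Callable[[str], str], pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (clean, adversarial) prompt pairs on which the target's output changes."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    flips = sum(1 for clean, adversarial in pairs if target(clean) != target(adversarial))
    return flips / len(pairs)


def toy_target(prompt: str) -> str:
    """Stand-in classifier: a keyword rule used only to exercise the harness."""
    return "negative" if "bad" in prompt.lower() else "positive"


# Hand-written pairs; the second element of each is a perturbed version of the first.
example_pairs = [
    ("The film was bad.", "The film was b a d."),
    ("The film was great.", "The film was gr8."),
]

print(flip_rate(toy_target, example_pairs))
```

A higher flip rate means the target changes its answer more often under perturbation. The metric is deliberately simple and says nothing about whether the changed answer is wrong.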