Model Overview
nandansarkar/qwen3_0-6B_adversarial_6 is a language model fine-tuned from a prior adversarial variant of Qwen3-0.6B. This 0.8-billion-parameter model was trained specifically on the adversarial_dataset_6.
Key Characteristics
- Base Model: Fine-tuned from /home/nsarkar/orcd/pool/GPTeacher/model_checkpoints/qwen3_0-6B_adversarial_5.
- Training Data: Fine-tuned on adversarial_dataset_6.
- Parameter Count: 0.8 billion parameters.
- Context Length: Supports a context window of 40960 tokens, enabling processing of long inputs.
Training Details
The model was trained with a learning rate of 1e-05, a per-device train_batch_size of 2, and gradient_accumulation_steps of 8; across the 2-GPU setup this yields a total_train_batch_size of 32. Training used the AdamW optimizer with a cosine learning rate schedule for 1 epoch.
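The reported hyperparameters can be summarized as follows. This is a minimal sketch, not the actual training script (which is not published); the key names follow the Hugging Face Trainer convention and are an assumption, but it makes explicit how the total batch size of 32 is derived from the per-device batch size, gradient accumulation, and device count.

```python
# Hypothetical summary of the reported hyperparameters.
# Key names mirror Hugging Face TrainingArguments; the real script may differ.
training_config = {
    "learning_rate": 1e-05,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "num_devices": 2,                 # multi-GPU setup with 2 devices
    "optim": "adamw",
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 1,
}

# Effective (total) train batch size =
#   per-device batch size * gradient accumulation steps * number of devices
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
    * training_config["num_devices"]
)
print(effective_batch_size)  # 32
```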
Potential Use Cases
Given its training on an adversarial dataset, this model is likely intended for research and applications related to:
- Adversarial Robustness: Evaluating or improving the resilience of language models against malicious inputs.
- Adversarial Content Generation: Creating text designed to test or challenge other AI systems.
- Security Research: Exploring vulnerabilities and defenses in natural language processing systems.