Weyaxi/Neural-una-cybertron-7b
Neural-una-cybertron-7b: DPO Fine-tuned Language Model
Neural-una-cybertron-7b is a 7 billion parameter language model developed by Weyaxi. It is built upon the fblgit/una-cybertron-7b-v2-bf16 base model and has undergone further fine-tuning using Direct Preference Optimization (DPO). The DPO process leveraged the Intel/orca_dpo_pairs dataset, enhancing its ability to follow instructions and generate coherent, preferred responses.
Key Characteristics
- Base Model: Fine-tuned from fblgit/una-cybertron-7b-v2-bf16.
- Fine-tuning Method: Direct Preference Optimization (DPO).
- Training Dataset: Utilized the Intel/orca_dpo_pairs dataset for DPO.
- Architecture: Causal Language Model with 7 billion parameters.
- Context Length: Supports a context window of 4096 tokens.
- Training Environment: Fine-tuned on an Nvidia A100-SXM4-40GB GPU.
- Prompt Format: Employs the ChatML prompt template for structured conversations.
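Because the model expects the ChatML template, prompts should wrap each turn in the standard `<|im_start|>`/`<|im_end|>` delimiters. The helper below is a minimal sketch of that format; the function name and the example system/user messages are illustrative, not part of the model card.

```python
def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
print(prompt)
```

In practice the same formatting can also be obtained from a tokenizer's chat template, if one is bundled with the model.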
Training Details
The model's fine-tuning used LoRA (Low-Rank Adaptation) with r=16, lora_alpha=16, and lora_dropout=0.05. Training arguments included a per_device_train_batch_size of 4, gradient_accumulation_steps of 4 (an effective batch size of 16), and a learning_rate of 5e-5 over a maximum of 200 steps. The DPO trainer used a beta of 0.1 and a max_prompt_length of 1024 tokens.
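To make the beta hyperparameter concrete, the sketch below computes the per-pair DPO loss from scalar log-probabilities: the loss is the negative log-sigmoid of beta times the margin by which the policy prefers the chosen response over the rejected one, relative to the reference model. The log-probability values are made-up numbers for illustration, not from the actual training run.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Positive margin (policy favors the chosen response more than the reference
# does) pulls the loss below log(2); a negative margin pushes it above.
loss_good = dpo_loss(-10.0, -14.0, -11.0, -12.0)  # margin = +3
loss_bad = dpo_loss(-14.0, -10.0, -12.0, -11.0)   # margin = -3
print(loss_good, loss_bad)
```

The small beta of 0.1 keeps gradients gentle, constraining the fine-tuned policy to stay close to the reference model while still rewarding preferred responses.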
Use Cases
This model is well-suited for applications requiring instruction-following, general-purpose text generation, and conversational AI, benefiting from its DPO-enhanced alignment.