Weyaxi/Neural-una-cybertron-7b

Text Generation | Model Size: 7B | Quantization: FP8 | Context Length: 4k | Published: Dec 9, 2023 | License: apache-2.0 | Architecture: Transformer

Neural-una-cybertron-7b is a 7-billion-parameter causal language model developed by Weyaxi, fine-tuned from fblgit/una-cybertron-7b-v2-bf16 using Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset. The DPO alignment makes it well suited to instruction-following and conversational tasks. The model has a context length of 4096 tokens and targets general-purpose language generation.


Neural-una-cybertron-7b: DPO Fine-tuned Language Model

Neural-una-cybertron-7b is a 7-billion-parameter language model developed by Weyaxi. It is built on the fblgit/una-cybertron-7b-v2-bf16 base model and further fine-tuned with Direct Preference Optimization (DPO). The DPO stage used the Intel/orca_dpo_pairs dataset, improving the model's ability to follow instructions and produce coherent, preferred responses.

Key Characteristics

  • Base Model: Fine-tuned from fblgit/una-cybertron-7b-v2-bf16.
  • Fine-tuning Method: Direct Preference Optimization (DPO).
  • Training Dataset: Utilized the Intel/orca_dpo_pairs dataset for DPO.
  • Architecture: Causal Language Model with 7 billion parameters.
  • Context Length: Supports a context window of 4096 tokens.
  • Training Environment: Fine-tuned on an Nvidia A100-SXM4-40GB GPU.
  • Prompt Format: Employs the ChatML prompt template for structured conversations (template shown below).
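
The ChatML format wraps each turn in <|im_start|> / <|im_end|> markers; generation continues from the final assistant header. A typical prompt looks like this (the message contents are illustrative):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is Direct Preference Optimization?<|im_end|>
<|im_start|>assistant
```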

Training Details

The model's fine-tuning involved LoRA (Low-Rank Adaptation) with specific hyperparameters (r=16, lora_alpha=16, lora_dropout=0.05). Training arguments included a per_device_train_batch_size of 4, gradient_accumulation_steps of 4, and a learning_rate of 5e-5 over 200 max_steps. The DPO Trainer used a beta of 0.1 and max_prompt_length of 1024.
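
These hyperparameters map directly onto the standard peft/trl stack. Below is a minimal sketch of such a run, assuming the ~0.7-era trl DPOTrainer API (which accepts beta and max_prompt_length directly) and an assumed ChatML prompt mapping for the dataset columns; it illustrates the reported settings rather than reproducing the author's actual script.

```python
# Sketch of the described DPO fine-tuning setup using peft and trl.
# Hyperparameters follow the model card; the dataset column mapping and
# output directory are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "fblgit/una-cybertron-7b-v2-bf16"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Intel/orca_dpo_pairs ships system/question/chosen/rejected columns;
# DPOTrainer expects prompt/chosen/rejected, so map them (assumed mapping).
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": f"<|im_start|>system\n{row['system']}<|im_end|>\n"
                  f"<|im_start|>user\n{row['question']}<|im_end|>\n"
                  f"<|im_start|>assistant\n",
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }
)

peft_config = LoraConfig(
    r=16,               # LoRA rank, per the model card
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="neural-una-cybertron-7b",  # assumed
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=200,
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # with a PEFT config, trl derives the frozen reference
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,            # DPO temperature, per the model card
    max_prompt_length=1024,
)
trainer.train()
```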

Use Cases

This model is well-suited for applications requiring instruction-following, general-purpose text generation, and conversational AI, benefiting from its DPO-enhanced alignment.
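
For example, a minimal inference sketch with transformers, using the ChatML template described above (the generation settings are illustrative defaults, not values specified by the card):

```python
# Minimal inference sketch for Neural-una-cybertron-7b with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Weyaxi/Neural-una-cybertron-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a ChatML prompt ending at the assistant header.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSummarize what DPO fine-tuning does.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```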