artificialguybr/QWEN-2.5-0.5B-Synthia-II

Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Oct 28, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

artificialguybr/QWEN-2.5-0.5B-Synthia-II is a 0.5-billion-parameter causal language model based on Qwen2.5, fine-tuned by artificialguybr on the Synthia-v1.5-II dataset. It supports a 32,768-token context length and targets conversational AI, instruction following, and coherent text generation. Its architecture uses grouped-query attention (GQA), RoPE, SwiGLU, and RMSNorm.


Overview

artificialguybr/QWEN-2.5-0.5B-Synthia-II is a fine-tuned version of the Qwen2.5-0.5B base model, developed by artificialguybr. It has 490 million parameters (360 million excluding embeddings) and supports a 32,768-token context length. The architecture uses RoPE positional embeddings, SwiGLU activations, and RMSNorm. Fine-tuning on the Synthia-v1.5-II dataset, with the hyperparameters listed under Training Details, targeted instruction following and conversational ability.
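To put the parameter counts in perspective, the short calculation below derives the approximate embedding share and the memory needed for the weights alone in BF16. It is a rough sketch that uses only the rounded figures quoted above and assumes 2 bytes per BF16 parameter; activations, the KV cache, and framework overhead are not included.

```python
# Rounded figures from the model card.
TOTAL_PARAMS = 490_000_000
NON_EMBEDDING_PARAMS = 360_000_000
BYTES_PER_BF16 = 2  # BF16 stores each parameter in 16 bits

# Parameters that sit in the embedding tables.
embedding_params = TOTAL_PARAMS - NON_EMBEDDING_PARAMS

# Memory for the weights only, in GiB (1 GiB = 1024**3 bytes).
weight_gib = TOTAL_PARAMS * BYTES_PER_BF16 / 1024**3

print(f"embedding parameters: ~{embedding_params / 1e6:.0f}M")
print(f"BF16 weights: ~{weight_gib:.2f} GiB")
```

By this estimate, roughly a quarter of the parameters are embeddings, and the weights take a little under 1 GiB.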

Key Capabilities

  • Conversational AI applications: Designed for natural and coherent dialogue.
  • Instruction following: Fine-tuned to understand and carry out given instructions.
  • Text generation: Produces coherent and contextually relevant text.
  • Multi-turn dialogue systems: Capable of maintaining context across multiple exchanges.
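A minimal usage sketch follows, assuming the weights are published on the Hugging Face Hub under the model's name and load through the standard transformers classes. The helper names build_prompt and generate are hypothetical, the prompt template is the "[INST] {instruction} [/INST]" format described under Training Details, and stripping whitespace from the instruction is a choice made here, not something the card specifies.

```python
MODEL_ID = "artificialguybr/QWEN-2.5-0.5B-Synthia-II"  # assumed Hub repository id


def build_prompt(instruction: str) -> str:
    """Wrap one instruction in the "[INST] ... [/INST]" template."""
    return f"[INST] {instruction.strip()} [/INST]"


def generate(instruction: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: load the model and complete a single prompt."""
    # Imported lazily so build_prompt works without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


print(build_prompt("Summarize RoPE in one sentence."))
# [INST] Summarize RoPE in one sentence. [/INST]
```

Calling generate("...") downloads the weights on first use. How earlier turns should be concatenated for multi-turn dialogue is not documented here, so this sketch covers a single turn only.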

Training Details

The model was trained for 3 epochs with the AdamW optimizer, a learning rate of 1e-05, and a batch size of 40. Training used BF16 mixed precision and a sequence length of 4096 with sample packing enabled. The data was split 95% for training and 5% for validation, and prompts used the "[INST] {instruction} [/INST]" format.
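The figures above imply a simple token budget per optimizer step and a 95/5 data split, illustrated below. This is only a sketch: the card does not say whether the batch size of 40 is global or per device, nor how the split was drawn, so the seeded shuffle and the helper name split_train_val are assumptions.

```python
import random

SEQ_LEN = 4096       # sequence length from the training details
BATCH_SIZE = 40      # assumed here to be the number of sequences per step
VAL_FRACTION = 0.05  # 5% validation, 95% training


def split_train_val(examples, val_fraction=VAL_FRACTION, seed=0):
    """Shuffle with a fixed seed and return (train, val) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# With sample packing, each sequence can be filled up to SEQ_LEN tokens,
# so this is an upper bound on tokens processed per step.
tokens_per_step = SEQ_LEN * BATCH_SIZE

train, val = split_train_val(range(1000))
print(tokens_per_step, len(train), len(val))
# 163840 950 50
```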

Limitations

As a 0.5B-parameter model, it may not match larger models on complex reasoning tasks. Performance in non-English languages may also be limited, and users should be aware of potential biases inherited from the training data.