artificialguybr/Qwen2.5-0.5B-OpenHermes2.5

Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Oct 15, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

artificialguybr/Qwen2.5-0.5B-OpenHermes2.5 is a 0.49-billion-parameter causal language model, fine-tuned by artificialguybr on the OpenHermes 2.5 dataset. Built on the Qwen2.5 architecture, it offers enhanced instruction following, long text generation of up to 8K output tokens, and better structured output generation, making it suitable for natural language processing research and application tasks. The model supports a 32,768-token context length and more than 29 languages.


Model Overview

artificialguybr/Qwen2.5-0.5B-OpenHermes2.5 is a fine-tuned version of the Qwen2.5-0.5B base model, developed by artificialguybr and trained on the high-quality OpenHermes 2.5 dataset. It inherits the Qwen2.5 architecture, which introduces significant advancements over previous Qwen iterations.

Key Capabilities & Features

  • Enhanced Instruction Following: Improved ability to understand and execute instructions.
  • Long Text Generation: Generates long texts of up to 8K output tokens within a 32,768-token context length.
  • Structured Output: Better at generating structured data, particularly JSON.
  • Multilingual Support: Supports over 29 languages.
  • Robustness: Increased resilience to diverse system prompts.
  • Base Architecture: Utilizes a Transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
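
The two limits in the list above (a 32,768-token context length and up to 8K tokens of output) interact: a long prompt leaves less room for generation. The sketch below shows one way to compute a safe generation budget. It assumes that "8K" means 8,192 tokens and that prompt and output share the same context window; the function name `generation_budget` is hypothetical and not part of any library.

```python
CONTEXT_LENGTH = 32_768    # context length stated in this model card
MAX_OUTPUT_TOKENS = 8_192  # assumption: "8K" is read as 8,192 tokens


def generation_budget(prompt_tokens: int, requested_new_tokens: int) -> int:
    """Return how many new tokens can safely be requested for a given prompt.

    The result is capped by the requested amount, by the output limit, and
    by the space left in the context window after the prompt.
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")

    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        # The prompt already fills (or overflows) the context window.
        return 0
    return min(requested_new_tokens, MAX_OUTPUT_TOKENS, remaining)
```

For example, a 30,000-token prompt leaves room for only 2,768 new tokens, even if 8,192 were requested.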

Training Details

The model was fine-tuned with the Axolotl framework on the OpenHermes 2.5 dataset, which comprises 1 million instruction and chat samples, most of them synthetically generated, and is known for contributing to state-of-the-art LLM development. Training used a learning rate of 1e-05, a batch size of 5, and 3 epochs, with gradient checkpointing and BF16 mixed precision enabled.
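
To get a feel for the scale of this run, the hyperparameters above can be collected in a small structure and used to estimate the number of optimizer steps. This is a rough sketch, not the author's Axolotl configuration: the class and function names are hypothetical, and it assumes one sample per sequence (no sample packing), no gradient accumulation, and a single device unless stated otherwise.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as stated in this model card."""
    learning_rate: float = 1e-5
    batch_size: int = 5
    num_epochs: int = 3
    bf16: bool = True
    gradient_checkpointing: bool = True


def estimated_optimizer_steps(
    num_samples: int,
    config: FineTuneConfig,
    gradient_accumulation_steps: int = 1,  # assumption: not given in the card
    num_devices: int = 1,                  # assumption: not given in the card
) -> int:
    """Estimate total optimizer steps over all epochs."""
    effective_batch = config.batch_size * gradient_accumulation_steps * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * config.num_epochs


# 1,000,000 samples / batch size 5 = 200,000 steps per epoch, times 3 epochs.
print(estimated_optimizer_steps(1_000_000, FineTuneConfig()))  # 600000
```

Under these assumptions the run takes about 600,000 optimizer steps; sample packing or gradient accumulation would reduce that figure proportionally.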

Intended Uses

This model is designed for research and application in natural language processing tasks, including text generation and language understanding. It can serve as a foundation for conversational AI after further fine-tuning (e.g., SFT or RLHF). Users should be aware of potential biases inherited from the training data. Direct conversational use without additional fine-tuning is not recommended.
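
For research use, the model can be loaded for plain text generation roughly as follows. This is a minimal sketch that assumes the weights are published on the Hugging Face Hub under the identifier shown in this card and that the `transformers` and `torch` packages are installed; the helper name `generate_text` is hypothetical. It uses plain text completion rather than a chat template, in line with the note above about conversational use.

```python
MODEL_ID = "artificialguybr/Qwen2.5-0.5B-OpenHermes2.5"


def generate_text(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation of `prompt` with the fine-tuned model."""
    # Heavy imports are deferred so that importing this module stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the precision listed in this card.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the continuation is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_text("Summarize the benefits of unit testing.")` downloads the weights on first use and returns the generated continuation as a string. The model and tokenizer are reloaded on every call here for simplicity; a real application would load them once and reuse them.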