RESMPDEV/Gemma-Wukong-2b

Text Generation | Concurrency Cost: 1 | Model Size: 2.5B | Quant: BF16 | Ctx Length: 8k | Published: Feb 23, 2024 | License: gemma-terms-of-use | Architecture: Transformer | 0.0K Warm

Gemma-Wukong-2b is a 2.5 billion parameter, 8192-token context length, instruction-tuned decoder-only language model developed by RESMPDEV, based on Google's Gemma 2B architecture. This model is a dealigned chat finetune, trained on the OpenHermes-2.5 dataset and selections from Cognitive Computations, making it suitable for general text generation and conversational AI tasks. It offers a lightweight solution for deployment in resource-limited environments.


Gemma-Wukong-2b: A Dealigned Gemma Finetune

Gemma-Wukong-2b is a 2.5 billion parameter, instruction-tuned language model derived from Google's Gemma 2B. Developed by RESMPDEV, this model is specifically finetuned for chat applications, utilizing the teknium OpenHermes-2.5 dataset and additional data from Cognitive Computations. It features an 8192-token context length, making it capable of handling moderately long conversations and text inputs.

Key Capabilities

  • Dealigned Chat Finetune: Optimized for conversational interactions, providing responses that may deviate from strict alignment guidelines.
  • Efficient Deployment: Its 2.5B parameter size allows for deployment in environments with limited resources, such as laptops, desktops, or private cloud infrastructure.
  • Text Generation: Capable of various text generation tasks, including question answering, summarization, and general reasoning, all inherited from the base Gemma 2B model.
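
The deployment claim above can be checked with rough arithmetic. The following is a minimal sketch, assuming that weight storage dominates and that each BF16 parameter takes 2 bytes; the helper name `weight_memory_gib` and the byte table are illustrative, not part of any library. It ignores activations, the KV cache, and framework overhead, so real memory use will be higher.

```python
# Approximate bytes needed to store one parameter in common dtypes.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Estimate the memory needed for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# 2.5 billion parameters in BF16, as listed for this model.
print(f"BF16 weights: {weight_memory_gib(2.5e9):.2f} GiB")
print(f"FP32 weights: {weight_memory_gib(2.5e9, 'fp32'):.2f} GiB")
```

At roughly 4.7 GiB for the BF16 weights, the model plausibly fits on a laptop or desktop with a modest amount of free memory, which is consistent with the deployment targets listed above.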

Performance Highlights

Benchmark results for Gemma-Wukong-2b itself are reported on the Open LLM Leaderboard. For reference, the base Gemma 2B model scores 42.3 on MMLU, 71.4 on HellaSwag, and 22.0 on HumanEval. The finetuning aims to improve performance in chat-oriented scenarios.

Intended Usage

This model is well-suited for content creation, powering chatbots and conversational AI, and text summarization. Its small size keeps it accessible to developers who want to integrate text generation into their own applications.
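
As an illustration of these uses, here is a minimal sketch of running the model with the Hugging Face `transformers` library. It assumes the weights are published under the repository name shown in this page's title, that `torch` and `transformers` are installed, and that a plain text prompt is acceptable; the model's preferred chat prompt format is not stated here, so none is applied. The helper names `clamp_new_tokens` and `generate` are hypothetical.

```python
MODEL_ID = "RESMPDEV/Gemma-Wukong-2b"  # assumed repository name, taken from the page title
MAX_CONTEXT_TOKENS = 8192              # context length stated on this page


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     context: int = MAX_CONTEXT_TOKENS) -> int:
    """Limit generation so that prompt plus output fits in the context window."""
    if prompt_tokens >= context:
        raise ValueError("prompt already fills the context window")
    return min(requested, context - prompt_tokens)


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model in BF16 and return the generated continuation of `prompt`."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    budget = clamp_new_tokens(prompt_len, max_new_tokens)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=budget)

    # Drop the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

Calling `generate("Summarize the following paragraph: ...")` downloads the weights on first use, so it needs network access and several gigabytes of disk space and memory.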