arcee-ai/WitchLM-1.5B

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Sep 2, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

WitchLM-1.5B by arcee-ai is a 1.5-billion-parameter language model with a 131,072-token context length. It was built with Axolotl, a fine-tuning framework, and targets general language understanding; the card reports scores on instruction-following, reasoning, and knowledge benchmarks. It suits applications that need a compact model with a very large context window.


WitchLM-1.5B: A Compact Model with Extensive Context

WitchLM-1.5B, developed by arcee-ai, is a 1.5-billion-parameter language model notable for its large context window of 131,072 tokens. It was built with the Axolotl framework and is intended for general language understanding tasks.

Key Capabilities

  • Long Context Handling: Accepts inputs of up to 131,072 tokens, so long documents or extended conversations can be processed in a single pass.
  • General Language Understanding: Reported benchmark scores include instruction following (IFEval inst_level_strict_acc: 0.3357), reasoning (BBH acc_norm: 0.3591), and knowledge (MMLU acc: 0.2441).
  • Training Configuration: Trained with a learning rate of 5e-05, a total batch size of 64, and 5 epochs, using a cosine learning-rate scheduler.
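The card lists the optimizer settings but not the dataset size or any warmup, so the sketch below only illustrates how a cosine schedule with these values behaves; it is not the actual training code. It assumes no warmup by default, decay to zero, and a hypothetical 10,000-example dataset, none of which the card states. "Total batch size" is taken as the effective batch per optimizer step (per-device batch × devices × gradient accumulation), which is how Hugging Face training cards usually report it.

```python
import math

# Hyperparameters reported on the card.
BASE_LR = 5e-5
TOTAL_BATCH_SIZE = 64
NUM_EPOCHS = 5


def total_optimizer_steps(num_examples: int,
                          batch_size: int = TOTAL_BATCH_SIZE,
                          epochs: int = NUM_EPOCHS) -> int:
    """Optimizer steps for a run, assuming the last partial batch is kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def cosine_lr(step: int, total_steps: int,
              base_lr: float = BASE_LR, warmup_steps: int = 0) -> float:
    """Optional linear warmup, then cosine decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at 0 after the final step
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical dataset size; the card does not state it.
    steps = total_optimizer_steps(num_examples=10_000)
    for s in (0, steps // 2, steps):
        print(s, f"{cosine_lr(s, steps):.2e}")
```

The learning rate starts at 5e-05, passes roughly half that value midway through training, and reaches zero at the last step; only the number of steps changes with dataset size.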

Good For

  • Applications that need to process long documents or conversations within a single context.
  • Use cases where a small parameter count is preferred for efficiency but a long context window is still required.
  • General text generation and understanding tasks that depend on information spread across a long input.
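Even with a large window, documents longer than the limit still have to be split. A minimal sketch of overlapping sliding-window chunking over token IDs follows, assuming the IDs come from the model's own tokenizer (for example via Hugging Face `transformers`' `AutoTokenizer`) and that some tokens are reserved for the generated answer. The `chunk_tokens` helper and its `reserve_for_output` and `overlap` defaults are hypothetical. Because the listing header shows a 32k context while the card text states 131,072 tokens, pass whichever limit your deployment actually enforces.

```python
from typing import List, Sequence

# Context length stated in the card text; the listing header shows 32k.
CONTEXT_LENGTH = 131_072


def chunk_tokens(
    token_ids: Sequence[int],
    context_length: int = CONTEXT_LENGTH,
    reserve_for_output: int = 1024,  # hypothetical generation budget
    overlap: int = 256,  # hypothetical overlap between consecutive chunks
) -> List[List[int]]:
    """Split token IDs into overlapping windows that leave room for output tokens."""
    window = context_length - reserve_for_output
    if window <= 0:
        raise ValueError("reserve_for_output must be smaller than context_length")
    if not 0 <= overlap < window:
        raise ValueError("overlap must be at least 0 and smaller than the window")
    stride = window - overlap
    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this chunk already reaches the end of the input
        start += stride
    return chunks


if __name__ == "__main__":
    ids = list(range(10))
    # prints [0, 1, 2, 3], then [3, 4, 5, 6], then [6, 7, 8, 9]
    for chunk in chunk_tokens(ids, context_length=6, reserve_for_output=2, overlap=1):
        print(chunk)
```

The overlap repeats a few tokens at each boundary so text cut at the end of one chunk reappears at the start of the next; a larger overlap preserves more context across boundaries at the cost of more passes.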