WitchLM-1.5B: A Compact Model with Extensive Context
WitchLM-1.5B, developed by arcee-ai, is a 1.5-billion-parameter language model notable for its exceptionally large context window of 131,072 tokens. Built with the Axolotl framework, the model targets general language-understanding tasks.
Key Capabilities
- Long-Context Handling: Accepts inputs of up to 131,072 tokens, so entire long documents or extended conversations can fit in a single prompt.
- General Language Understanding: Reported benchmark scores include instruction following (inst_level_strict_acc: 0.3357), BBH reasoning (acc_norm: 0.3591), and MMLU (acc: 0.2441), modest results in line with the model's small parameter count.
- Training Setup: Trained for 5 epochs with a learning rate of 5e-05, a total batch size of 64, and a cosine learning-rate scheduler.
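The cosine scheduler mentioned above decays the learning rate smoothly from its peak to zero over training. A minimal sketch of that decay curve, assuming no warmup phase and an illustrative step count (only the 5e-05 peak comes from the model card):

```python
import math

# Peak learning rate from the model card; step counts below are
# illustrative assumptions, not values from the actual training run.
PEAK_LR = 5e-05

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps."""
    progress = step / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# LR at the start, midpoint, and end of a hypothetical 1,000-step run.
print(cosine_lr(0, 1000))     # peak: 5e-05
print(cosine_lr(500, 1000))   # midpoint: ~2.5e-05
print(cosine_lr(1000, 1000))  # end: 0.0
```

In practice, frameworks like Axolotl typically add a short linear warmup before the cosine decay; the curve here covers only the decay portion.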
Good For
- Applications requiring a model with a very large input capacity to process extensive documents or conversations.
- Use cases where a small parameter count is preferred for efficiency but a long context window is still required.
- General text generation and understanding tasks where maintaining coherence over long inputs is crucial.
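When deciding whether a document fits the 131,072-token window, a quick pre-check can use the common chars-per-token heuristic for English text. The ratio below is an assumption, not the model's actual tokenizer; use the real tokenizer for exact counts:

```python
# Rough check of whether a document fits WitchLM-1.5B's context window.
# The chars/4 ratio is a rule-of-thumb for English text, NOT the model's
# actual tokenizer, so treat the result as an estimate only.
CONTEXT_WINDOW = 131_072
CHARS_PER_TOKEN = 4  # heuristic assumption

def approx_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, reserve_for_output: int = 1024) -> bool:
    """True if the text plus a generation budget fits the window."""
    return approx_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

doc = "lorem ipsum " * 40_000  # ~480,000 characters, ~120,000 tokens
print(approx_tokens(doc), fits_in_context(doc))
```

A document that fails this check would need to be truncated, chunked, or summarized before being sent to the model.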