CausalLM/34b-beta
TEXT GENERATIONConcurrency Cost:2Model Size:34BQuant:FP8Ctx Length:32kPublished:Feb 6, 2024License:gpl-3.0Architecture:Transformer0.1K Open Weights Cold
CausalLM/34b-beta is a 34 billion parameter causal language model developed by CausalLM, featuring a 32768 token context length. This model is instruction-tuned using the ChatML format and achieves an MT-Bench score of 8.5. It is noted for its low MMLU contamination score of 0.38, indicating a focus on original data, and is recommended for general language generation tasks.
Loading preview...
CausalLM/34b-beta: An Instruction-Tuned 34B Parameter Model
CausalLM/34b-beta is a 34 billion parameter causal language model designed for general language generation and understanding tasks. It utilizes the ChatML prompt format for instruction-following, making it suitable for conversational AI and task-oriented applications.
Key Capabilities & Features
- Strong Instruction Following: Achieves an MT-Bench score of 8.5, indicating robust performance in multi-turn conversations and complex instructions.
- Low Data Contamination: Demonstrates a low MMLU contamination score of 0.38, suggesting a training dataset with minimal overlap with common benchmarks, which can lead to more generalized knowledge.
- Optimized for Transformers Inference: While the model has known precision issues that require specific inference methods, it performs best with the Transformers library or llama.cpp with q8_0 quantization for optimal output quality.
- 32K Context Window: Supports a substantial context length of 32,768 tokens, enabling the processing of longer inputs and generating more coherent, extended responses.
Important Considerations
- Inference Frameworks: Users are advised to avoid "accelerated inference frameworks" like VLLM temporarily due to precision issues that can degrade output quality. Transformers or llama.cpp (q8_0) are recommended.
- Repetition Penalty: The model is designed to perform optimally without a repetition penalty.
- Training Data: While contamination is minimized, the README notes that some level of training data contamination is unavoidable due to cost constraints.
Good for
- Applications requiring strong instruction-following capabilities.
- General text generation, summarization, and question-answering.
- Developers seeking a 34B model with a focus on original data and robust conversational performance.