jsyeom/llama-2-13b-hf-smooth
Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · Published: Mar 24, 2026 · Architecture: Transformer · Cold

The jsyeom/llama-2-13b-hf-smooth model is a 13-billion-parameter Llama 2-based causal language model that has undergone SmoothQuant smoothing without any quantization. Developed by jsyeom, the model retains the full precision of the original Llama-2-13b-hf weights while applying SmoothQuant's per-channel scaling, which migrates activation outliers into the weights so that both become easier to quantize later. This preparation step makes the model particularly suitable for research and development in efficient inference. It keeps the original 4096-token context length, offering a smoothed foundation for various natural language processing tasks.
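The smoothing described above can be sketched in a few lines. This is a minimal, hypothetical NumPy illustration of the SmoothQuant idea, not the code used to produce this model: for a linear layer `Y = X @ W`, each input channel is rescaled by `s_j = max|X_j|^α / max|W_j|^(1-α)` so that activations shrink, weights grow, and the layer output is mathematically unchanged. The function and variable names are assumptions for illustration.

```python
import numpy as np

def smoothquant_smooth(X, W, alpha=0.5, eps=1e-8):
    """SmoothQuant-style smoothing sketch (illustrative, not the model's actual code).

    X: calibration activations, shape (tokens, in_features)
    W: layer weights,           shape (in_features, out_features)
    Returns (X_s, W_s) with X_s @ W_s == X @ W, but with activation
    outliers migrated into the weights via per-channel scales.
    """
    act_max = np.abs(X).max(axis=0)              # per-input-channel activation range
    w_max = np.abs(W).max(axis=1)                # per-input-channel weight range
    s = (act_max ** alpha) / np.maximum(w_max ** (1.0 - alpha), eps)
    s = np.maximum(s, eps)                       # guard against zero channels
    return X / s, W * s[:, None]                 # equivalent layer, smoother activations

# Toy check: one activation channel has large outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8)) * np.array([1, 1, 1, 50, 1, 1, 1, 1])
W = rng.normal(size=(8, 4))
X_s, W_s = smoothquant_smooth(X, W)
assert np.allclose(X @ W, X_s @ W_s)             # output preserved
assert np.abs(X_s).max() < np.abs(X).max()       # outlier magnitude reduced
```

Because the rescaling is exact, a "smoothed but unquantized" checkpoint like this one behaves identically to the original model; the benefit only materializes once a quantizer is applied afterward.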