zai-org/LongAlign-13B-64k-base
Text generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 64k · Published: Jan 29, 2024 · License: apache-2.0 · Architecture: Transformer

LongAlign-13B-64k-base, developed by THUDM, is a 13-billion-parameter Llama-2-13B base model extended to support a 64k-token context window. It is designed for long-context understanding and processing, leveraging the LongAlign-10k dataset together with specialized training strategies such as packing with loss weighting and sorted batching. The model targets tasks that require comprehension and generation over very long inputs, making it suitable for applications demanding extensive contextual awareness.
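The two training strategies named above can be illustrated with a short sketch. This is not the authors' code; the function names and the exact weighting scheme are simplifying assumptions. Sorted batching groups sequences of similar length into the same batch to reduce padding waste, while packing with loss weighting concatenates several sequences into one long example and scales each token's loss so that every packed sequence contributes equally:

```python
def sorted_batches(examples, batch_size):
    # Sorted batching (sketch): order sequences by length so each batch
    # holds similarly sized sequences, minimizing padding within a batch.
    ordered = sorted(examples, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def packed_loss_weights(seq_lengths):
    # Loss weighting for packing (simplified): when sequences of lengths
    # seq_lengths are concatenated into one packed example, scale every
    # token's loss by 1 / (its sequence's length) so short sequences are
    # not drowned out by long ones. The paper's exact normalization may differ.
    weights = []
    for n in seq_lengths:
        weights.extend([1.0 / n] * n)
    return weights


# Toy demonstration with token-id lists of varying length.
seqs = [[1] * 5, [1] * 300, [1] * 12, [1] * 7]
batches = sorted_batches(seqs, batch_size=2)
# The two shortest sequences (lengths 5 and 7) land in the same batch.

w = packed_loss_weights([2, 4])
# Tokens from the 2-token sequence get weight 0.5; the 4-token ones get 0.25.
```

In practice, sorted batching trades some randomness in batch composition for throughput, which is why it is typically combined with shuffling at a coarser granularity.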