arcee-ai/GLM-4-32B-Base-32K
Text generation · Concurrency cost: 2 · Model size: 32B · Quant: FP8 · Context length: 32k · Published: Jun 23, 2025 · License: MIT · Architecture: Transformer · Open weights

GLM-4-32B-Base-32K is a 32-billion-parameter language model developed by arcee-ai, derived from THUDM's GLM-4-32B-Base-0414. It is engineered for robust performance over an extended 32,000-token context window, significantly improving recall over the base model, whose performance degraded beyond 8,192 tokens. This is achieved through targeted long-context training, iterative merging, and short-context distillation, making the model well suited to tasks that require deep understanding and processing of long documents.


GLM-4-32B-Base-32K Overview

GLM-4-32B-Base-32K is a 32-billion-parameter language model developed by arcee-ai, building upon THUDM's GLM-4-32B-Base-0414. Its primary differentiator is enhanced long-context capability: it maintains strong performance up to a 32,000-token context window, whereas the original model degraded beyond 8,192 tokens.

Key Capabilities & Improvements

  • Extended Context Window: Reliably processes information across a 32,000-token context, a substantial improvement over the base model's effective 8,192 tokens.
  • Improved Recall: Demonstrates significantly better performance on Needle in a Haystack (NIAH) benchmarks at longer context lengths, with averages of 98.3% at 16,384 tokens and 76.5% at 32,768 tokens, compared to the base model's 66.1% and 0.4% respectively.
  • Enhanced General Benchmarks: Achieves an approximate 5% overall improvement on standard base model benchmarks, including arc_challenge (64.93%), mmlu (77.87%), and winogrande (80.03%).
  • Development Methodology: Achieved through targeted long-context continued pretraining, iterative merging of model checkpoints, and short-context distillation to retain initial capabilities.
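The NIAH scores above come from embedding a known fact ("needle") at varying depths inside long filler text and asking the model to retrieve it. A minimal sketch of how such a probe can be constructed (the function and parameter names here are illustrative, not arcee-ai's actual evaluation harness):

```python
# Illustrative needle-in-a-haystack (NIAH) probe construction.
# This is a sketch, not the harness used to produce the reported scores.

def build_niah_prompt(needle: str, question: str, filler: str,
                      target_chars: int, depth: float) -> str:
    """Embed `needle` at a relative `depth` (0.0 = start, 1.0 = end)
    inside roughly `target_chars` characters of filler text, then
    append the retrieval question."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    # Repeat the filler until there is enough haystack material.
    reps = target_chars // len(filler) + 1
    haystack = (filler * reps)[:target_chars]
    # Split the haystack at the requested depth and insert the needle.
    cut = int(len(haystack) * depth)
    document = haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]
    return f"{document}\n\nQuestion: {question}\nAnswer:"

prompt = build_niah_prompt(
    needle="The secret code for the vault is 7421.",
    question="What is the secret code for the vault?",
    filler="The quick brown fox jumps over the lazy dog. ",
    target_chars=120_000,   # roughly 30k tokens at ~4 chars/token
    depth=0.5,
)
```

Scoring then checks whether the model's completion recovers the needle; sweeping `depth` and `target_chars` produces the per-length recall averages quoted above.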

Use Cases

This model is designed as a robust base for continued training, particularly for applications that require deep understanding and processing of extensive textual data. Its strong long-context performance makes it suitable for tasks such as:

  • Summarization of long documents
  • Question answering over large text corpora
  • Context-aware content generation
  • Any application demanding reliable information retrieval and processing across extended inputs.
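For any of these applications, it helps to budget prompt length against the 32,000-token window before inference. A rough sketch using a ~4-characters-per-token heuristic (the heuristic, constants, and function names are assumptions for illustration; use the model's actual tokenizer for precise counts):

```python
# Rough prompt-budget check for a 32k-token context window.
# The 4-chars-per-token figure is a crude English-text heuristic.

CONTEXT_LIMIT = 32_000   # tokens the model reliably handles
CHARS_PER_TOKEN = 4      # rough average; real tokenizers vary

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits_context(document: str, question: str,
                 reserve_for_output: int = 1_000) -> bool:
    """True if document + question likely fit the context window,
    leaving `reserve_for_output` tokens for generation."""
    budget = CONTEXT_LIMIT - reserve_for_output
    return estimate_tokens(document) + estimate_tokens(question) <= budget

doc = "word " * 20_000  # ~100k chars, i.e. ~25k estimated tokens
ok = fits_context(doc, "Summarize the document.")
```

Documents that fail this check can be truncated or split before being sent to the model, rather than silently exceeding the window.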