xiaolesu/Lean4-sft-tk-8b

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Mar 16, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The xiaolesu/Lean4-sft-tk-8b model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the xiaolesu/lean4-sft-stmt-tk dataset, indicating specialization in Lean 4 theorem proving and formal verification tasks. The released model advertises a 32,768-token context length (fine-tuning itself used 8192-token sequences; see the training details below), and training was done with Axolotl using Liger kernel optimizations.

Model Overview

xiaolesu/Lean4-sft-tk-8b is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was developed with the Axolotl framework, with several Liger kernel optimizations enabled: liger_rope, liger_rms_norm, liger_glu_activation, liger_layer_norm, and liger_fused_linear_cross_entropy.
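
A minimal inference sketch using the standard Hugging Face transformers API is shown below. The prompt is a placeholder: the card does not document whether the fine-tune expects a chat template or raw completion prompts, and the dtype choice is an assumption.

```python
# Minimal sketch: loading and querying the model via the standard
# transformers API. The prompt format is a placeholder, not documented
# by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xiaolesu/Lean4-sft-tk-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,  # assumed; the card lists an FP8 serving quant
    device_map="auto",
)

prompt = "Formalize in Lean 4: the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```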

Key Training Details

  • Base Model: Qwen/Qwen3-8B
  • Dataset: Fine-tuned on the xiaolesu/lean4-sft-stmt-tk dataset, suggesting specialization in Lean 4-related tasks.
  • Sequence Length: Fine-tuned with a sequence length of 8192 tokens and flex_attention enabled; the released model advertises a 32k context length.
  • Hyperparameters: Trained with a learning rate of 1e-05, the fused AdamW optimizer (adamw_torch_fused), and a cosine learning rate scheduler with 53 warmup steps; a configuration sketch follows this list.
  • Frameworks: Transformers 5.3.0, PyTorch 2.9.1+cu128, Datasets 4.5.0, and Tokenizers 0.22.2.
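
Since Axolotl is configured through a YAML file, the details above can be restated as a hedged configuration sketch. Only keys whose values are stated in this card are filled in; the dataset format, batch size, epoch count, and all other settings are unknown and omitted.

```yaml
# Partial reconstruction of the Axolotl config from the values above.
# This is a sketch, not the published config.
base_model: Qwen/Qwen3-8B

datasets:
  - path: xiaolesu/lean4-sft-stmt-tk
    # dataset format/type is not stated in the card

sequence_len: 8192
flex_attention: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

learning_rate: 1.0e-5
optimizer: adamw_torch_fused
lr_scheduler: cosine
warmup_steps: 53
```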

Potential Use Cases

Given its specific training dataset, this model is likely optimized for:

  • Assisting with Lean 4 theorem proving.
  • Generating or understanding Lean 4 code and formal statements (an illustrative example follows this list).
  • Applications requiring specialized knowledge in formal verification or mathematical logic within the Lean 4 ecosystem.
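
To make the target domain concrete, here is an illustrative example of the kind of Lean 4 statement and proof such a model might be asked to produce. It is not drawn from the training set and assumes a Mathlib environment.

```lean
import Mathlib

-- Illustrative only: a formal statement of "the sum of two even
-- integers is even", with a short proof. `Even a` unfolds to
-- `∃ r, a = r + r` in Mathlib.
theorem even_add_even {a b : ℤ} (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨m, hm⟩ := ha
  obtain ⟨n, hn⟩ := hb
  exact ⟨m + n, by rw [hm, hn]; ring⟩
```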