Burumbum/lfm2.5-350m-dolly-q4-onnx

Text Generation | Concurrency Cost: 1 | Model Size: 0.35B | Quant: BF16 | Ctx Length: 32k | Published: May 26, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

Burumbum's lfm2.5-350m-dolly-q4-onnx is a 350-million-parameter LFM 2.5 model fine-tuned on the Databricks Dolly 15K instruction dataset. The fine-tune targets instruction-following tasks and is reported to improve on its base model there. It was trained with QLoRA and suits applications that need a compact, instruction-tuned language model with a 32,768-token context length.

Model Overview

Burumbum/lfm2.5-350m-dolly-q4-onnx is a 350-million-parameter language model based on Liquid AI's LFM 2.5 architecture. It was fine-tuned for one full epoch on the Databricks Dolly 15K instruction dataset, which comprises over 15,000 instruction examples across eight categories. The fine-tuning is aimed at improving the model's instruction-following capabilities relative to its base version.
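To make the fine-tuning data concrete, here is a minimal sketch of how one Dolly 15K record might be rendered into a single training string. The record fields (instruction, context, response, category) follow the public dataset; the "### Instruction / ### Context / ### Response" template and the function name format_dolly_example are assumptions for illustration, since the card does not state which prompt format was actually used.

```python
# Hypothetical prompt template: the model card does not say which one was used.
def format_dolly_example(record: dict) -> str:
    """Render one Dolly 15K record as a single training string."""
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    response = record["response"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:  # many records have an empty context field
        parts.append(f"### Context:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)


# Illustrative record, not taken from the dataset.
example = {
    "instruction": "Name the capital of France.",
    "context": "",
    "response": "Paris.",
    "category": "open_qa",
}
formatted = format_dolly_example(example)
print(formatted)
```

Skipping the context section when it is empty keeps prompts short for records that carry no reference passage.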

Training Details

The model was trained with QLoRA (4-bit base with merged LoRA adapters) via Unsloth and the TRL SFTTrainer. The LoRA configuration used rank 16, alpha 32, and dropout 0.05, targeting LFM2-specific modules such as q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, and w3. Training ran on a Kaggle T4 GPU for approximately 59 minutes and reached a final training loss of 2.4921. It used a batch size of 8 with 4 gradient accumulation steps (effective batch size 32), a learning rate of 5e-5 with a cosine schedule, and the 8-bit AdamW optimizer in fp16 precision.
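The quantities implied by these hyperparameters can be worked out in a few lines. This is a rough sketch, not the author's training script: it assumes a round 15,000 training examples (the card only says "over 15,000"), standard LoRA scaling of alpha divided by rank, and a cosine schedule that decays to zero with no warmup, none of which the card confirms.

```python
import math

# Values quoted in the training details.
LORA_RANK = 16
LORA_ALPHA = 32
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
PEAK_LR = 5e-5

# Assumption: a round 15,000 examples; the exact count is not given in the card.
NUM_EXAMPLES = 15_000

# Standard LoRA scales the adapter update by alpha / rank (assumed here).
lora_scale = LORA_ALPHA / LORA_RANK

# Samples contributing to each optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Optimizer steps needed to see every example once.
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr down to 0, with no warmup (assumption)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


print(f"LoRA scale: {lora_scale}")
print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps per epoch: {steps_per_epoch}")
print(f"LR halfway through: {cosine_lr(steps_per_epoch // 2, steps_per_epoch):.2e}")
```

Under these assumptions, one epoch amounts to a few hundred optimizer steps, which is in line with a run of about an hour on a single T4.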

Use Cases

This model is particularly well-suited for applications requiring a small, efficient language model capable of following instructions. Its compact size and instruction-tuned nature make it a good candidate for deployment in environments with limited computational resources, where direct instruction-based interactions are desired.
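As a back-of-the-envelope check on the "limited computational resources" claim, the weight storage for a 350-million-parameter model can be estimated at different precisions. This is a sketch only: it counts raw weight bits, ignores quantization scales, any layers left unquantized, activations, and the KV cache, and the helper name weight_megabytes is hypothetical.

```python
PARAMS = 350_000_000  # 0.35B parameters, as listed on the card


def weight_megabytes(num_params: int, bits_per_weight: int) -> float:
    """Raw weight storage in decimal megabytes (1 MB = 1e6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


for label, bits in [("4-bit", 4), ("BF16", 16), ("FP32", 32)]:
    print(f"{label}: ~{weight_megabytes(PARAMS, bits):.0f} MB")
```

Actual file sizes and runtime memory will be higher than these figures once quantization metadata and inference buffers are included.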