Tralalabs/CHEETAH-350M-Merged-FP16

TEXT GENERATIONConcurrency Cost:1Model Size:0.35BQuant:BF16Ctx Length:32kPublished:May 30, 2026License:otherArchitecture:Transformer Cold

Tralalabs/CHEETAH-350M-Merged-FP16 is a 0.35 billion parameter instruction-tuned model, merged from a LoRA adapter fine-tuned on the LiquidAI/LFM2-350M base model. Optimized for lightweight instruction following, this model is designed for fast, cheap inference and small assistant experiments. It excels in scenarios requiring a compact, efficient language model with a 32768 token context length.

Loading preview...

CHEETAH-350M-Merged-FP16 Overview

CHEETAH-350M-Merged-FP16 is a compact, instruction-tuned language model with 0.35 billion parameters, built upon the LiquidAI/LFM2-350M base. It was fine-tuned using a LoRA adapter on the HuggingFaceTB/smol-smoltalk dataset, which is a subset of SmolTalk optimized for smaller models. The LoRA adapter was then merged into the base model to create a standalone Transformers model, offering a balance of performance and efficiency.

Key Capabilities

  • Lightweight Instruction Following: Designed to respond effectively to instructions despite its small size.
  • Efficient Inference: Optimized for fast and cost-effective inference, suitable for resource-constrained environments.
  • Educational & Experimental Use: Ideal for fine-tuning experiments and developing models within the CHEETAH family.
  • High Context Length: Supports a context window of 32768 tokens, allowing for processing longer inputs.

Good For

  • Small assistant experiments requiring quick responses.
  • Fast local or cloud inference where computational resources are a concern.
  • Educational purposes and fine-tuning research.
  • Development within the CHEETAH model family.

Limitations

As a small 350M-class model, CHEETAH-350M-Merged-FP16 may exhibit limitations such as hallucinating facts, struggling with complex reasoning, providing weak answers on niche topics, and misinterpreting intricate instructions. Users should verify outputs with trusted sources for factual accuracy.