tksoon/llama32_3bn_raft_v1

Text Generation | Concurrency Cost: 1 | Model Size: 3.2B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Mar 12, 2026 | Architecture: Transformer | Cold

tksoon/llama32_3bn_raft_v1 is a 3.2-billion-parameter, Llama-based, instruction-tuned language model, fine-tuned and converted to GGUF format using Unsloth. It is intended for local deployment with tools such as llama-cli and Ollama, and it is offered in several quantized versions so that users can trade output quality against memory use and speed. It targets general instruction-following tasks and is compact enough for local inference.

Overview

tksoon/llama32_3bn_raft_v1 is a 3.2-billion-parameter, Llama-based, instruction-tuned language model. It was fine-tuned and converted to GGUF format with Unsloth, which the model card credits with faster training. The model is distributed ready for local deployment and inference.

Key Capabilities

  • Instruction Following: Designed to respond effectively to user instructions, making it suitable for various conversational and task-oriented applications.
  • GGUF Format: Provided in GGUF format, ensuring compatibility with popular inference engines like llama.cpp and its derivatives.
  • Quantized Versions: Available at multiple quantization levels (Q4_K_M, Q5_K_M, Q8_0), so users can trade output quality against memory use and inference speed.
  • Ollama Support: Includes an Ollama Modelfile for streamlined deployment within the Ollama ecosystem.
  • Efficient Training: Fine-tuned with the Unsloth library, which is reported to have made training 2x faster.
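
As a rough illustration of the Ollama path mentioned above, the sketch below generates the text of a minimal Modelfile that points at a local GGUF file. The file name `llama32_3bn_raft_v1.Q4_K_M.gguf`, the parameter values, and the helper `build_modelfile` are assumptions made for illustration; the Modelfile shipped with the model may differ.

```python
def build_modelfile(gguf_path: str, num_ctx: int = 4096, temperature: float = 0.7) -> str:
    """Return the text of a minimal Ollama Modelfile for a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",                      # which weights Ollama should load
        f"PARAMETER num_ctx {num_ctx}",           # context window used at run time
        f"PARAMETER temperature {temperature}",   # sampling temperature
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file name; substitute the GGUF file you actually downloaded.
    print(build_modelfile("./llama32_3bn_raft_v1.Q4_K_M.gguf"))
```

Saving the output as `Modelfile` and running `ollama create <name> -f Modelfile` followed by `ollama run <name>` is the usual way to register and start such a model, where `<name>` is whatever local name you choose.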

Good For

  • Local Inference: Running instruction-following tasks on consumer-grade hardware, helped by the model's compact size and GGUF format.
  • Rapid Prototyping: Suitable for developers looking to quickly integrate a capable instruction-following model into their applications.
  • Resource-Constrained Environments: The availability of quantized versions makes it a strong candidate for deployment in environments with limited computational resources.
  • Experimentation with Unsloth: Showcases the efficiency benefits of using Unsloth for fine-tuning and model conversion.
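
To make the resource trade-off between quantization levels concrete, here is a back-of-the-envelope sketch that estimates weight storage for each level and picks the highest-precision one that fits a memory budget. The bits-per-weight figures are approximate assumptions (real GGUF files vary because some tensors are stored at other precisions), the helper names are hypothetical, and the estimate ignores the KV cache and runtime overhead.

```python
from typing import Optional

# Approximate bits per weight for each quantization level (assumed values).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
}

PARAM_COUNT = 3.2e9  # 3.2B parameters, as stated on the model card


def estimate_size_gb(quant: str, params: float = PARAM_COUNT) -> float:
    """Estimate weight storage in gigabytes (10**9 bytes) for a quantization level."""
    total_bits = BITS_PER_WEIGHT[quant] * params
    return total_bits / 8 / 1e9


def largest_fitting_quant(budget_gb: float) -> Optional[str]:
    """Return the highest-precision level whose estimated weights fit the budget."""
    fitting = [q for q in BITS_PER_WEIGHT if estimate_size_gb(q) <= budget_gb]
    return max(fitting, key=BITS_PER_WEIGHT.get) if fitting else None


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_size_gb(quant):.2f} GB of weights")
    print("Best fit for a 2.5 GB budget:", largest_fitting_quant(2.5))
```

In practice, leave headroom beyond the estimated weight size, since context length and the inference engine add their own memory use.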