tksoon/llama33_70bn_raft_v2

Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 8k · Published: Apr 9, 2026 · Architecture: Transformer

tksoon/llama33_70bn_raft_v2 is a 70-billion-parameter instruction-tuned language model, fine-tuned and converted to GGUF format using Unsloth. It is designed for efficient deployment and inference, with quantized versions available for a range of hardware configurations, and is optimized for general-purpose language tasks on the Llama 3.3 architecture.


tksoon/llama33_70bn_raft_v2 Overview

tksoon/llama33_70bn_raft_v2 is a 70-billion-parameter instruction-tuned language model. It was fine-tuned and converted to GGUF format with the Unsloth framework, which advertises roughly 2x faster fine-tuning.

Key Features & Capabilities

  • Architecture: Based on the Llama 3.3 model family.
  • Parameter Count: 70 billion parameters, offering substantial language understanding and generation capabilities.
  • Format: Provided in GGUF format, making it compatible with various inference engines like llama.cpp and Ollama.
  • Quantization Options: Multiple quantized versions are available, including Q5_K_M, Q8_0, and Q4_K_M, alongside BF16 files, allowing users to select the optimal balance between performance and resource usage.
  • Efficient Training: Fine-tuned with Unsloth, which reduces training time and memory usage during fine-tuning.
  • Ollama Support: Includes an Ollama Modelfile for straightforward deployment and integration into Ollama ecosystems.
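Since the repository includes an Ollama Modelfile, local deployment could look like the sketch below. The GGUF filename and parameter values here are placeholders for illustration, not the repository's actual contents; the shipped Modelfile should be used as-is.

```
# Hypothetical Modelfile sketch — the FROM path and parameters are
# placeholders; the repository's own Modelfile takes precedence.
FROM ./llama33_70bn_raft_v2.Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```

With a Modelfile in place, the model can be registered and run with `ollama create <name> -f Modelfile` followed by `ollama run <name>`.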

Intended Use Cases

This model is suitable for a broad range of general-purpose language tasks, particularly where efficient local inference is desired due to its GGUF format. Its instruction-tuned nature makes it effective for following commands and generating coherent responses. The availability of various quantizations allows for deployment on diverse hardware, from consumer-grade GPUs to more powerful setups.
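To make the quantization-versus-hardware trade-off concrete, the following sketch estimates weight-only file sizes for a 70B model under the quantizations listed above. The bits-per-weight figures are rough community approximations (not official specs), and the estimate ignores KV cache and runtime overhead.

```python
# Rough weight-only size estimate for a 70B GGUF model.
# Bits-per-weight values are approximate, not exact format specs.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "BF16": 16.0,
}

def est_size_gib(params_billion: float, quant: str) -> float:
    """Approximate model file size in GiB (weights only, no KV cache)."""
    bits = BITS_PER_WEIGHT[quant]
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 2**30

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{est_size_gib(70, quant):.0f} GiB")
```

By this estimate, Q4_K_M lands around 40 GiB while BF16 exceeds 130 GiB, which is why the lower-bit quantizations are the practical choice for consumer-grade GPUs.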