Name: nm-testing/convert_ct_dequant-e2e API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: nm-testing

Model Overview

This model, nm-testing/convert_ct_dequant-e2e, is a 1.1 billion parameter auto-regressive language model. It is a quantized version of Alibaba's Qwen3-8B, developed by NVIDIA using the TensorRT Model Optimizer. The model's weights and activations are quantized to FP4 data type, making it ready for efficient inference with TensorRT-LLM.

Key Capabilities

Quantized Performance: Utilizes FP4 quantization for optimized inference on NVIDIA GPU-accelerated systems, including Blackwell microarchitecture.
Transformer Architecture: Based on the Qwen3-8B's optimized transformer architecture.
Text-to-Text Generation: Processes text input (strings) to generate text output (strings).
Commercial Use: Licensed under Apache 2.0, suitable for both commercial and non-commercial applications.

Use Cases

This model is particularly well-suited for developers seeking pre-quantized models for deployment in:

AI Agent systems
Chatbots
RAG (Retrieval Augmented Generation) systems
Other AI-powered applications requiring efficient, GPU-accelerated inference.

Overview

Model Overview

Key Capabilities

Use Cases

Full Model Card (README)