Name: Wojtekb30/Qwen2.5-1.5B-Instruct-RVQ-Human-Motion-CoT-PoC API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Wojtekb30

Model Overview

This model, Wojtekb30/Qwen2.5-1.5B-Instruct-RVQ-Human-Motion-CoT-PoC, is a specialized variant of the Qwen/Qwen2.5-1.5B-Instruct large language model. It is a proof-of-concept designed for embodied AI, enabling an LLM to generate both natural language reasoning and corresponding 3D human motion sequences from a single prompt.

Key Differentiators

Integrated Motion Generation: Unlike standard LLMs, this model is trained to emit discrete movement tokens directly within its chat output, interleaved with reasoning text.
Explicit Motion Token Vocabulary: It uses a custom vocabulary of <move>, </move>, and <m_{level}_{value}> tokens to represent quantized motion data.
Efficient Motion Encoding: Only 3 movement tokens are needed to decode approximately 0.5 seconds of coarse motion, and 10 tokens for detailed motion, allowing for responsive control even with slower LLM inference.
Chain-of-Thought for Motion: The model produces a first-person chain of thought about the movement alongside the motion tokens.
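
To make the interleaved output format concrete, here is a minimal sketch of separating the reasoning text from the motion tokens in a chat response. It assumes that {level} and {value} are non-negative integers and that motion tokens always sit between <move> and </move>. The sample string and the function name split_output are made up for illustration and are not taken from the model card.

```python
import re

# Assumed token shapes, based on the vocabulary named above:
# <move> ... </move> wraps a run of <m_{level}_{value}> tokens.
MOVE_BLOCK = re.compile(r"<move>(.*?)</move>", re.DOTALL)
MOTION_TOKEN = re.compile(r"<m_(\d+)_(\d+)>")


def split_output(text):
    """Return (reasoning_text, blocks); each block is a list of (level, value) pairs."""
    blocks = [
        [(int(level), int(value)) for level, value in MOTION_TOKEN.findall(body)]
        for body in MOVE_BLOCK.findall(text)
    ]
    # Drop the motion blocks, then collapse the leftover whitespace.
    reasoning = " ".join(MOVE_BLOCK.sub(" ", text).split())
    return reasoning, blocks


# Hypothetical model output, invented for this example.
sample = "I lift my right arm. <move><m_0_17><m_1_203><m_2_5></move> Then I lower it."
reasoning, blocks = split_output(sample)
print(reasoning)
print(blocks)
```

Keeping each <move> block as its own list preserves the order in which motion was emitted relative to the surrounding reasoning text.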

How it Works

The model takes a natural-language action prompt and, given a specific system prompt (You are an embodied AI...), generates a response containing both descriptive text and motion tokens. An included RVQ (Residual Vector Quantization) decoder then turns these tokens into a 3D human motion sequence. Decoding has four steps: extracting the motion tokens from the output, rebuilding the RVQ token matrix, summing the quantizer embeddings, and decoding the resulting latent sequence.
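
The middle two decoding steps (rebuilding the token matrix and summing quantizer embeddings) can be sketched in plain Python. This is an illustration under assumptions, not the decoder shipped with the model: it assumes tokens arrive frame by frame with levels in ascending order, and it uses tiny made-up codebooks. The names rebuild_matrix and sum_embeddings are hypothetical. The final step, running the learned decoder network on the latent sequence, is not shown.

```python
def rebuild_matrix(tokens, num_levels):
    """Group a flat (level, value) list into frames holding one code index per level."""
    if len(tokens) % num_levels != 0:
        raise ValueError("token count is not a multiple of num_levels")
    frames = []
    for start in range(0, len(tokens), num_levels):
        chunk = tokens[start:start + num_levels]
        levels = [level for level, _ in chunk]
        # Assumed layout: every frame lists levels 0, 1, ..., num_levels - 1 in order.
        if levels != list(range(num_levels)):
            raise ValueError(f"unexpected level order at token {start}: {levels}")
        frames.append([value for _, value in chunk])
    return frames


def sum_embeddings(frames, codebooks):
    """Residual VQ reconstruction: add the selected vector from every level."""
    dim = len(codebooks[0][0])
    latents = []
    for frame in frames:
        latent = [0.0] * dim
        for level, index in enumerate(frame):
            vector = codebooks[level][index]
            latent = [a + b for a, b in zip(latent, vector)]
        latents.append(latent)
    return latents


# Toy codebooks (made-up numbers): 2 levels, 2 entries each, 2-dimensional latents.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [-0.5, 0.25]],
]
tokens = [(0, 1), (1, 0), (0, 0), (1, 1)]
frames = rebuild_matrix(tokens, num_levels=2)
latents = sum_embeddings(frames, codebooks)
print(frames)
print(latents)
```

Because each level refines the sum of the levels before it, decoding with fewer levels still yields a usable but coarser latent, which is the general idea behind trading token count for motion detail in residual quantization.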

Limitations

As a proof of concept, the model produces motion whose quality depends on the RVQ decoder and on the correctness of the emitted tokens. It generally handles basic movements but may struggle with more complex actions. It is intended for research and prototyping.
