Name: warshanks/talkie-1930-13b-it-mlx-bf16 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: warshanks

Overview

This model, warshanks/talkie-1930-13b-it-mlx-bf16, is an MLX port of lewtun/talkie-1930-13b-it-hf, a 13-billion-parameter instruction-tuned transformer, for Apple Silicon. Its primary distinguishing feature is that it generates text in the style of pre-1930s English prose. It also uses a custom architecture, summarized below.

Key Architectural Features

Talkie employs several unique architectural elements:

Custom RoPE Convention: Uses an inverse-rotation formula for positional encoding.
Weightless RMSNorm: Applied at various points without learned scale parameters.
Per-head Q Gain: Learnable scalar applied to queries after RoPE and Q-norm.
Per-layer Scalar Gains: attn_gain, mlp_gain, and embed_skip scale residual contributions.
Scaled lm_head Weights: Incorporates an lm_head_gain for the final output layer.
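
The list above can be made more concrete with a minimal NumPy sketch of how the weightless RMSNorm, per-head Q gain, per-layer residual gains, and lm_head_gain might fit together. This is not the mlx-lm implementation. The attention and MLP sublayers are stand-in linear maps; RoPE is omitted because the inverse-rotation formula is not given here; and the pre-norm placement, the eps value, treating Q-norm as the weightless RMSNorm, treating lm_head_gain as a scalar, and where embed_skip re-adds the embedding are all assumptions.

```python
import numpy as np

EPS = 1e-6  # assumed epsilon; the real model's value may differ


def rms_norm(x, eps=EPS):
    """Weightless RMSNorm: divide by the root-mean-square over the last axis.
    There is no learned scale parameter."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def scale_queries(q, q_gain):
    """Q-norm (assumed weightless) followed by a learnable scalar per head.

    q: (n_heads, seq_len, head_dim); q_gain: (n_heads,).
    In the real model RoPE is also applied before the gain; omitted here.
    """
    return rms_norm(q) * q_gain[:, None, None]


def block(h, e, attn, mlp, attn_gain, mlp_gain, embed_skip):
    """One decoder block with scalar gains on each residual contribution.

    attn and mlp are stand-ins for the real sublayers. Adding the token
    embedding e scaled by embed_skip is a hypothetical placement.
    """
    h = h + attn_gain * attn(rms_norm(h))
    h = h + mlp_gain * mlp(rms_norm(h))
    return h + embed_skip * e


def logits(h, w_head, lm_head_gain):
    """Final projection; a scalar gain on the output equals scaling w_head."""
    return lm_head_gain * (rms_norm(h) @ w_head.T)


# Toy dimensions and random stand-in weights, for illustration only.
rng = np.random.default_rng(0)
d_model, vocab, seq_len = 8, 16, 4
e = rng.standard_normal((seq_len, d_model))
w_attn = 0.1 * rng.standard_normal((d_model, d_model))
w_mlp = 0.1 * rng.standard_normal((d_model, d_model))
w_head = rng.standard_normal((vocab, d_model))

h = block(e, e, lambda x: x @ w_attn, lambda x: x @ w_mlp,
          attn_gain=0.5, mlp_gain=0.5, embed_skip=0.1)
out = logits(h, w_head, lm_head_gain=0.5)
print(out.shape)  # (4, 16)
```

Because a scalar gain commutes with the matrix multiply, multiplying the logits by lm_head_gain gives the same result as scaling the lm_head weights themselves, which is one way to read the "Scaled lm_head Weights" item.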

MLX Conversion and Quantization

The model was converted to MLX, and native Talkie support has been integrated into mlx-lm. Quantized variants are also available at 8-bit, 6-bit, and 4-bit precision. Notably, the -mlx-4bit-DWQ variant uses DWQ-calibrated 4-bit quantization, which significantly improves long-form generation quality; bare (uncalibrated) 4-bit quantization can lead to repetitive output. The MLX conversion agrees numerically with the upstream transformers model to within the differences typically seen between bf16 implementations.
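
Why fewer bits costs quality can be illustrated with a simplified sketch of group-wise affine quantization, the general scheme MLX uses (mlx.core.quantize takes group_size and bits parameters, with 64 as its default group size). Each group of consecutive weights gets its own scale and bias, and values are rounded to integers in [0, 2**bits - 1]. The sketch runs on random Gaussian weights rather than Talkie's, MLX's packing and rounding details may differ, and the DWQ calibration step is not modeled.

```python
import numpy as np


def quantize_groups(w, bits, group_size=64):
    """Group-wise affine quantization: per-group scale and bias (the group min)."""
    if not 1 <= bits <= 8:
        raise ValueError("this sketch stores codes as uint8, so bits must be 1..8")
    groups = w.reshape(-1, group_size)  # requires w.size % group_size == 0
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    levels = 2**bits - 1
    # Constant groups (hi == lo) get scale 1.0 so they round to code 0 exactly.
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.clip(np.round((groups - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo


def dequantize_groups(q, scale, bias, shape):
    """Reconstruct approximate weights as q * scale + bias."""
    return (q * scale + bias).reshape(shape)


def relative_error(w, bits, group_size=64):
    """Relative Frobenius-norm error after a quantize/dequantize round trip."""
    q, scale, bias = quantize_groups(w, bits, group_size)
    w_hat = dequantize_groups(q, scale, bias, w.shape)
    return np.linalg.norm(w - w_hat) / np.linalg.norm(w)


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)
for bits in (8, 6, 4):
    print(f"{bits}-bit relative error: {relative_error(w, bits):.4f}")
```

Each bit removed roughly doubles the rounding step within a group, so the 4-bit reconstruction error is noticeably larger than the 8-bit one. That residual error is what DWQ calibration is intended to offset in the 4-bit variant.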

Use Cases

This model is particularly well-suited for applications requiring text generation with a distinct historical linguistic style, specifically pre-1930s English prose. It can be used for:

Generating historical narratives or dialogues.
Creating content for themed applications or games.
Exploring unique architectural designs in language models.
