warshanks/talkie-1930-13b-it-mlx-bf16

TEXT GENERATIONConcurrency Cost:1Model Size:13BQuant:FP8Ctx Length:32kPublished:Apr 30, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The warshanks/talkie-1930-13b-it-mlx-bf16 is a 13 billion parameter instruction-tuned decoder-only transformer, ported for Apple Silicon using MLX. It features a custom architecture with unique RoPE conventions, weightless RMSNorm, and per-head/per-layer scalar gains. This model is specifically designed to generate text styled as pre-1930s English prose, making it suitable for historical narrative generation and themed content creation.

Loading preview...

Overview

This model, warshanks/talkie-1930-13b-it-mlx-bf16, is an MLX port of the lewtun/talkie-1930-13b-it-hf 13 billion parameter instruction-tuned transformer, optimized for Apple Silicon. Its primary distinguishing feature is its ability to generate text in the style of pre-1930s English prose, achieved through a custom architecture.

Key Architectural Features

Talkie employs several unique architectural elements:

  • Custom RoPE Convention: Uses an inverse-rotation formula for positional encoding.
  • Weightless RMSNorm: Applied at various points without learned scale parameters.
  • Per-head Q Gain: Learnable scalar applied to queries after RoPE and Q-norm.
  • Per-layer Scalar Gains: attn_gain, mlp_gain, and embed_skip scale residual contributions.
  • Scaled lm_head Weights: Incorporates a lm_head_gain for the final output layer.

MLX Conversion and Quantization

The model was converted to MLX, with native Talkie support integrated into mlx-lm. Several quantized variants are available, including 8-bit, 6-bit, and 4-bit versions. Notably, the -mlx-4bit-DWQ variant utilizes DWQ-calibrated 4-bit quantization, which significantly improves long-form generation quality compared to bare 4-bit quantization, which can lead to repetition. Numerical agreement with the upstream transformers model is within typical bf16 disagreement ranges.

Use Cases

This model is particularly well-suited for applications requiring text generation with a distinct historical linguistic style, specifically pre-1930s English prose. It can be used for:

  • Generating historical narratives or dialogues.
  • Creating content for themed applications or games.
  • Exploring unique architectural designs in language models.