ewald1976/Silver-Siren-ST-12B

TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 12B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Jun 13, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • 0.0K | Open Weights | Cold

ewald1976/Silver-Siren-ST-12B is a 12-billion-parameter language model based on the Mistral-NeMo-12B architecture, developed by ewald1976. It is a technical experiment in "Style-Tuning," in which only the lm_head tensor is retrained to alter the model's prose style while preserving its core reasoning and knowledge. The retraining recalibrates vocabulary, sentence structure, and prose quality, which makes the model suitable for applications that require a specific literary voice.


Silver-Siren-ST-12B: A Style-Tuned Language Model

ewald1976/Silver-Siren-ST-12B is a 12-billion-parameter model built on the Mistral-NeMo-12B architecture and produced with a novel "StyleTune" methodology. This approach surgically modifies the model's output style without altering its underlying reasoning or world knowledge.

Key Capabilities & Methodology

  • Surgical Style-Tuning: Unlike traditional fine-tuning, training keeps all attention and MLP layers (Layers 0–39) frozen, which preserves the base model's logic and instruction-following capabilities.
  • Targeted Recalibration: Only the lm_head (output projection) tensor is trained, allowing for a complete recalibration of the model's vocabulary, sentence structure, and prose quality.
  • Voice Transformation: The process changes the model's voice and stylistic output, rather than its intelligence or factual accuracy.
  • Base Model Contrast: The base model, Vortex5/Silver-Siren-12B, was chosen because its distinct, emotion-forward style makes it a useful benchmark for demonstrating that deeply ingrained stylistic biases can be overwritten.
  • Literary Transformation: Training on a curated classical sci-fi literary dataset (inspired by Asimov, Huxley, and Lem) dramatically transformed the model's prose delivery.
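
The selection rule behind this methodology (train lm_head, freeze everything else) can be sketched in a few lines of Python. This is an illustration only, not the author's training code: the parameter names follow the naming commonly seen in Hugging Face Mistral-style checkpoints, and the element counts are toy values chosen for readability. Both are assumptions.

```python
def is_trainable(param_name, target_module="lm_head"):
    """Return True only for parameters that belong to the target module."""
    return param_name == target_module or param_name.startswith(target_module + ".")


# Hypothetical parameter names mapped to toy element counts (not the real shapes).
toy_params = {
    "model.embed_tokens.weight": 32 * 8,
    "model.layers.0.self_attn.q_proj.weight": 8 * 8,
    "model.layers.0.mlp.down_proj.weight": 8 * 16,
    "model.norm.weight": 8,
    "lm_head.weight": 32 * 8,
}

# Mark each parameter as trainable or frozen.
trainable = {name: is_trainable(name) for name in toy_params}

# Count how many elements would actually receive gradient updates.
trainable_count = sum(n for name, n in toy_params.items() if trainable[name])
total_count = sum(toy_params.values())

for name, flag in trainable.items():
    print(f"{name:45s} {'train' if flag else 'frozen'}")
print(f"trainable elements: {trainable_count} of {total_count}")
```

In a PyTorch setup, the same predicate could drive `param.requires_grad = is_trainable(name)` while iterating over `model.named_parameters()`, so that only the output projection accumulates gradients.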

Training Details

  • Epochs: 3
  • Learning Rate: 4e-4 (Linear Scheduler)
  • Target Modules: lm_head only
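
As a rough illustration of what a linear scheduler does with a peak learning rate of 4e-4, the sketch below computes the rate at a given optimizer step. It assumes decay to zero with no warmup, and the total step count is hypothetical; the source states neither.

```python
def linear_lr(step, total_steps, peak_lr=4e-4):
    """Learning rate after `step` updates, decaying linearly from peak_lr to 0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0.0, 1.0 - step / total_steps)
    return peak_lr * remaining


# Hypothetical run length: 3 epochs of 100 optimizer steps each.
TOTAL_STEPS = 300

# Sample the schedule at the start, midpoint, and end of training.
schedule = [linear_lr(s, TOTAL_STEPS) for s in (0, 150, 300)]
for step, lr in zip((0, 150, 300), schedule):
    print(f"step {step:3d}: lr = {lr:.2e}")
```

Under these assumptions the rate starts at the peak, is halved at the midpoint of training, and reaches zero at the final step.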

Recommended Sampler Settings

  • Temperature: 0.7–0.9
  • Min_P: 0.05
  • Top_P: 0.95
  • Repetition Penalty: 1.05
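
To show how these settings shape the next-token distribution, here is a minimal pure-Python sketch that applies temperature, Min_P, and Top_P to a toy set of logits. The logits are made up, the temperature of 0.8 is simply a value inside the recommended range, and the order in which filters are applied is an assumption: inference backends differ on this. Repetition penalty is omitted for brevity.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Zero out tokens whose probability is below min_p times the top probability."""
    threshold = min_p * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        if probs[i] == 0.0:
            break
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return [p if i in kept else 0.0 for i, p in enumerate(probs)]


def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]


def filtered_distribution(logits, temperature=0.8, min_p=0.05, top_p=0.95):
    """Apply temperature, then Min_P, then Top_P (an assumed ordering)."""
    probs = softmax(logits, temperature)
    probs = renormalize(min_p_filter(probs, min_p))
    return renormalize(top_p_filter(probs, top_p))


toy_logits = [4.0, 3.0, 1.0, -2.0]  # made-up logits for four candidate tokens
dist = filtered_distribution(toy_logits)
print([round(p, 3) for p in dist])
```

With these toy logits, the two least likely tokens fall below the Min_P threshold and are removed, so sampling happens only over the two strongest candidates.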

Use Cases

This model is suited to applications where a specific literary style or prose quality is desired without compromising the base model's core reasoning abilities. The same approach can be used to adapt a model's output to match a particular author's style, genre, or tone.