Name: axolotl-ai-co/romulus-mistral-nemo-12b-simpo API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: axolotl-ai-co

Overview

This model, romulus-mistral-nemo-12b-simpo, is a 12 billion parameter language model developed by axolotl-ai-co. It is a fine-tuned version of winglian/m12b-20240721-test010, specifically optimized using the SIMPO (Symmetric Inverse-Propensity Off-Policy) reinforcement learning algorithm. The training involved a learning rate of 5e-07 over 466 steps, with a total batch size of 128, and utilized a cosine learning rate scheduler.

Key Characteristics

Base Model: Fine-tuned from winglian/m12b-20240721-test010.
Alignment Method: Employs SIMPO (Symmetric Inverse-Propensity Off-Policy) for reinforcement learning, with specific parameters rl_beta: 2.5, cpo_alpha: 0.05, and simpo_gamma: 0.1.
Context Length: Supports a sequence length of 8192 tokens, with padding to this length.
Training Data: Fine-tuned on princeton-nlp/gemma2-ultrafeedback-armorm dataset, configured for chatml template.

Training Details

Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08.
Learning Rate: 5e-07, with a cosine scheduler and 25 warmup steps.
Hardware: Trained across 8 GPUs with a gradient accumulation of 16 steps, resulting in a total effective batch size of 128.

Intended Use

This model is suitable for general language generation and conversational AI applications, benefiting from its SIMPO-based alignment for improved response quality and its substantial context window.

Overview

Overview

Key Characteristics

Training Details

Intended Use

Full Model Card (README)